From patchwork Tue Feb 26 14:29:03 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 10830367 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BC8831669 for ; Tue, 26 Feb 2019 14:29:26 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AA88D2A083 for ; Tue, 26 Feb 2019 14:29:26 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9DFC92C525; Tue, 26 Feb 2019 14:29:26 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 9C5B62A083 for ; Tue, 26 Feb 2019 14:29:25 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9541A89E8C; Tue, 26 Feb 2019 14:29:24 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by gabe.freedesktop.org (Postfix) with ESMTPS id E780889E3F for ; Tue, 26 Feb 2019 14:29:20 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2019 06:29:20 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,415,1544515200"; d="scan'208";a="150113354" Received: from 
delly.ld.intel.com ([10.103.238.201]) by fmsmga001.fm.intel.com with ESMTP; 26 Feb 2019 06:29:19 -0800 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Tue, 26 Feb 2019 14:29:03 +0000 Message-Id: <20190226142911.9789-2-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190226142911.9789-1-lionel.g.landwerlin@intel.com> References: <20190226142911.9789-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v3 1/9] drm/i915/perf: rework aging tail workaround X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP We're about to introduce an option when opening the perf stream, giving the user the ability to configure how often the kernel polls the OA registers for available data. Right now the workaround for the OA tail pointer race condition requires at least two iterations of the internal kernel polling timer before any data is made available. This change introduces checks on the OA data written into the circular buffer to make as much data as possible available on the first iteration of the polling timer. v2: Use OA_TAKEN macro without the gtt_offset (Lionel) Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_drv.h | 32 ++--- drivers/gpu/drm/i915/i915_perf.c | 200 ++++++++++++++----------------- 2 files changed, 103 insertions(+), 129 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index cc09caf3870e..feb0a377f353 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1892,6 +1892,12 @@ struct drm_i915_private { */ struct ratelimit_state spurious_report_rs; + /** + * For rate limiting any notifications of tail pointer + * race. 
+ */ + struct ratelimit_state tail_pointer_race; + bool periodic; int period_exponent; @@ -1932,23 +1938,11 @@ struct drm_i915_private { spinlock_t ptr_lock; /** - * One 'aging' tail pointer and one 'aged' - * tail pointer ready to used for reading. - * - * Initial values of 0xffffffff are invalid - * and imply that an update is required - * (and should be ignored by an attempted - * read) - */ - struct { - u32 offset; - } tails[2]; - - /** - * Index for the aged tail ready to read() - * data up to. + * The last tail reported by the HW. The data + * might not have made it to memory yet + * though. */ - unsigned int aged_tail_idx; + u32 aging_tail; /** * A monotonic timestamp for when the current @@ -1967,6 +1961,12 @@ struct drm_i915_private { * data to userspace. */ u32 head; + + /** + * The last verified tail that can be + * read by userspace. + */ + u32 tail; } oa_buffer; u32 gen7_latched_oastatus1; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 9ebf99f3d8d3..4687ab719fa7 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -233,23 +233,14 @@ * for this earlier, as part of the oa_buffer_check to avoid lots of redundant * read() attempts. * - * In effect we define a tail pointer for reading that lags the real tail - * pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which gives enough - * time for the corresponding reports to become visible to the CPU. - * - * To manage this we actually track two tail pointers: - * 1) An 'aging' tail with an associated timestamp that is tracked until we - * can trust the corresponding data is visible to the CPU; at which point - * it is considered 'aged'. - * 2) An 'aged' tail that can be used for read()ing. - * - * The two separate pointers let us decouple read()s from tail pointer aging. 
- * - * The tail pointers are checked and updated at a limited rate within a hrtimer - * callback (the same callback that is used for delivering EPOLLIN events) - * - * Initially the tails are marked invalid with %INVALID_TAIL_PTR which - * indicates that an updated tail pointer is needed. + * We work around this issue in oa_buffer_check() by reading the reports in the + * OA buffer, starting from the tail reported by the HW, until we find 2 + * consecutive reports whose first 2 dwords are not 0. Those dwords are + * also set to 0 once read and the whole buffer is cleared upon OA buffer + * initialization. The first dword is the reason for this report while the + * second is the timestamp, making the chances of having those 2 fields at 0 + * fairly unlikely. A more detailed explanation is available in + * oa_buffer_check(). * * Most of the implementation details for this workaround are in * oa_buffer_check_unlocked() and _append_oa_reports() @@ -262,7 +253,6 @@ * enabled without any periodic sampling. */ #define OA_TAIL_MARGIN_NSEC 100000ULL -#define INVALID_TAIL_PTR 0xffffffff /* frequency for checking whether the OA unit has written new reports to the * circular OA buffer... 
@@ -449,10 +439,10 @@ static u32 gen7_oa_hw_tail_read(struct drm_i915_private *dev_priv) */ static bool oa_buffer_check_unlocked(struct drm_i915_private *dev_priv) { + u32 gtt_offset = i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma); int report_size = dev_priv->perf.oa.oa_buffer.format_size; unsigned long flags; - unsigned int aged_idx; - u32 head, hw_tail, aged_tail, aging_tail; + u32 hw_tail; u64 now; /* We have to consider the (unlikely) possibility that read() errors @@ -461,16 +451,6 @@ static bool oa_buffer_check_unlocked(struct drm_i915_private *dev_priv) */ spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); - /* NB: The head we observe here might effectively be a little out of - * date (between head and tails[aged_idx].offset if there is currently - * a read() in progress. - */ - head = dev_priv->perf.oa.oa_buffer.head; - - aged_idx = dev_priv->perf.oa.oa_buffer.aged_tail_idx; - aged_tail = dev_priv->perf.oa.oa_buffer.tails[aged_idx].offset; - aging_tail = dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset; - hw_tail = dev_priv->perf.oa.ops.oa_hw_tail_read(dev_priv); /* The tail pointer increases in 64 byte increments, @@ -480,63 +460,75 @@ static bool oa_buffer_check_unlocked(struct drm_i915_private *dev_priv) now = ktime_get_mono_fast_ns(); - /* Update the aged tail - * - * Flip the tail pointer available for read()s once the aging tail is - * old enough to trust that the corresponding data will be visible to - * the CPU... - * - * Do this before updating the aging pointer in case we may be able to - * immediately start aging a new pointer too (if new data has become - * available) without needing to wait for a later hrtimer callback. 
- */ - if (aging_tail != INVALID_TAIL_PTR && - ((now - dev_priv->perf.oa.oa_buffer.aging_timestamp) > - OA_TAIL_MARGIN_NSEC)) { - - aged_idx ^= 1; - dev_priv->perf.oa.oa_buffer.aged_tail_idx = aged_idx; + if (hw_tail == dev_priv->perf.oa.oa_buffer.aging_tail) { + /* If the HW tail hasn't moved since the last check and the HW + * tail has been aging for long enough, declare it the new + * tail. + */ + if ((now - dev_priv->perf.oa.oa_buffer.aging_timestamp) > + OA_TAIL_MARGIN_NSEC) { + dev_priv->perf.oa.oa_buffer.tail = + dev_priv->perf.oa.oa_buffer.aging_tail; + } + } else { + u32 head, tail, landed_report_heads; - aged_tail = aging_tail; + /* NB: The head we observe here might effectively be a little out of + * date (between head and oa_buffer.tail) if there is currently + * a read() in progress. + */ + head = dev_priv->perf.oa.oa_buffer.head - gtt_offset; - /* Mark that we need a new pointer to start aging... */ - dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR; - aging_tail = INVALID_TAIL_PTR; - } + hw_tail -= gtt_offset; + tail = hw_tail; - /* Update the aging tail - * - * We throttle aging tail updates until we have a new tail that - * represents >= one report more data than is already available for - * reading. This ensures there will be enough data for a successful - * read once this new pointer has aged and ensures we will give the new - * pointer time to age. - */ - if (aging_tail == INVALID_TAIL_PTR && - (aged_tail == INVALID_TAIL_PTR || - OA_TAKEN(hw_tail, aged_tail) >= report_size)) { - struct i915_vma *vma = dev_priv->perf.oa.oa_buffer.vma; - u32 gtt_offset = i915_ggtt_offset(vma); - - /* Be paranoid and do a bounds check on the pointer read back - * from hardware, just in case some spurious hardware condition - * could put the tail out of bounds... + /* Walk the stream backward until we find at least 2 reports + * with dword 0 & 1 not at 0. 
Since the circular buffer + * pointers progress by increments of 64 bytes and + * reports can be up to 256 bytes long, we can't tell whether + * a report has fully landed in memory until the first 2 + * dwords of the following report have landed. + * + * This assumes that the writes of the OA unit land in + * memory in the order they were issued. + * If not : (╯°□°)╯︵ ┻━┻ */ - if (hw_tail >= gtt_offset && - hw_tail < (gtt_offset + OA_BUFFER_SIZE)) { - dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset = - aging_tail = hw_tail; - dev_priv->perf.oa.oa_buffer.aging_timestamp = now; - } else { - DRM_ERROR("Ignoring spurious out of range OA buffer tail pointer = %u\n", - hw_tail); + landed_report_heads = 0; + while (OA_TAKEN(tail, head) >= report_size) { + u32 previous_tail = (tail - report_size) & (OA_BUFFER_SIZE - 1); + u8 *report = dev_priv->perf.oa.oa_buffer.vaddr + previous_tail; + u32 *report32 = (void *) report; + + /* Head of the report indicated by the HW tail register has + * indeed landed in memory. + */ + if (report32[0] != 0 || report32[1] != 0) { + landed_report_heads++; + + if (landed_report_heads >= 2) + break; + } + + tail = previous_tail; + } + + if (abs(tail - hw_tail) >= (2 * report_size)) { + if (__ratelimit(&dev_priv->perf.oa.tail_pointer_race)) { + DRM_NOTE("unlanded report(s) head=0x%x " + "tail=0x%x hw_tail=0x%x\n", + head, tail, hw_tail); + } } + + dev_priv->perf.oa.oa_buffer.tail = gtt_offset + tail; + dev_priv->perf.oa.oa_buffer.aging_tail = gtt_offset + hw_tail; + dev_priv->perf.oa.oa_buffer.aging_timestamp = now; } spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); - return aged_tail == INVALID_TAIL_PTR ? 
- false : OA_TAKEN(aged_tail, head) >= report_size; + return OA_TAKEN(dev_priv->perf.oa.oa_buffer.tail - gtt_offset, + dev_priv->perf.oa.oa_buffer.head - gtt_offset) >= report_size; } /** @@ -655,7 +647,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, u32 mask = (OA_BUFFER_SIZE - 1); size_t start_offset = *offset; unsigned long flags; - unsigned int aged_tail_idx; u32 head, tail; u32 taken; int ret = 0; @@ -666,18 +657,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); head = dev_priv->perf.oa.oa_buffer.head; - aged_tail_idx = dev_priv->perf.oa.oa_buffer.aged_tail_idx; - tail = dev_priv->perf.oa.oa_buffer.tails[aged_tail_idx].offset; + tail = dev_priv->perf.oa.oa_buffer.tail; spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); - /* - * An invalid tail pointer here means we're still waiting for the poll - * hrtimer callback to give us a pointer - */ - if (tail == INVALID_TAIL_PTR) - return -EAGAIN; - /* * NB: oa_buffer.head/tail include the gtt_offset which we don't want * while indexing relative to oa_buf_base. @@ -806,13 +789,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, } /* - * The above reason field sanity check is based on - * the assumption that the OA buffer is initially - * zeroed and we reset the field after copying so the - * check is still meaningful once old reports start - * being overwritten. + * Clear out the first 2 dwords as a means to detect unlanded + * reports. 
+ */ - report32[0] = 0; + report32[0] = report32[1] = 0; } if (start_offset != *offset) { @@ -944,7 +924,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, u32 mask = (OA_BUFFER_SIZE - 1); size_t start_offset = *offset; unsigned long flags; - unsigned int aged_tail_idx; u32 head, tail; u32 taken; int ret = 0; @@ -955,17 +934,10 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); head = dev_priv->perf.oa.oa_buffer.head; - aged_tail_idx = dev_priv->perf.oa.oa_buffer.aged_tail_idx; - tail = dev_priv->perf.oa.oa_buffer.tails[aged_tail_idx].offset; + tail = dev_priv->perf.oa.oa_buffer.tail; spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); - /* An invalid tail pointer here means we're still waiting for the poll - * hrtimer callback to give us a pointer - */ - if (tail == INVALID_TAIL_PTR) - return -EAGAIN; - /* NB: oa_buffer.head/tail include the gtt_offset which we don't want * while indexing relative to oa_buf_base. */ @@ -1020,13 +992,10 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, if (ret) break; - /* The above report-id field sanity check is based on - * the assumption that the OA buffer is initially - * zeroed and we reset the field after copying so the - * check is still meaningful once old reports start - * being overwritten. + /* Clear out the first 2 dwords as a means to detect unlanded + * reports. */ - report32[0] = 0; + report32[0] = report32[1] = 0; } if (start_offset != *offset) { @@ -1397,8 +1366,8 @@ static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv) I915_WRITE(GEN7_OASTATUS1, gtt_offset | OABUFFER_SIZE_16M); /* tail */ /* Mark that we need updated tail pointers to read from... 
*/ - dev_priv->perf.oa.oa_buffer.tails[0].offset = INVALID_TAIL_PTR; - dev_priv->perf.oa.oa_buffer.tails[1].offset = INVALID_TAIL_PTR; + dev_priv->perf.oa.oa_buffer.aging_tail = + dev_priv->perf.oa.oa_buffer.tail = gtt_offset; spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); @@ -1453,8 +1422,8 @@ static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv) I915_WRITE(GEN8_OATAILPTR, gtt_offset & GEN8_OATAILPTR_MASK); /* Mark that we need updated tail pointers to read from... */ - dev_priv->perf.oa.oa_buffer.tails[0].offset = INVALID_TAIL_PTR; - dev_priv->perf.oa.oa_buffer.tails[1].offset = INVALID_TAIL_PTR; + dev_priv->perf.oa.oa_buffer.aging_tail = + dev_priv->perf.oa.oa_buffer.tail = gtt_offset; /* * Reset state used to recognise context switches, affecting which @@ -2048,6 +2017,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, ratelimit_set_flags(&dev_priv->perf.oa.spurious_report_rs, RATELIMIT_MSG_ON_RELEASE); + ratelimit_state_init(&dev_priv->perf.oa.tail_pointer_race, + 5 * HZ, 10); + ratelimit_set_flags(&dev_priv->perf.oa.tail_pointer_race, + RATELIMIT_MSG_ON_RELEASE); + stream->sample_size = sizeof(struct drm_i915_perf_record_header); format_size = dev_priv->perf.oa.oa_formats[props->oa_format].size; From patchwork Tue Feb 26 14:29:04 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 10830365 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5E2FD13B5 for ; Tue, 26 Feb 2019 14:29:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4CECA2A083 for ; Tue, 26 Feb 2019 14:29:25 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 410402AF10; Tue, 26 Feb 2019 14:29:25 +0000 (UTC) 
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id EE1B02A083 for ; Tue, 26 Feb 2019 14:29:24 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 405AB89E3F; Tue, 26 Feb 2019 14:29:24 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by gabe.freedesktop.org (Postfix) with ESMTPS id E9AB489E3F for ; Tue, 26 Feb 2019 14:29:21 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2019 06:29:21 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,415,1544515200"; d="scan'208";a="150113360" Received: from delly.ld.intel.com ([10.103.238.201]) by fmsmga001.fm.intel.com with ESMTP; 26 Feb 2019 06:29:20 -0800 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Tue, 26 Feb 2019 14:29:04 +0000 Message-Id: <20190226142911.9789-3-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190226142911.9789-1-lionel.g.landwerlin@intel.com> References: <20190226142911.9789-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v3 2/9] drm/i915/perf: move pollin setup to non hw specific code X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP This isn't really gen specific stuff, so just move it to the common code. Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_perf.c | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 4687ab719fa7..55d25255bd67 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1389,11 +1389,6 @@ static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv) * memory... */ memset(dev_priv->perf.oa.oa_buffer.vaddr, 0, OA_BUFFER_SIZE); - - /* Maybe make ->pollin per-stream state if we support multiple - * concurrent streams in the future. - */ - dev_priv->perf.oa.pollin = false; } static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv) @@ -1447,12 +1442,6 @@ static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv) * memory... */ memset(dev_priv->perf.oa.oa_buffer.vaddr, 0, OA_BUFFER_SIZE); - - /* - * Maybe make ->pollin per-stream state if we support multiple - * concurrent streams in the future. - */ - dev_priv->perf.oa.pollin = false; } static int alloc_oa_buffer(struct drm_i915_private *dev_priv) @@ -1881,6 +1870,12 @@ static void i915_oa_stream_enable(struct i915_perf_stream *stream) { struct drm_i915_private *dev_priv = stream->dev_priv; + /* + * Maybe make ->pollin per-stream state if we support multiple + * concurrent streams in the future. 
+ */ + dev_priv->perf.oa.pollin = false; + dev_priv->perf.oa.ops.oa_enable(stream); if (dev_priv->perf.oa.periodic) From patchwork Tue Feb 26 14:29:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 10830371 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 41EA113B5 for ; Tue, 26 Feb 2019 14:29:29 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2F5E22B667 for ; Tue, 26 Feb 2019 14:29:29 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 23AF72C52D; Tue, 26 Feb 2019 14:29:29 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id C5EDB2B667 for ; Tue, 26 Feb 2019 14:29:28 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6C3AA89EBD; Tue, 26 Feb 2019 14:29:27 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by gabe.freedesktop.org (Postfix) with ESMTPS id 01DF689E3F for ; Tue, 26 Feb 2019 14:29:22 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2019 06:29:22 
-0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,415,1544515200"; d="scan'208";a="150113365" Received: from delly.ld.intel.com ([10.103.238.201]) by fmsmga001.fm.intel.com with ESMTP; 26 Feb 2019 06:29:22 -0800 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Tue, 26 Feb 2019 14:29:05 +0000 Message-Id: <20190226142911.9789-4-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190226142911.9789-1-lionel.g.landwerlin@intel.com> References: <20190226142911.9789-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v3 3/9] drm/i915/perf: only append status when data is available X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP The only bit of the status register we currently report in the i915-perf stream is the "report loss" bit. Only report this when we have some data to report with it. Previously this was inconsistent: we could signal a report loss without appending the reports associated with the loss. Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_perf.c | 54 ++++++++++++++++++++------------ 1 file changed, 34 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 55d25255bd67..4504d4e18633 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -636,6 +636,7 @@ static int append_oa_sample(struct i915_perf_stream *stream, * Returns: 0 on success, negative error code on failure. 
*/ static int gen8_append_oa_reports(struct i915_perf_stream *stream, + u32 oastatus, char __user *buf, size_t count, size_t *offset) @@ -681,6 +682,21 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, head, tail)) return -EIO; + /* + * If there is nothing to read, don't append the status report yet, + * wait until we have some data available. + */ + if (!OA_TAKEN(tail, head)) + return 0; + + if (oastatus & GEN8_OASTATUS_REPORT_LOST) { + ret = append_oa_status(stream, buf, count, offset, + DRM_I915_PERF_RECORD_OA_REPORT_LOST); + if (ret) + return ret; + I915_WRITE(GEN8_OASTATUS, + oastatus & ~GEN8_OASTATUS_REPORT_LOST); + } for (/* none */; (taken = OA_TAKEN(tail, head)); @@ -880,16 +896,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream, oastatus = I915_READ(GEN8_OASTATUS); } - if (oastatus & GEN8_OASTATUS_REPORT_LOST) { - ret = append_oa_status(stream, buf, count, offset, - DRM_I915_PERF_RECORD_OA_REPORT_LOST); - if (ret) - return ret; - I915_WRITE(GEN8_OASTATUS, - oastatus & ~GEN8_OASTATUS_REPORT_LOST); - } - - return gen8_append_oa_reports(stream, buf, count, offset); + return gen8_append_oa_reports(stream, oastatus, buf, count, offset); } /** @@ -913,6 +920,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream, * Returns: 0 on success, negative error code on failure. */ static int gen7_append_oa_reports(struct i915_perf_stream *stream, + u32 oastatus1, char __user *buf, size_t count, size_t *offset) @@ -956,6 +964,21 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, head, tail)) return -EIO; + /* + * If there is nothing to read, don't append the status report yet, + * wait until we have some data available. 
+ */ + if (!OA_TAKEN(tail, head)) + return 0; + + if (unlikely(oastatus1 & GEN7_OASTATUS1_REPORT_LOST)) { + ret = append_oa_status(stream, buf, count, offset, + DRM_I915_PERF_RECORD_OA_REPORT_LOST); + if (ret) + return ret; + dev_priv->perf.oa.gen7_latched_oastatus1 |= + GEN7_OASTATUS1_REPORT_LOST; + } for (/* none */; (taken = OA_TAKEN(tail, head)); @@ -1089,16 +1112,7 @@ static int gen7_oa_read(struct i915_perf_stream *stream, oastatus1 = I915_READ(GEN7_OASTATUS1); } - if (unlikely(oastatus1 & GEN7_OASTATUS1_REPORT_LOST)) { - ret = append_oa_status(stream, buf, count, offset, - DRM_I915_PERF_RECORD_OA_REPORT_LOST); - if (ret) - return ret; - dev_priv->perf.oa.gen7_latched_oastatus1 |= - GEN7_OASTATUS1_REPORT_LOST; - } - - return gen7_append_oa_reports(stream, buf, count, offset); + return gen7_append_oa_reports(stream, oastatus1, buf, count, offset); } /** From patchwork Tue Feb 26 14:29:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 10830369 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 174EA1669 for ; Tue, 26 Feb 2019 14:29:28 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 05E052B667 for ; Tue, 26 Feb 2019 14:29:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EE07D2C52C; Tue, 26 Feb 2019 14:29:27 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate 
requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 9BDAC2B667 for ; Tue, 26 Feb 2019 14:29:27 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id E56E189EB1; Tue, 26 Feb 2019 14:29:26 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by gabe.freedesktop.org (Postfix) with ESMTPS id 0FEDC89E3F for ; Tue, 26 Feb 2019 14:29:24 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2019 06:29:23 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,415,1544515200"; d="scan'208";a="150113370" Received: from delly.ld.intel.com ([10.103.238.201]) by fmsmga001.fm.intel.com with ESMTP; 26 Feb 2019 06:29:23 -0800 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Tue, 26 Feb 2019 14:29:06 +0000 Message-Id: <20190226142911.9789-5-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190226142911.9789-1-lionel.g.landwerlin@intel.com> References: <20190226142911.9789-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v3 4/9] drm/i915/perf: introduce a versioning of the i915-perf uapi X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Reporting this version will help applications figure out what level of support the running kernel provides. 
Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_drv.c | 3 +++ include/uapi/drm/i915_drm.h | 20 ++++++++++++++++++++ 2 files changed, 23 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index c6354f6cdbdb..1ce58036dbb3 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -447,6 +447,9 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_MMAP_GTT_COHERENT: value = INTEL_INFO(dev_priv)->has_coherent_ggtt; break; + case I915_PARAM_PERF_REVISION: + value = 1; + break; default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL; diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 8304a7f1ec3f..d92d6e8f2cc7 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -562,6 +562,12 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_MMAP_GTT_COHERENT 52 +/* + * Revision of the i915-perf uAPI. The value returned helps determine what + * i915-perf features are available. See drm_i915_perf_property_id. + */ +#define I915_PARAM_PERF_REVISION 53 + /* Must be kept compact -- no holes and well documented */ typedef struct drm_i915_getparam { @@ -1602,23 +1608,31 @@ enum drm_i915_perf_property_id { * Open the stream for a specific context handle (as used with * execbuffer2). A stream opened for a specific context this way * won't typically require root privileges. + * + * This property is available in perf revision 1. */ DRM_I915_PERF_PROP_CTX_HANDLE = 1, /** * A value of 1 requests the inclusion of raw OA unit reports as * part of stream samples. + * + * This property is available in perf revision 1. */ DRM_I915_PERF_PROP_SAMPLE_OA, /** * The value specifies which set of OA unit metrics should be * be configured, defining the contents of any OA unit reports. + * + * This property is available in perf revision 1. 
*/ DRM_I915_PERF_PROP_OA_METRICS_SET, /** * The value specifies the size and layout of OA unit reports. + * + * This property is available in perf revision 1. */ DRM_I915_PERF_PROP_OA_FORMAT, @@ -1628,6 +1642,8 @@ enum drm_i915_perf_property_id { * from this exponent as follows: * * 80ns * 2^(period_exponent + 1) + * + * This property is available in perf revision 1. */ DRM_I915_PERF_PROP_OA_EXPONENT, @@ -1659,6 +1675,8 @@ struct drm_i915_perf_open_param { * to close and re-open a stream with the same configuration. * * It's undefined whether any pending data for the stream will be lost. + * + * This ioctl is available in perf revision 1. */ #define I915_PERF_IOCTL_ENABLE _IO('i', 0x0) @@ -1666,6 +1684,8 @@ struct drm_i915_perf_open_param { * Disable data capture for a stream. * * It is an error to try and read a stream that is disabled. + * + * This ioctl is available in perf revision 1. */ #define I915_PERF_IOCTL_DISABLE _IO('i', 0x1) From patchwork Tue Feb 26 14:29:07 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 10830373 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E44DF1669 for ; Tue, 26 Feb 2019 14:29:30 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D24402C525 for ; Tue, 26 Feb 2019 14:29:30 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C652D2C530; Tue, 26 Feb 2019 14:29:30 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with 
cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 479CA2C525 for ; Tue, 26 Feb 2019 14:29:30 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6D73E89EB8; Tue, 26 Feb 2019 14:29:29 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2613489E69 for ; Tue, 26 Feb 2019 14:29:25 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2019 06:29:25 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,415,1544515200"; d="scan'208";a="150113376" Received: from delly.ld.intel.com ([10.103.238.201]) by fmsmga001.fm.intel.com with ESMTP; 26 Feb 2019 06:29:24 -0800 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Tue, 26 Feb 2019 14:29:07 +0000 Message-Id: <20190226142911.9789-6-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190226142911.9789-1-lionel.g.landwerlin@intel.com> References: <20190226142911.9789-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v3 5/9] drm/i915/perf: add new open param to configure polling of OA buffer X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP This new parameter lets the application choose how often the OA buffer should be checked on the CPU side for data availability.
Longer polling periods tend to reduce CPU overhead if the application does not need near-real-time data collection. v2: Allow disabling polling completely with 0 value (Lionel) v3: Version the new parameter (Joonas) Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_drv.h | 6 +++++ drivers/gpu/drm/i915/i915_perf.c | 43 ++++++++++++++++++++++++++------ include/uapi/drm/i915_drm.h | 10 ++++++++ 3 files changed, 52 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index feb0a377f353..b54929cbf1f9 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1394,6 +1394,12 @@ struct i915_perf_stream { * @oa_config: The OA configuration used by the stream. */ struct i915_oa_config *oa_config; + + /** + * @poll_oa_period: The period in nanoseconds at which the OA + * buffer should be checked for available data. + */ + u64 poll_oa_period; }; /** diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 4504d4e18633..5ef9164a22a0 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -254,11 +254,11 @@ */ #define OA_TAIL_MARGIN_NSEC 100000ULL -/* frequency for checking whether the OA unit has written new reports to the - * circular OA buffer... +/* The default frequency for checking whether the OA unit has written new + * reports to the circular OA buffer... 
*/ -#define POLL_FREQUENCY 200 -#define POLL_PERIOD (NSEC_PER_SEC / POLL_FREQUENCY) +#define DEFAULT_POLL_FREQUENCY 200 +#define DEFAULT_POLL_PERIOD (NSEC_PER_SEC / DEFAULT_POLL_FREQUENCY) /* for sysctl proc_dointvec_minmax of dev.i915.perf_stream_paranoid */ static int zero; @@ -335,6 +335,8 @@ static const struct i915_oa_format gen8_plus_oa_formats[I915_OA_FORMAT_MAX] = { * @oa_format: An OA unit HW report format * @oa_periodic: Whether to enable periodic OA unit sampling * @oa_period_exponent: The OA unit sampling period is derived from this + * @poll_oa_period: The period at which the CPU will check for OA data + * availability * * As read_properties_unlocked() enumerates and validates the properties given * to open a stream of metrics the configuration is built up in the structure @@ -351,6 +353,7 @@ struct perf_open_properties { int oa_format; bool oa_periodic; int oa_period_exponent; + u64 poll_oa_period; }; static void free_oa_config(struct drm_i915_private *dev_priv, @@ -1892,9 +1895,9 @@ static void i915_oa_stream_enable(struct i915_perf_stream *stream) dev_priv->perf.oa.ops.oa_enable(stream); - if (dev_priv->perf.oa.periodic) + if (dev_priv->perf.oa.periodic && stream->poll_oa_period) hrtimer_start(&dev_priv->perf.oa.poll_check_timer, - ns_to_ktime(POLL_PERIOD), + ns_to_ktime(stream->poll_oa_period), HRTIMER_MODE_REL_PINNED); } @@ -2258,13 +2261,15 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer) struct drm_i915_private *dev_priv = container_of(hrtimer, typeof(*dev_priv), perf.oa.poll_check_timer); + struct i915_perf_stream *stream = dev_priv->perf.oa.exclusive_stream; if (oa_buffer_check_unlocked(dev_priv)) { dev_priv->perf.oa.pollin = true; wake_up(&dev_priv->perf.oa.poll_wq); } - hrtimer_forward_now(hrtimer, ns_to_ktime(POLL_PERIOD)); + hrtimer_forward_now(hrtimer, + ns_to_ktime(stream->poll_oa_period)); return HRTIMER_RESTART; } @@ -2585,6 +2590,7 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, 
stream->dev_priv = dev_priv; stream->ctx = specific_ctx; + stream->poll_oa_period = props->poll_oa_period; ret = i915_oa_stream_init(stream, param, props); if (ret) @@ -2640,6 +2646,7 @@ static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int exponent) /** * read_properties_unlocked - validate + copy userspace stream open properties * @dev_priv: i915 device instance + * @open_flags: Flags set by userspace for the opening of the stream * @uprops: The array of u64 key value pairs given by userspace * @n_props: The number of key value pairs expected in @uprops * @props: The stream configuration built up while validating properties @@ -2653,6 +2660,7 @@ static u64 oa_exponent_to_ns(struct drm_i915_private *dev_priv, int exponent) * rule out defining new properties with ordering requirements in the future. */ static int read_properties_unlocked(struct drm_i915_private *dev_priv, + u32 open_flags, u64 __user *uprops, u32 n_props, struct perf_open_properties *props) @@ -2661,6 +2669,7 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, u32 i; memset(props, 0, sizeof(struct perf_open_properties)); + props->poll_oa_period = DEFAULT_POLL_PERIOD; if (!n_props) { DRM_DEBUG("No i915 perf properties given\n"); @@ -2764,6 +2773,14 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, props->oa_periodic = true; props->oa_period_exponent = value; break; + case DRM_I915_PERF_PROP_POLL_OA_DELAY: + if (value > 0 && value < 100000 /* 100us */) { + DRM_DEBUG("OA availability timer too small (%lluns < 100us)\n", + value); + return -EINVAL; + } + props->poll_oa_period = value; + break; case DRM_I915_PERF_PROP_MAX: MISSING_CASE(id); return -EINVAL; @@ -2772,6 +2789,17 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, uprop += 2; } + /* + * Blocking read need to be waken up by some mechanism. If no polling + * of the HEAD/TAIL register is done by the kernel, we'll never be + * able to wake up. 
+ */ + if ((open_flags & I915_PERF_FLAG_FD_NONBLOCK) == 0 && + !props->poll_oa_period) { + DRM_DEBUG("Requesting a blocking stream with no polling period.\n"); + return -EINVAL; + } + return 0; } @@ -2822,6 +2850,7 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data, } ret = read_properties_unlocked(dev_priv, + param->flags, u64_to_user_ptr(param->properties_ptr), param->num_properties, &props); diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index d92d6e8f2cc7..a04de844d95e 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1647,6 +1647,16 @@ enum drm_i915_perf_property_id { */ DRM_I915_PERF_PROP_OA_EXPONENT, + /** + * Specifying this property sets up a hrtimer in nanoseconds at which + * the i915 driver will check the OA buffer for available data. A + * value of 0 means no hrtimer will be started. Values below 100 + * microseconds are not allowed. + * + * This property is available in perf revision 2. + */ + DRM_I915_PERF_PROP_POLL_OA_DELAY, + DRM_I915_PERF_PROP_MAX /* non-ABI */ }; From patchwork Tue Feb 26 14:29:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 10830375 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0E6381669 for ; Tue, 26 Feb 2019 14:29:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F0D012C525 for ; Tue, 26 Feb 2019 14:29:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E4DDF2C530; Tue, 26 Feb 2019 14:29:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham 
version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 36E812C525 for ; Tue, 26 Feb 2019 14:29:31 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 7EC0889ED3; Tue, 26 Feb 2019 14:29:29 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by gabe.freedesktop.org (Postfix) with ESMTPS id 350B789E69 for ; Tue, 26 Feb 2019 14:29:26 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2019 06:29:26 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,415,1544515200"; d="scan'208";a="150113381" Received: from delly.ld.intel.com ([10.103.238.201]) by fmsmga001.fm.intel.com with ESMTP; 26 Feb 2019 06:29:25 -0800 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Tue, 26 Feb 2019 14:29:08 +0000 Message-Id: <20190226142911.9789-7-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190226142911.9789-1-lionel.g.landwerlin@intel.com> References: <20190226142911.9789-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v3 6/9] drm/i915: handle interrupts from the OA unit X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP The OA unit can notify that its circular buffer is half full through an 
interrupt and we would like to give the application the ability to make use of this interrupt to get rid of CPU checks on the OA buffer. This change wires up the interrupt to the i915-perf stream and leaves it ignored for now. v2: Use spin_lock_irq() to access the IMR register on Haswell (Chris) Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_drv.h | 21 +++++++++++++ drivers/gpu/drm/i915/i915_irq.c | 39 ++++++++++++++++++++----- drivers/gpu/drm/i915/i915_perf.c | 26 +++++++++++++++++ drivers/gpu/drm/i915/i915_reg.h | 7 +++++ drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++ 5 files changed, 88 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index b54929cbf1f9..8faa9cb2b620 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1400,6 +1400,12 @@ struct i915_perf_stream { * buffer should be checked for available data. */ u64 poll_oa_period; + + /** + * @oa_interrupt_monitor: Whether the stream will be notified by OA + * interrupts. + */ + bool oa_interrupt_monitor; }; /** @@ -1892,6 +1898,21 @@ struct drm_i915_private { wait_queue_head_t poll_wq; bool pollin; + /** + * Atomic counter incremented by the interrupt + * handling code for each OA half full interrupt + * received. + */ + atomic64_t half_full_count; + + /** + * Copy of the atomic half_full_count that was last + * processed in the i915-perf driver. If both counters + * differ, there is data available to read in the OA + * buffer. 
+ */ + u64 half_full_count_last; + /** * For rate limiting any notifications of spurious * invalid OA reports diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 7c7e84e86c6a..1028d0d5542d 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -1171,6 +1171,12 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv) return; } +static void notify_perfmon_buffer_half_full(struct drm_i915_private *i915) +{ + atomic64_inc(&i915->perf.oa.half_full_count); + wake_up_all(&i915->perf.oa.poll_wq); +} + static void vlv_c0_read(struct drm_i915_private *dev_priv, struct intel_rps_ei *ei) { @@ -1447,6 +1453,9 @@ static void snb_gt_irq_handler(struct drm_i915_private *dev_priv, GT_RENDER_CS_MASTER_ERROR_INTERRUPT)) DRM_DEBUG("Command parser error, gt_iir 0x%08x\n", gt_iir); + if (gt_iir & GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT) + notify_perfmon_buffer_half_full(dev_priv); + if (gt_iir & GT_PARITY_ERROR(dev_priv)) ivybridge_parity_error_irq_handler(dev_priv, gt_iir); } @@ -1468,6 +1477,12 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir) tasklet_hi_schedule(&engine->execlists.tasklet); } +static void gen8_perfmon_handler(struct drm_i915_private *i915, u32 iir) +{ + if (iir & GEN8_GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT) + notify_perfmon_buffer_half_full(i915); +} + static void gen8_gt_irq_ack(struct drm_i915_private *i915, u32 master_ctl, u32 gt_iir[4]) { @@ -1477,6 +1492,7 @@ static void gen8_gt_irq_ack(struct drm_i915_private *i915, GEN8_GT_BCS_IRQ | \ GEN8_GT_VCS1_IRQ | \ GEN8_GT_VCS2_IRQ | \ + GEN8_GT_WDBOX_OACS_IRQ | \ GEN8_GT_VECS_IRQ | \ GEN8_GT_PM_IRQ | \ GEN8_GT_GUC_IRQ) @@ -1499,7 +1515,7 @@ static void gen8_gt_irq_ack(struct drm_i915_private *i915, raw_reg_write(regs, GEN8_GT_IIR(2), gt_iir[2]); } - if (master_ctl & GEN8_GT_VECS_IRQ) { + if (master_ctl & (GEN8_GT_VECS_IRQ | GEN8_GT_WDBOX_OACS_IRQ)) { gt_iir[3] = raw_reg_read(regs, GEN8_GT_IIR(3)); if (likely(gt_iir[3])) 
raw_reg_write(regs, GEN8_GT_IIR(3), gt_iir[3]); @@ -1523,9 +1539,11 @@ static void gen8_gt_irq_handler(struct drm_i915_private *i915, gt_iir[1] >> GEN8_VCS2_IRQ_SHIFT); } - if (master_ctl & GEN8_GT_VECS_IRQ) { + if (master_ctl & (GEN8_GT_VECS_IRQ | GEN8_GT_WDBOX_OACS_IRQ)) { gen8_cs_irq_handler(i915->engine[VECS], gt_iir[3] >> GEN8_VECS_IRQ_SHIFT); + gen8_perfmon_handler(i915, + gt_iir[3] >> GEN8_WD_IRQ_SHIFT); } if (master_ctl & (GEN8_GT_PM_IRQ | GEN8_GT_GUC_IRQ)) { @@ -2936,6 +2954,8 @@ gen11_other_irq_handler(struct drm_i915_private * const i915, { if (instance == OTHER_GTPM_INSTANCE) return gen6_rps_irq_handler(i915, iir); + if (instance == OTHER_WDOAPERF_INSTANCE) + return gen8_perfmon_handler(i915, iir); WARN_ONCE(1, "unhandled other interrupt instance=0x%x, iir=0x%x\n", instance, iir); @@ -3769,6 +3789,10 @@ static void gen5_gt_irq_postinstall(struct drm_device *dev) gt_irqs |= GT_BLT_USER_INTERRUPT | GT_BSD_USER_INTERRUPT; } + /* We only expose the i915/perf interface on HSW+. */ + if (IS_HASWELL(dev_priv)) + gt_irqs |= GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT; + GEN3_IRQ_INIT(GT, dev_priv->gt_irq_mask, gt_irqs); if (INTEL_GEN(dev_priv) >= 6) { @@ -3898,7 +3922,8 @@ static void gen8_gt_irq_postinstall(struct drm_i915_private *dev_priv) GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT, 0, GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT | - GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT + GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT | + GEN8_GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT << GEN8_WD_IRQ_SHIFT }; dev_priv->pm_ier = 0x0; @@ -4017,12 +4042,12 @@ static void gen11_gt_irq_postinstall(struct drm_i915_private *dev_priv) /* * RPS interrupts will get enabled/disabled on demand when RPS itself - * is enabled/disabled. + * is enabled/disabled, just enable the OA interrupt for now. 
*/ - dev_priv->pm_ier = 0x0; + dev_priv->pm_ier = GEN8_GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT; dev_priv->pm_imr = ~dev_priv->pm_ier; - I915_WRITE(GEN11_GPM_WGBOXPERF_INTR_ENABLE, 0); - I915_WRITE(GEN11_GPM_WGBOXPERF_INTR_MASK, ~0); + I915_WRITE(GEN11_GPM_WGBOXPERF_INTR_ENABLE, dev_priv->pm_ier); + I915_WRITE(GEN11_GPM_WGBOXPERF_INTR_MASK, dev_priv->pm_imr); } static void icp_irq_postinstall(struct drm_device *dev) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 5ef9164a22a0..3ab389edf1de 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -337,6 +337,7 @@ static const struct i915_oa_format gen8_plus_oa_formats[I915_OA_FORMAT_MAX] = { * @oa_period_exponent: The OA unit sampling period is derived from this * @poll_oa_period: The period at which the CPU will check for OA data * availability + * @oa_interrupt_monitor: Whether we should monitor the OA interrupt. * * As read_properties_unlocked() enumerates and validates the properties given * to open a stream of metrics the configuration is built up in the structure @@ -354,6 +355,7 @@ struct perf_open_properties { bool oa_periodic; int oa_period_exponent; u64 poll_oa_period; + bool oa_interrupt_monitor; }; static void free_oa_config(struct drm_i915_private *dev_priv, @@ -1838,6 +1840,13 @@ static void gen7_oa_enable(struct i915_perf_stream *stream) */ gen7_init_oa_buffer(dev_priv); + if (stream->oa_interrupt_monitor) { + spin_lock_irq(&dev_priv->irq_lock); + gen5_enable_gt_irq(dev_priv, + GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT); + spin_unlock_irq(&dev_priv->irq_lock); + } + I915_WRITE(GEN7_OACONTROL, (ctx_id & GEN7_OACONTROL_CTX_MASK) | (period_exponent << @@ -1864,6 +1873,9 @@ static void gen8_oa_enable(struct i915_perf_stream *stream) */ gen8_init_oa_buffer(dev_priv); + if (stream->oa_interrupt_monitor) + I915_WRITE(GEN8_OA_IMR, ~GEN8_OA_IMR_MASK_INTR); + /* * Note: we don't rely on the hardware to perform single context * filtering and instead 
filter on the cpu based on the context-id @@ -1893,6 +1905,10 @@ static void i915_oa_stream_enable(struct i915_perf_stream *stream) */ dev_priv->perf.oa.pollin = false; + dev_priv->perf.oa.half_full_count_last = 0; + atomic64_set(&dev_priv->perf.oa.half_full_count, + dev_priv->perf.oa.half_full_count_last); + dev_priv->perf.oa.ops.oa_enable(stream); if (dev_priv->perf.oa.periodic && stream->poll_oa_period) @@ -1905,6 +1921,13 @@ static void gen7_oa_disable(struct i915_perf_stream *stream) { struct drm_i915_private *dev_priv = stream->dev_priv; + if (stream->oa_interrupt_monitor) { + spin_lock_irq(&dev_priv->irq_lock); + gen5_disable_gt_irq(dev_priv, + GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT); + spin_unlock_irq(&dev_priv->irq_lock); + } + I915_WRITE(GEN7_OACONTROL, 0); if (intel_wait_for_register(dev_priv, GEN7_OACONTROL, GEN7_OACONTROL_ENABLE, 0, @@ -1916,6 +1939,8 @@ static void gen8_oa_disable(struct i915_perf_stream *stream) { struct drm_i915_private *dev_priv = stream->dev_priv; + I915_WRITE(GEN8_OA_IMR, 0xffffffff); + I915_WRITE(GEN8_OACONTROL, 0); if (intel_wait_for_register(dev_priv, GEN8_OACONTROL, GEN8_OA_COUNTER_ENABLE, 0, @@ -2591,6 +2616,7 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, stream->dev_priv = dev_priv; stream->ctx = specific_ctx; stream->poll_oa_period = props->poll_oa_period; + stream->oa_interrupt_monitor = props->oa_interrupt_monitor; ret = i915_oa_stream_init(stream, param, props); if (ret) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 730bb1917fd1..62e93a492d25 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -229,6 +229,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define MAX_ENGINE_CLASS 4 #define OTHER_GTPM_INSTANCE 1 +#define OTHER_WDOAPERF_INSTANCE 2 #define MAX_ENGINE_INSTANCE 3 /* PCI config space */ @@ -641,6 +642,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define OABUFFER_SIZE_8M (6 << 3) #define 
OABUFFER_SIZE_16M (7 << 3) +#define GEN8_OA_IMR _MMIO(0x2b20) +#define GEN8_OA_IMR_MASK_INTR (1 << 28) + /* * Flexible, Aggregate EU Counter Registers. * Note: these aren't contiguous @@ -2923,7 +2927,9 @@ enum i915_power_well_id { #define GT_BLT_USER_INTERRUPT (1 << 22) #define GT_BSD_CS_ERROR_INTERRUPT (1 << 15) #define GT_BSD_USER_INTERRUPT (1 << 12) +#define GEN8_GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT (1 << 12) /* bdw+ */ #define GT_RENDER_L3_PARITY_ERROR_INTERRUPT_S1 (1 << 11) /* hsw+; rsvd on snb, ivb, vlv */ +#define GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT (1 << 9) /* ivb+ but only used on hsw+ */ #define GT_CONTEXT_SWITCH_INTERRUPT (1 << 8) #define GT_RENDER_L3_PARITY_ERROR_INTERRUPT (1 << 5) /* !snb */ #define GT_RENDER_PIPECTL_NOTIFY_INTERRUPT (1 << 4) @@ -7246,6 +7252,7 @@ enum { #define GEN8_DE_PIPE_B_IRQ (1 << 17) #define GEN8_DE_PIPE_A_IRQ (1 << 16) #define GEN8_DE_PIPE_IRQ(pipe) (1 << (16 + (pipe))) +#define GEN8_GT_WDBOX_OACS_IRQ (1 << 7) #define GEN8_GT_VECS_IRQ (1 << 6) #define GEN8_GT_GUC_IRQ (1 << 5) #define GEN8_GT_PM_IRQ (1 << 4) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 1b96b0960adc..c9c460612a56 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -2304,6 +2304,8 @@ int intel_init_render_ring_buffer(struct intel_engine_cs *engine) if (HAS_L3_DPF(dev_priv)) engine->irq_keep_mask = GT_RENDER_L3_PARITY_ERROR_INTERRUPT; + if (IS_HASWELL(dev_priv)) + engine->irq_keep_mask |= GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT; engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT; From patchwork Tue Feb 26 14:29:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 10830379 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2304C13B5 for ; 
Tue, 26 Feb 2019 14:29:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 119F12A3BF for ; Tue, 26 Feb 2019 14:29:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0658C2C530; Tue, 26 Feb 2019 14:29:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 778552A3BF for ; Tue, 26 Feb 2019 14:29:35 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id DF99689EFF; Tue, 26 Feb 2019 14:29:34 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3E24A89E69 for ; Tue, 26 Feb 2019 14:29:27 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Feb 2019 06:29:27 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,415,1544515200"; d="scan'208";a="150113385" Received: from delly.ld.intel.com ([10.103.238.201]) by fmsmga001.fm.intel.com with ESMTP; 26 Feb 2019 06:29:26 -0800 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Tue, 26 Feb 2019 14:29:09 +0000 Message-Id: <20190226142911.9789-8-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190226142911.9789-1-lionel.g.landwerlin@intel.com> References: 
<20190226142911.9789-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v3 7/9] drm/i915/perf: add interrupt enabling parameter X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP This lets the application choose to be driven by the interrupt mechanism of the HW. In conjunction with long periods between CPU-side checks for data availability, this can reduce the CPU load when capturing OA data. v2: Version the new parameter (Joonas) Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_perf.c | 54 +++++++++++++++++++++++--------- include/uapi/drm/i915_drm.h | 10 ++++++ 2 files changed, 50 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 3ab389edf1de..39801a6e3021 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -243,7 +243,7 @@ * oa_buffer_check(). 
* * Most of the implementation details for this workaround are in - * oa_buffer_check_unlocked() and _append_oa_reports() + * oa_buffer_check() and _append_oa_reports() * * Note for posterity: previously the driver used to define an effective tail * pointer that lagged the real pointer by a 'tail margin' measured in bytes @@ -418,9 +418,11 @@ static u32 gen7_oa_hw_tail_read(struct drm_i915_private *dev_priv) return oastatus1 & GEN7_OASTATUS1_TAIL_MASK; } + /** - * oa_buffer_check_unlocked - check for data and update tail ptr state + * oa_buffer_check - check for data and update tail ptr state * @dev_priv: i915 device instance + * @lock: whether to take the oa_buffer spin lock * * This is either called via fops (for blocking reads in user ctx) or the poll * check hrtimer (atomic ctx) to check the OA buffer tail pointer and check @@ -442,8 +444,9 @@ static u32 gen7_oa_hw_tail_read(struct drm_i915_private *dev_priv) * * Returns: %true if the OA buffer contains data, else %false */ -static bool oa_buffer_check_unlocked(struct drm_i915_private *dev_priv) +static bool oa_buffer_check(struct drm_i915_private *dev_priv, bool lock) { + u64 half_full_count = atomic64_read(&dev_priv->perf.oa.half_full_count); u32 gtt_offset = i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma); int report_size = dev_priv->perf.oa.oa_buffer.format_size; unsigned long flags; @@ -454,7 +457,8 @@ static bool oa_buffer_check_unlocked(struct drm_i915_private *dev_priv) * could result in an OA buffer reset which might reset the head, * tails[] and aged_tail state. 
*/ - spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); + if (lock) + spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); hw_tail = dev_priv->perf.oa.ops.oa_hw_tail_read(dev_priv); @@ -530,7 +534,10 @@ static bool oa_buffer_check_unlocked(struct drm_i915_private *dev_priv) dev_priv->perf.oa.oa_buffer.aging_timestamp = now; } - spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); + dev_priv->perf.oa.half_full_count_last = half_full_count; + + if (lock) + spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); return OA_TAKEN(dev_priv->perf.oa.oa_buffer.tail - gtt_offset, dev_priv->perf.oa.oa_buffer.head - gtt_offset) >= report_size; @@ -1124,9 +1131,9 @@ static int gen7_oa_read(struct i915_perf_stream *stream, * i915_oa_wait_unlocked - handles blocking IO until OA data available * @stream: An i915-perf stream opened for OA metrics * - * Called when userspace tries to read() from a blocking stream FD opened - * for OA metrics. It waits until the hrtimer callback finds a non-empty - * OA buffer and wakes us. + * Called when userspace tries to read() from a blocking stream FD opened for + * OA metrics. It waits until either the hrtimer callback finds a non-empty OA + * buffer or the OA interrupt kicks in and wakes us. 
* * Note: it's acceptable to have this return with some false positives * since any subsequent read handling will return -EAGAIN if there isn't @@ -1143,7 +1150,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream) return -EIO; return wait_event_interruptible(dev_priv->perf.oa.poll_wq, - oa_buffer_check_unlocked(dev_priv)); + oa_buffer_check(dev_priv, true)); } /** @@ -1962,6 +1969,10 @@ static void i915_oa_stream_disable(struct i915_perf_stream *stream) dev_priv->perf.oa.ops.oa_disable(stream); + dev_priv->perf.oa.half_full_count_last = 0; + atomic64_set(&dev_priv->perf.oa.half_full_count, + dev_priv->perf.oa.half_full_count_last); + if (dev_priv->perf.oa.periodic) hrtimer_cancel(&dev_priv->perf.oa.poll_check_timer); } @@ -2288,7 +2299,7 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer) perf.oa.poll_check_timer); struct i915_perf_stream *stream = dev_priv->perf.oa.exclusive_stream; - if (oa_buffer_check_unlocked(dev_priv)) { + if (oa_buffer_check(dev_priv, true)) { dev_priv->perf.oa.pollin = true; wake_up(&dev_priv->perf.oa.poll_wq); } @@ -2324,6 +2335,16 @@ static __poll_t i915_perf_poll_locked(struct drm_i915_private *dev_priv, stream->ops->poll_wait(stream, file, wait); + /* + * Only check the half buffer full notifications if requested by the + * user. + */ + if (stream->oa_interrupt_monitor && + (dev_priv->perf.oa.half_full_count_last != + atomic64_read(&dev_priv->perf.oa.half_full_count))) { + dev_priv->perf.oa.pollin = oa_buffer_check(dev_priv, true); + } + /* Note: we don't explicitly check whether there's something to read * here since this path may be very hot depending on what else * userspace is polling, or on the timeout in use. 
We rely solely on @@ -2807,6 +2828,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, } props->poll_oa_period = value; break; + case DRM_I915_PERF_PROP_OA_ENABLE_INTERRUPT: + props->oa_interrupt_monitor = value != 0; + break; case DRM_I915_PERF_PROP_MAX: MISSING_CASE(id); return -EINVAL; @@ -2817,12 +2841,14 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, /* * Blocking read need to be waken up by some mechanism. If no polling - * of the HEAD/TAIL register is done by the kernel, we'll never be - * able to wake up. + * of the HEAD/TAIL register is done by the kernel and no interrupt is + * enabled, we'll never be able to wake up. */ if ((open_flags & I915_PERF_FLAG_FD_NONBLOCK) == 0 && - !props->poll_oa_period) { - DRM_DEBUG("Requesting a blocking stream with no polling period.\n"); + !props->poll_oa_period && + !props->oa_interrupt_monitor) { + DRM_DEBUG("Requesting a blocking stream with no polling period " + "& no interrupt.\n"); return -EINVAL; } diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index a04de844d95e..d04ce7ba6bd2 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1657,6 +1657,16 @@ enum drm_i915_perf_property_id { */ DRM_I915_PERF_PROP_POLL_OA_DELAY, + /** + * Specifying this property sets up the interrupt mechanism for the OA + * buffer in i915. This option in conjunction with a long polling delay + * for availability of OA data can reduce CPU load significantly if you + * do not care about OA data being read as soon as it's available. + * + * This property is available in perf revision 2.
+ */ + DRM_I915_PERF_PROP_OA_ENABLE_INTERRUPT, + DRM_I915_PERF_PROP_MAX /* non-ABI */ };
From patchwork Tue Feb 26 14:29:10 2019 X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 10830377 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Tue, 26 Feb 2019 14:29:10 +0000 Message-Id: <20190226142911.9789-9-lionel.g.landwerlin@intel.com> In-Reply-To: <20190226142911.9789-1-lionel.g.landwerlin@intel.com> References: <20190226142911.9789-1-lionel.g.landwerlin@intel.com> Subject: [Intel-gfx] [PATCH v3 8/9] drm/i915/perf: add flushing ioctl
With the currently available parameters for the i915-perf stream, there are still situations that are not well covered: if an application opens the stream with polling disabled (or at a very low frequency) and the OA interrupt enabled, no data will be available even though anywhere between nothing and half of the OA buffer's worth of data might have landed in memory. To solve this issue, add a new flush ioctl on the perf stream that forces the i915-perf driver to look at the state of the buffer when called and makes any data available through both poll() and read().
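For illustration only, here is a rough userspace sketch of how an application might drive the new ioctl. It is not part of the patch. The helper name perf_flush_and_read() is hypothetical, the ioctl number is copied from the uapi hunk in this patch and defined locally because installed headers may not carry it, and the stream fd is assumed to have been opened with I915_PERF_FLAG_FD_NONBLOCK so that read() fails with EAGAIN instead of blocking.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Number proposed by this patch; defined here only for the sketch. */
#ifndef I915_PERF_IOCTL_FLUSH_DATA
#define I915_PERF_IOCTL_FLUSH_DATA _IO('i', 0x2)
#endif

/*
 * Hypothetical helper: ask the driver to re-check the OA buffer, then
 * try to read whatever it made available.
 *
 * Returns the number of bytes read, 0 if no data is available yet, or
 * -1 with errno set on any other failure.
 */
static ssize_t perf_flush_and_read(int stream_fd, void *buf, size_t len)
{
	ssize_t ret;

	/* Force a look at the buffer state instead of waiting for the
	 * hrtimer or the OA interrupt. */
	if (ioctl(stream_fd, I915_PERF_IOCTL_FLUSH_DATA) < 0)
		return -1;

	ret = read(stream_fd, buf, len);
	if (ret < 0 && errno == EAGAIN)
		return 0; /* non-blocking stream, nothing to read yet */

	return ret;
}
```

A caller that disabled the polling timer could invoke this right before tearing the stream down, to drain reports that landed after the last interrupt.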
v2: Version the ioctl (Joonas) Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_perf.c | 17 +++++++++++++++++ include/uapi/drm/i915_drm.h | 21 +++++++++++++++++++++ 2 files changed, 38 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 39801a6e3021..7067a0f1700e 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -2431,6 +2431,20 @@ static void i915_perf_disable_locked(struct i915_perf_stream *stream) stream->ops->disable(stream); } +/** + * i915_perf_flush_data - handle `I915_PERF_IOCTL_FLUSH_DATA` ioctl + * @stream: An enabled i915 perf stream + * + * The intention is to flush all the data available for reading from the OA + * buffer. + */ +static void i915_perf_flush_data(struct i915_perf_stream *stream) +{ + struct drm_i915_private *dev_priv = stream->dev_priv; + + dev_priv->perf.oa.pollin = oa_buffer_check(dev_priv, true); +} + /** * i915_perf_ioctl - support ioctl() usage with i915 perf stream FDs * @stream: An i915 perf stream @@ -2454,6 +2468,9 @@ static long i915_perf_ioctl_locked(struct i915_perf_stream *stream, case I915_PERF_IOCTL_DISABLE: i915_perf_disable_locked(stream); return 0; + case I915_PERF_IOCTL_FLUSH_DATA: + i915_perf_flush_data(stream); + return 0; } return -EINVAL; diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index d04ce7ba6bd2..54cd3099b2d9 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1709,6 +1709,27 @@ struct drm_i915_perf_open_param { */ #define I915_PERF_IOCTL_DISABLE _IO('i', 0x1) +/** + * Actively check the availability of data from a stream. + * + * A stream's data availability can be driven by two types of events: + * + * - if enabled, the kernel's hrtimer checking the amount of available data + * in the OA buffer through head/tail registers.
+ * + * - if enabled, the OA unit's interrupt mechanism + * + * The kernel hrtimer incurs the cost of running a callback at fixed time + * intervals, while the OA interrupt might only happen rarely. In the + * situation where the application has disabled the kernel's hrtimer and only + * uses the OA interrupt to know about available data, the application can + * request an active check of the available OA data through this ioctl. This + * will make any data in the OA buffer available with either poll() or read(). + * + * This ioctl is available in perf revision 2. + */ +#define I915_PERF_IOCTL_FLUSH_DATA _IO('i', 0x2) + /** * Common to all i915 perf records */
From patchwork Tue Feb 26 14:29:11 2019 X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 10830381 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Tue, 26 Feb 2019 14:29:11 +0000 Message-Id: <20190226142911.9789-10-lionel.g.landwerlin@intel.com> In-Reply-To: <20190226142911.9789-1-lionel.g.landwerlin@intel.com> References: <20190226142911.9789-1-lionel.g.landwerlin@intel.com> Subject: [Intel-gfx] [PATCH v3 9/9] drm/i915/perf: bump i915-perf revision
This makes the following opening parameters available to applications:
 - DRM_I915_PERF_PROP_POLL_OA_DELAY
 - DRM_I915_PERF_PROP_OA_ENABLE_INTERRUPT
As well as this new ioctl on the i915-perf file descriptor:
 - I915_PERF_IOCTL_FLUSH_DATA
Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 1ce58036dbb3..654a6c9c2e56 100644 ---
a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -448,7 +448,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data, value = INTEL_INFO(dev_priv)->has_coherent_ggtt; break; case I915_PARAM_PERF_REVISION: - value = 1; + value = 2; break; default: DRM_DEBUG("Unknown parameter %d\n", param->param);
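
As a reading aid for the revision bump above, here is a small sketch of how userspace might gate the new features on the reported revision. It is not part of the series. The struct perf_caps type and perf_caps_from_revision() are names invented for this example; the revision value itself is assumed to come from the getparam ioctl with I915_PARAM_PERF_REVISION, which is the case this patch changes from 1 to 2.

```c
#include <stdbool.h>

/* Hypothetical capability flags, named only for this sketch. */
struct perf_caps {
	bool poll_oa_delay;  /* DRM_I915_PERF_PROP_POLL_OA_DELAY */
	bool oa_interrupt;   /* DRM_I915_PERF_PROP_OA_ENABLE_INTERRUPT */
	bool flush_ioctl;    /* I915_PERF_IOCTL_FLUSH_DATA */
};

/*
 * Map the value reported for I915_PARAM_PERF_REVISION to the features
 * this series ties to revision 2. Older revisions get none of them.
 */
static struct perf_caps perf_caps_from_revision(int revision)
{
	struct perf_caps caps = {
		.poll_oa_delay = revision >= 2,
		.oa_interrupt  = revision >= 2,
		.flush_ioctl   = revision >= 2,
	};

	return caps;
}
```

An application would query the revision once at startup and fall back to the default kernel polling behaviour when the flags come back false.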