From patchwork Mon Mar 27 19:55:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andi Shyti X-Patchwork-Id: 13190005 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 47112C77B60 for ; Mon, 27 Mar 2023 19:56:34 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id AC3B610E745; Mon, 27 Mar 2023 19:56:33 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 9516110E745; Mon, 27 Mar 2023 19:56:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679946991; x=1711482991; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=4kv1qXsjK3cJ0WSb8UD6Ces83ePuvxnNe+FYUmPJa/g=; b=YvTEhb/aZJ6kU1/orZuKF1iCPRfccHCH2TRKc3xqy23Lp4DuZoVkZSy0 jO/GAUCtQ08dKOCJUW7Gj0XIlO+xIobuXAH37tpxH8N9of7hxv4lmGrMa BYR64xocqaGh5zgcZwiyd0n3u/PqACIiCUEx05niA7HdT5/RLYfON4RCm 5Y1Xz3giR6t/VBn9azk6lZb6t0PSnwFiUaeeE3/0ULyHHIfz/K9vUzNlE zm7Se5EKOkLcQ1dudRJuY71EonZtY6z3dBTvpN7vemcK2q3tI9EN2UHnL pLb8v4/2z/+9P94QCuZESBZ497jmefc3Jt4woF+DWUbXJ7+MZQFvSISUr Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10662"; a="405302595" X-IronPort-AV: E=Sophos;i="5.98,295,1673942400"; d="scan'208";a="405302595" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Mar 2023 12:56:31 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10662"; a="807600377" X-IronPort-AV: E=Sophos;i="5.98,295,1673942400"; d="scan'208";a="807600377" Received: from mgaucher-mobl.ger.corp.intel.com (HELO intel.com) ([10.252.63.207]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Mar 2023 12:56:28 -0700 From: Andi Shyti To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, Matt Roper Date: Mon, 27 Mar 2023 21:55:46 +0200 Message-Id: <20230327195547.356584-2-andi.shyti@linux.intel.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230327195547.356584-1-andi.shyti@linux.intel.com> References: <20230327195547.356584-1-andi.shyti@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v3 1/2] drm/i915: Sanitycheck MMIO access early in driver load X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andi Shyti , Andrzej Hajda Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" From: Matt Roper We occasionally see the PCI device in a non-accessible state at the point the driver is loaded. When this happens, all BAR accesses will read back as 0xFFFFFFFF. Rather than reading registers and misinterpreting their (invalid) values, let's specifically check for 0xFFFFFFFF in a register that cannot have that value to see if the device is accessible. Signed-off-by: Matt Roper Cc: Mika Kuoppala Reviewed-by: Andi Shyti Signed-off-by: Andi Shyti --- Hi, Andrzej suggested to check the upper 16 bits of the BAR. But after an offline discussion with Matt, we agreed that reading the whole 32 bits is a safer choice. Andi drivers/gpu/drm/i915/intel_uncore.c | 34 +++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index e1e1f34490c8e..14ec45e6facfa 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -2602,11 +2602,45 @@ static int uncore_forcewake_init(struct intel_uncore *uncore) return 0; } +static int sanity_check_mmio_access(struct intel_uncore *uncore) +{ + struct drm_i915_private *i915 = uncore->i915; + + if (GRAPHICS_VER(i915) < 8) + return 0; + + /* + * Sanitycheck that MMIO access to the device is working properly. If + * the CPU is unable to communcate with a PCI device, BAR reads will + * return 0xFFFFFFFF. Let's make sure the device isn't in this state + * before we start trying to access registers. + * + * We use the primary GT's forcewake register as our guinea pig since + * it's been around since HSW and it's a masked register so the upper + * 16 bits can never read back as 1's if device access is operating + * properly. + * + * If MMIO isn't working, we'll wait up to 2 seconds to see if it + * recovers, then give up. + */ +#define COND (__raw_uncore_read32(uncore, FORCEWAKE_MT) != ~0) + if (wait_for(COND, 2000) == -ETIMEDOUT) { + drm_err(&i915->drm, "Device is non-operational; MMIO access returns 0xFFFFFFFF!\n"); + return -EIO; + } + + return 0; +} + int intel_uncore_init_mmio(struct intel_uncore *uncore) { struct drm_i915_private *i915 = uncore->i915; int ret; + ret = sanity_check_mmio_access(uncore); + if (ret) + return ret; + /* * The boot firmware initializes local memory and assesses its health. * If memory training fails, the punit will have been instructed to From patchwork Mon Mar 27 19:55:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andi Shyti X-Patchwork-Id: 13190006 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id EEA94C6FD1D for ; Mon, 27 Mar 2023 19:56:42 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5449C10E749; Mon, 27 Mar 2023 19:56:41 +0000 (UTC) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id 9936410E74B; Mon, 27 Mar 2023 19:56:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679946996; x=1711482996; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=q2VBSez3j5yBTagWQwJvJHgIHgAfLi4yObWl6RlAFto=; b=IuBZQod33KLuaMQGdGYXuVnZnM2q6i1I78lDdld6VNMBDoXu+U/gyJ3u mDmL0I7cYgV1IVNLWcf0296dXPXF6wGhBSZwyJFMSYCvZAsJmBEKmfBHh bhQ4qDjTrRr29VS3kbGcOGz0PMvjgzBHutqfAOIHx4BwIrLtj2X+2ufZC kO7t/Zy+scKgA93jBPzRi36x9fwSlAe4vF9QicJ1x28OnAKt7XEIx0fuL k7uNYawiYYAXRiNaVz4LFejqoQE4iUEJe325PnYGHUP0IXX9HIpIQUK82 6WLUrPNqyCeKxi0kXPu9sVuuH3V2wlbdNUX2RakWo15zdtS7KO+kZyD3+ g==; X-IronPort-AV: E=McAfee;i="6600,9927,10662"; a="405302648" X-IronPort-AV: E=Sophos;i="5.98,295,1673942400"; d="scan'208";a="405302648" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Mar 2023 12:56:36 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10662"; a="807600384" X-IronPort-AV: E=Sophos;i="5.98,295,1673942400"; d="scan'208";a="807600384" Received: from mgaucher-mobl.ger.corp.intel.com (HELO intel.com) ([10.252.63.207]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Mar 2023 12:56:33 -0700 From: Andi Shyti To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, Matt Roper Date: Mon, 27 Mar 2023 21:55:47 +0200 Message-Id: <20230327195547.356584-3-andi.shyti@linux.intel.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230327195547.356584-1-andi.shyti@linux.intel.com> References: <20230327195547.356584-1-andi.shyti@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v3 2/2] drm/i915: Check for unreliable MMIO during forcewake X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andi Shyti , Andrzej Hajda Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" From: Matt Roper Although we now sanitycheck MMIO access during driver load to make sure the MMIO BAR isn't returning all 0xFFFFFFFF, there have been a few cases where (temporarily?) unreliable MMIO access has happened after GPU resets or power events. We'll often notice this on our next GT register access since forcewake handling will fail; let's change our handling slightly so that when this happens we print a more meaningful message clarifying that the problem is the MMIO access, not forcewake specifically. Signed-off-by: Matt Roper Cc: Mika Kuoppala Reviewed-by: Andi Shyti Reviewed-by: Andrzej Hajda Signed-off-by: Andi Shyti --- drivers/gpu/drm/i915/intel_uncore.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 14ec45e6facfa..796ebfe6c5507 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -177,12 +177,19 @@ wait_ack_set(const struct intel_uncore_forcewake_domain *d, static inline void fw_domain_wait_ack_clear(const struct intel_uncore_forcewake_domain *d) { - if (wait_ack_clear(d, FORCEWAKE_KERNEL)) { + if (!wait_ack_clear(d, FORCEWAKE_KERNEL)) + return; + + if (fw_ack(d) == ~0) + drm_err(&d->uncore->i915->drm, + "%s: MMIO unreliable (forcewake register returns 0xFFFFFFFF)!\n", + intel_uncore_forcewake_domain_to_str(d->id)); + else drm_err(&d->uncore->i915->drm, "%s: timed out waiting for forcewake ack to clear.\n", intel_uncore_forcewake_domain_to_str(d->id)); - add_taint_for_CI(d->uncore->i915, TAINT_WARN); /* CI now unreliable */ - } + + add_taint_for_CI(d->uncore->i915, TAINT_WARN); /* CI now unreliable */ } enum ack_type {