From patchwork Mon Jul 6 10:49:52 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?Micha=C5=82_Winiarski?=
X-Patchwork-Id: 11645425
From: =?utf-8?q?Micha=C5=82_Winiarski?=
To: intel-gfx@lists.freedesktop.org
Date: Mon, 6 Jul 2020 12:49:52 +0200
Message-Id: <20200706104953.139261-1-michal@hardline.pl>
X-Mailer: git-send-email 2.27.0
Subject: [Intel-gfx] [PATCH v2 1/2] drm/i915: Reboot CI if we get wedged during driver init
List-Id: Intel graphics driver community testing & development
Cc: =?utf-8?q?Micha=C5=82_Winiarski?=, Chris Wilson

From: Michał Winiarski

Getting wedged device on driver init is pretty much unrecoverable.
Since we're running various scenarios that may potentially hit this in
CI (module reload / selftests / hotunplug), and if it happens, it means
that we can't trust any subsequent CI results, we should just apply the
taint to let the CI know that it should reboot (CI checks taint between
test runs).

v2: Comment that WEDGED_ON_INIT is non-recoverable, distinguish
WEDGED_ON_INIT from WEDGED_ON_FINI (Chris)

Signed-off-by: Michał Winiarski
Cc: Chris Wilson
Cc: Michal Wajdeczko
Cc: Petri Latvala
Reviewed-by: Chris Wilson
---
 drivers/gpu/drm/i915/gt/intel_engine_user.c |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt.c          |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt.h          | 12 ++++++++----
 drivers/gpu/drm/i915/gt/intel_gt_pm.c       |  2 +-
 drivers/gpu/drm/i915/gt/intel_reset.c       | 13 +++++++++++--
 drivers/gpu/drm/i915/gt/intel_reset.h       | 10 ++--------
 drivers/gpu/drm/i915/gt/intel_reset_types.h |  7 ++++++-
 7 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 848decee9066..34e6096f196e 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -201,7 +201,7 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
 				     uabi_node);
 		char old[sizeof(engine->name)];
 
-		if (intel_gt_has_init_error(engine->gt))
+		if (intel_gt_has_unrecoverable_error(engine->gt))
 			continue; /* ignore incomplete engines */
 
 		GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index ebc29b6ee86c..876f78759095 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -510,7 +510,7 @@ static int __engines_verify_workarounds(struct intel_gt *gt)
 
 static void __intel_gt_disable(struct intel_gt *gt)
 {
-	intel_gt_set_wedged_on_init(gt);
+	intel_gt_set_wedged_on_fini(gt);
 
 	intel_gt_suspend_prepare(gt);
 	intel_gt_suspend_late(gt);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index 4fac043750aa..7201f96723d8 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -58,14 +58,18 @@ static inline u32 intel_gt_scratch_offset(const struct intel_gt *gt,
 	return i915_ggtt_offset(gt->scratch) + field;
 }
 
-static inline bool intel_gt_is_wedged(const struct intel_gt *gt)
+static inline bool intel_gt_has_unrecoverable_error(const struct intel_gt *gt)
 {
-	return __intel_reset_failed(&gt->reset);
+	return test_bit(I915_WEDGED_ON_INIT, &gt->reset.flags) ||
+	       test_bit(I915_WEDGED_ON_FINI, &gt->reset.flags);
 }
 
-static inline bool intel_gt_has_init_error(const struct intel_gt *gt)
+static inline bool intel_gt_is_wedged(const struct intel_gt *gt)
 {
-	return test_bit(I915_WEDGED_ON_INIT, &gt->reset.flags);
+	GEM_BUG_ON(intel_gt_has_unrecoverable_error(gt) ?
+		   !test_bit(I915_WEDGED, &gt->reset.flags) : false);
+
+	return unlikely(test_bit(I915_WEDGED, &gt->reset.flags));
 }
 
 #endif /* __INTEL_GT_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index f1d5333f9456..274aa0dd7050 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -188,7 +188,7 @@ int intel_gt_resume(struct intel_gt *gt)
 	enum intel_engine_id id;
 	int err;
 
-	err = intel_gt_has_init_error(gt);
+	err = intel_gt_has_unrecoverable_error(gt);
 	if (err)
 		return err;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 0156f1f5c736..6f94b6479a2f 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -880,7 +880,7 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
 		return true;
 
 	/* Never fully initialised, recovery impossible */
-	if (test_bit(I915_WEDGED_ON_INIT, &gt->reset.flags))
+	if (intel_gt_has_unrecoverable_error(gt))
 		return false;
 
 	GT_TRACE(gt, "start\n");
@@ -1342,7 +1342,7 @@ int intel_gt_terminally_wedged(struct intel_gt *gt)
 	if (!intel_gt_is_wedged(gt))
 		return 0;
 
-	if (intel_gt_has_init_error(gt))
+	if (intel_gt_has_unrecoverable_error(gt))
 		return -EIO;
 
 	/* Reset still in progress? Maybe we will recover? */
@@ -1360,6 +1360,15 @@ void intel_gt_set_wedged_on_init(struct intel_gt *gt)
 		     I915_WEDGED_ON_INIT);
 	intel_gt_set_wedged(gt);
 	set_bit(I915_WEDGED_ON_INIT, &gt->reset.flags);
+
+	/* Wedged on init is non-recoverable */
+	add_taint_for_CI(TAINT_WARN);
+}
+
+void intel_gt_set_wedged_on_fini(struct intel_gt *gt)
+{
+	intel_gt_set_wedged(gt);
+	set_bit(I915_WEDGED_ON_FINI, &gt->reset.flags);
 }
 
 void intel_gt_init_reset(struct intel_gt *gt)
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.h b/drivers/gpu/drm/i915/gt/intel_reset.h
index 8e8d5f761166..a0eec7c11c0c 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.h
+++ b/drivers/gpu/drm/i915/gt/intel_reset.h
@@ -47,8 +47,10 @@ int intel_gt_terminally_wedged(struct intel_gt *gt);
 /*
  * There's no unset_wedged_on_init paired with this one.
  * Once we're wedged on init, there's no going back.
+ * Same thing for unset_wedged_on_fini.
  */
 void intel_gt_set_wedged_on_init(struct intel_gt *gt);
+void intel_gt_set_wedged_on_fini(struct intel_gt *gt);
 
 int __intel_gt_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask);
 
@@ -71,14 +73,6 @@ void __intel_fini_wedge(struct intel_wedge_me *w);
 	     (W)->gt;							\
 	     __intel_fini_wedge((W)))
 
-static inline bool __intel_reset_failed(const struct intel_reset *reset)
-{
-	GEM_BUG_ON(test_bit(I915_WEDGED_ON_INIT, &reset->flags) ?
-		   !test_bit(I915_WEDGED, &reset->flags) : false);
-
-	return unlikely(test_bit(I915_WEDGED, &reset->flags));
-}
-
 bool intel_has_gpu_reset(const struct intel_gt *gt);
 bool intel_has_reset_engine(const struct intel_gt *gt);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_reset_types.h b/drivers/gpu/drm/i915/gt/intel_reset_types.h
index f43bc3a0fe4f..fe386bf5306c 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_reset_types.h
@@ -34,12 +34,17 @@ struct intel_reset {
 	 * longer use the GPU - similar to #I915_WEDGED bit. The difference in
 	 * in the way we're handling "forced" unwedged (e.g. through debugfs),
 	 * which is not allowed in case we failed to initialize.
+	 *
+	 * #I915_WEDGED_ON_FINI - Similar to #I915_WEDGED_ON_INIT, except we
+	 * use it to mark that the GPU is no longer available (and prevent
+	 * users from using it).
 	 */
 	unsigned long flags;
 #define I915_RESET_BACKOFF	0
 #define I915_RESET_MODESET	1
 #define I915_RESET_ENGINE	2
-#define I915_WEDGED_ON_INIT	(BITS_PER_LONG - 2)
+#define I915_WEDGED_ON_INIT	(BITS_PER_LONG - 3)
+#define I915_WEDGED_ON_FINI	(BITS_PER_LONG - 2)
 #define I915_WEDGED		(BITS_PER_LONG - 1)
 
 	struct mutex mutex; /* serialises wedging/unwedging */

From patchwork Mon Jul 6 10:49:53 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?Micha=C5=82_Winiarski?=
X-Patchwork-Id: 11645427
From: =?utf-8?q?Micha=C5=82_Winiarski?=
To: intel-gfx@lists.freedesktop.org
Date: Mon, 6 Jul 2020 12:49:53 +0200
Message-Id: <20200706104953.139261-2-michal@hardline.pl>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <20200706104953.139261-1-michal@hardline.pl>
References: <20200706104953.139261-1-michal@hardline.pl>
Subject: [Intel-gfx] [PATCH 2/2] drm/i915: Print caller when tainting for CI
List-Id: Intel graphics driver community testing & development
Cc: =?utf-8?q?Micha=C5=82_Winiarski?=, Chris Wilson

We can add taint from multiple places, printing the caller allows us to
have a better overview of what exactly caused us to do the tainting.

Suggested-by: Michal Wajdeczko
Signed-off-by: Michał Winiarski
Cc: Chris Wilson
Cc: Michal Wajdeczko
Cc: Petri Latvala
---
 drivers/gpu/drm/i915/i915_utils.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 03a73d2bd50d..1ed5c47eae8f 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -444,6 +444,7 @@ static inline void add_taint_for_CI(unsigned int taint)
 	 * CI checks the taint state after every test and will reboot
 	 * the machine if the kernel is tainted.
 	 */
+	pr_info("CI taint: %ps\n", __builtin_return_address(0));
 	add_taint(taint, LOCKDEP_STILL_OK);
 }
 
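
Note on the flag semantics in patch 1/2: the following is a minimal user-space sketch of how the three wedged bits relate after the change, not the driver code itself. Everything prefixed `model_` (and `struct gt_model`) is a hypothetical stand-in for the kernel's `test_bit()`/`set_bit()` helpers and `struct intel_gt`; `assert()` stands in for `GEM_BUG_ON()`, and the "not wedged, nothing to do" early return in `model_unset_wedged()` is an assumption based on the `return true;` context line in the hunk.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the kernel definitions. */
#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Same bit layout as the patched intel_reset_types.h. */
#define WEDGED_ON_INIT (BITS_PER_LONG - 3)
#define WEDGED_ON_FINI (BITS_PER_LONG - 2)
#define WEDGED         (BITS_PER_LONG - 1)

struct gt_model {
	unsigned long flags;
};

static bool model_test_bit(int nr, const unsigned long *flags)
{
	return (*flags >> nr) & 1UL;
}

static void model_set_bit(int nr, unsigned long *flags)
{
	*flags |= 1UL << nr;
}

/* Either terminal bit means the GT can never be unwedged. */
static bool model_has_unrecoverable_error(const struct gt_model *gt)
{
	return model_test_bit(WEDGED_ON_INIT, &gt->flags) ||
	       model_test_bit(WEDGED_ON_FINI, &gt->flags);
}

static bool model_is_wedged(const struct gt_model *gt)
{
	/* Invariant: a terminal bit is only ever set together with WEDGED. */
	assert(!model_has_unrecoverable_error(gt) ||
	       model_test_bit(WEDGED, &gt->flags));

	return model_test_bit(WEDGED, &gt->flags);
}

static void model_set_wedged_on_init(struct gt_model *gt)
{
	model_set_bit(WEDGED, &gt->flags);
	model_set_bit(WEDGED_ON_INIT, &gt->flags);
	/* The real function additionally taints the kernel for CI. */
}

static void model_set_wedged_on_fini(struct gt_model *gt)
{
	model_set_bit(WEDGED, &gt->flags);
	model_set_bit(WEDGED_ON_FINI, &gt->flags);
}

/* Returns true if the GT is usable afterwards, false if recovery is refused. */
static bool model_unset_wedged(struct gt_model *gt)
{
	if (!model_test_bit(WEDGED, &gt->flags))
		return true; /* assumed early exit: nothing to recover */

	if (model_has_unrecoverable_error(gt))
		return false; /* wedged on init/fini: no going back */

	gt->flags &= ~(1UL << WEDGED);
	return true;
}
```

In this model a plain wedge can be cleared by `model_unset_wedged()`, while a wedge set through `model_set_wedged_on_init()` or `model_set_wedged_on_fini()` cannot, which is the distinction the v2 note describes.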