From patchwork Wed Nov 25 17:45:26 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 7701911 Return-Path: X-Original-To: patchwork-intel-gfx@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 33902BF90C for ; Wed, 25 Nov 2015 17:45:42 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 10C1320859 for ; Wed, 25 Nov 2015 17:45:41 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by mail.kernel.org (Postfix) with ESMTP id 4797720850 for ; Wed, 25 Nov 2015 17:45:38 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 551736EAFA; Wed, 25 Nov 2015 09:45:37 -0800 (PST) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mail-wm0-f49.google.com (mail-wm0-f49.google.com [74.125.82.49]) by gabe.freedesktop.org (Postfix) with ESMTPS id 5D5BD6EAFA for ; Wed, 25 Nov 2015 09:45:35 -0800 (PST) Received: by wmec201 with SMTP id c201so267552088wme.0 for ; Wed, 25 Nov 2015 09:45:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id; bh=+YbJlinTFdapbOruwV6CpkzwKAnSOixEnfpSla6BE5U=; b=YqEeJPo8uylDfKzCorMV9UTM8NP3Eim3o8Oi0TFQxxAfgzGpj7qIXGYouD855FfI+J bcELIiHthMRKqkR0AUJ6Nd2kAo1JOPTPkP/Jyt4ZCVayUoQUC2VU66H/aM7mwq+VbrGL bbXfBDCqG0vXBhvlOLFACaNI/NJcx6oAZ9QWw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=+YbJlinTFdapbOruwV6CpkzwKAnSOixEnfpSla6BE5U=; b=VgjRcs88M6O5lMBwvMXPsorVTVQdRYagrVguGKoRRPrjg5uf/tU4ieWFe2E8KRbzdn 0mqgpLUo1XIHTWJ5N2/bApN2H8iLjYRpmwpt18HQGQr6AgBNecjDAOCQ8xvMAaxiriyw 1hPqKeboPfQvvG0S1Sn3fkItKTbTEeuALyU1xPgvTcLaKWnAKKFi8AxBQFHeR6Y7Fk7v W+u00e1rDbXph9txjmhmluxqjLN9QMMYyhA/9YHu0+pXP9uANNWPu6sz7Il+ji/ReJDv RtgQycB7ZCbMl3u0lsgFxJiNcyU5V/fMmjO8uzD0HagyKVoSR+UnP7Spba4SQOQSTFm4 KJDg== X-Gm-Message-State: ALoCoQmn3YRAHuxuG9L8f5jao7Dz81Mh3ZfrWFTyT82kV8yWEmrBt5rjw6rsrTzslVBR1Salmr62 X-Received: by 10.28.91.83 with SMTP id p80mr6316343wmb.87.1448473533569; Wed, 25 Nov 2015 09:45:33 -0800 (PST) Received: from wespe.ffwll.local (212-51-149-109.fiber7.init7.net. [212.51.149.109]) by smtp.gmail.com with ESMTPSA id q6sm4548353wmd.8.2015.11.25.09.45.32 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 25 Nov 2015 09:45:32 -0800 (PST) From: Daniel Vetter To: Intel Graphics Development Date: Wed, 25 Nov 2015 18:45:26 +0100 Message-Id: <1448473526-32077-1-git-send-email-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.1.0 Cc: Daniel Vetter , Daniel Vetter Subject: [Intel-gfx] [PATCH] drm/i915: Fix up -EIO ABI per igt/gem_eio X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Spam-Status: No, score=-4.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_MED,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP My apologies to Chris Wilson for being dense on this topic for essentially for years. This patch doesn't do any big redesign of the overall reset flow, but instead just applies changes where it's needed to obey gem_eio. We can redesign later on, but for now I prefer to make the testcase happy with some minimally invasive changes: - Ensure that set_domain_ioctl never returns -EIO. Tricky part there is that we might race with the reset handler when doing the lockless wait. - Make sure debugfs/i915_wedged is actually useful to reset a wedged gpu - atm it bails out with -EAGAIN for a terminally wedged gpu. That's because reset_in_progress actually includes that terminal state, which is super-confusing (and most callers got this wrong). Fix this by changing the semantics of reset_in_progress to do what it says on the tin (and fixup all other callers). Also make sure that reset_in_progress is cleared when we reach the terminal "wedged" state. Without this we kept returning -EAGAIN in some places forever. - Second problem with debugfs/i915_wedge was that we never got out of the terminal wedged state. Hence manually set the reset counter out of that state before starting the hung gpu recovery. For safety also make sure that we are consintent with resetting the gpu between i915_reset_and_wakeup and i915_handler_error by just passing the boolean "wedged" around instead of trying to recompute it wrongly. This isn't an issue for gem_eio with the debugfs fix, but just general robustness of the code. - Finally make sure that when gpu reset is disabled through the module paramter the kernel doesn't generate dmesg noise that would upset our CI/piglit. Simplest solution for that was to lift the i915.reset check up into i915_reset(). With all these changes, plus the igt fixes I've posted, gem_eio is now happy on many snb. v2: Don't upset lockdep with my set_domain_ioctl changes. Cc: Chris Wilson Testcase: igt/gem_eio/* Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/i915_debugfs.c | 4 ++++ drivers/gpu/drm/i915/i915_drv.c | 6 ++++++ drivers/gpu/drm/i915/i915_drv.h | 2 +- drivers/gpu/drm/i915/i915_gem.c | 23 +++++++++++++++-------- drivers/gpu/drm/i915/i915_irq.c | 11 ++++++----- drivers/gpu/drm/i915/intel_uncore.c | 3 --- 6 files changed, 32 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 411a9c68b4ee..c4006c09ef1b 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -4744,6 +4744,10 @@ i915_wedged_set(void *data, u64 val) if (i915_reset_in_progress(&dev_priv->gpu_error)) return -EAGAIN; + /* Already wedged, force us out of this terminal state. */ + if (i915_terminally_wedged(&dev_priv->gpu_error)) + atomic_set(&dev_priv->gpu_error.reset_counter, 0); + intel_runtime_pm_get(dev_priv); i915_handle_error(dev, val, diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index ec1e877c4a36..1f5386bb78f4 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -909,6 +909,12 @@ int i915_reset(struct drm_device *dev) simulated = dev_priv->gpu_error.stop_rings != 0; + if (!i915.reset) { + DRM_INFO("GPU reset disabled in module option, not resetting\n"); + mutex_unlock(&dev->struct_mutex); + return -ENODEV; + } + ret = intel_gpu_reset(dev); /* Also reset the gpu hangman. */ diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index a47e0f4fab56..8876b4891d56 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2939,7 +2939,7 @@ int __must_check i915_gem_check_wedge(struct i915_gpu_error *error, static inline bool i915_reset_in_progress(struct i915_gpu_error *error) { return unlikely(atomic_read(&error->reset_counter) - & (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED)); + & I915_RESET_IN_PROGRESS_FLAG); } static inline bool i915_terminally_wedged(struct i915_gpu_error *error) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index f10a5d57377e..f794e8f37f92 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -85,8 +85,7 @@ i915_gem_wait_for_error(struct i915_gpu_error *error) { int ret; -#define EXIT_COND (!i915_reset_in_progress(error) || \ - i915_terminally_wedged(error)) +#define EXIT_COND (!i915_reset_in_progress(error)) if (EXIT_COND) return 0; @@ -1113,16 +1112,16 @@ int i915_gem_check_wedge(struct i915_gpu_error *error, bool interruptible) { + /* Recovery complete, but the reset failed ... */ + if (i915_terminally_wedged(error)) + return -EIO; + if (i915_reset_in_progress(error)) { /* Non-interruptible callers can't handle -EAGAIN, hence return * -EIO unconditionally for these. */ if (!interruptible) return -EIO; - /* Recovery complete, but the reset failed ... */ - if (i915_terminally_wedged(error)) - return -EIO; - /* * Check if GPU Reset is in progress - we need intel_ring_begin * to work properly to reinit the hw state while the gpu is @@ -1577,7 +1576,7 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, ret = i915_mutex_lock_interruptible(dev); if (ret) - return ret; + goto out; obj = to_intel_bo(drm_gem_object_lookup(dev, file, args->handle)); if (&obj->base == NULL) { @@ -1592,7 +1591,8 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, ret = i915_gem_object_wait_rendering__nonblocking(obj, to_rps_client(file), !write_domain); - if (ret) + /* ABI: We must do cache management even when the gpu is dead. */ + if (ret && ret != -EIO) goto unref; if (read_domains & I915_GEM_DOMAIN_GTT) @@ -1605,10 +1605,17 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, write_domain == I915_GEM_DOMAIN_GTT ? ORIGIN_GTT : ORIGIN_CPU); + /* ABI: set_domain_ioctl must not fail even when the gpu is wedged. */ + if (ret == -EIO) + ret = 0; + unref: drm_gem_object_unreference(&obj->base); unlock: mutex_unlock(&dev->struct_mutex); +out: + WARN_ON(ret == -EIO); + return ret; } diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index c8ba94968aaf..dbbc6159584b 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2401,7 +2401,8 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv, * Fire an error uevent so userspace can see that a hang or error * was detected. */ -static void i915_reset_and_wakeup(struct drm_device *dev) +static void i915_reset_and_wakeup(struct drm_device *dev, + bool wedged) { struct drm_i915_private *dev_priv = to_i915(dev); struct i915_gpu_error *error = &dev_priv->gpu_error; @@ -2422,7 +2423,7 @@ static void i915_reset_and_wakeup(struct drm_device *dev) * the reset in-progress bit is only ever set by code outside of this * work we don't need to worry about any other races. */ - if (i915_reset_in_progress(error) && !i915_terminally_wedged(error)) { + if (wedged) { DRM_DEBUG_DRIVER("resetting chip\n"); kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, reset_event); @@ -2432,7 +2433,7 @@ static void i915_reset_and_wakeup(struct drm_device *dev) * reference held, for example because there is a pending GPU * request that won't finish until the reset is done. This * isn't the case at least when we get here by doing a - * simulated reset via debugs, so get an RPM reference. + * simulated reset via debugfs, so get an RPM reference. */ intel_runtime_pm_get(dev_priv); @@ -2467,7 +2468,7 @@ static void i915_reset_and_wakeup(struct drm_device *dev) kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, reset_done_event); } else { - atomic_or(I915_WEDGED, &error->reset_counter); + atomic_set(&error->reset_counter, I915_WEDGED); } /* @@ -2614,7 +2615,7 @@ void i915_handle_error(struct drm_device *dev, bool wedged, i915_error_wake_up(dev_priv, false); } - i915_reset_and_wakeup(dev); + i915_reset_and_wakeup(dev, wedged); } /* Called from drm generic code, passed 'crtc' which diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index c2358ba78b30..4c5ae1154dd1 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -1543,9 +1543,6 @@ not_ready: static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *) { - if (!i915.reset) - return NULL; - if (INTEL_INFO(dev)->gen >= 8) return gen8_do_reset; else if (INTEL_INFO(dev)->gen >= 6)