From patchwork Wed Aug 20 13:20:11 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Maarten Lankhorst <maarten.lankhorst@canonical.com>
X-Patchwork-Id: 4751201
Return-Path: <dri-devel-bounces@lists.freedesktop.org>
X-Original-To: patchwork-dri-devel@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.19.201])
	by patchwork1.web.kernel.org (Postfix) with ESMTP id 6202D9F344
	for <patchwork-dri-devel@patchwork.kernel.org>;
	Wed, 20 Aug 2014 13:20:19 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id CA50F2015E
	for <patchwork-dri-devel@patchwork.kernel.org>;
	Wed, 20 Aug 2014 13:20:16 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	by mail.kernel.org (Postfix) with ESMTP id 0B2F620136
	for <patchwork-dri-devel@patchwork.kernel.org>;
	Wed, 20 Aug 2014 13:20:15 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 660416E6C8;
	Wed, 20 Aug 2014 06:20:14 -0700 (PDT)
X-Original-To: dri-devel@lists.freedesktop.org
Delivered-To: dri-devel@lists.freedesktop.org
Received: from youngberry.canonical.com (youngberry.canonical.com
	[91.189.89.112])
	by gabe.freedesktop.org (Postfix) with ESMTP id 4AEB26E6C8
	for <dri-devel@lists.freedesktop.org>;
	Wed, 20 Aug 2014 06:20:13 -0700 (PDT)
Received: from 5ed49945.cm-7-5c.dynamic.ziggo.nl ([94.212.153.69]
	helo=[192.168.1.128]) by youngberry.canonical.com with esmtpsa
	(TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16)
	(Exim 4.71) (envelope-from <maarten.lankhorst@canonical.com>)
	id 1XK5oB-0006qe-Dr; Wed, 20 Aug 2014 13:20:11 +0000
Message-ID: <53F4A08B.80402@canonical.com>
Date: Wed, 20 Aug 2014 15:20:11 +0200
From: Maarten Lankhorst <maarten.lankhorst@canonical.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:31.0) Gecko/20100101 Thunderbird/31.0
MIME-Version: 1.0
To: =?UTF-8?B?Q2hyaXN0aWFuIEvDtm5pZw==?= <deathsimple@vodafone.de>,
	"Deucher, Alexander" <Alexander.Deucher@amd.com>
Subject: Re: [PATCH v3 1/3] drm/radeon: take exclusive_lock in read mode
	during, ring tests, v3
References: <53F341A3.6060902@canonical.com> <53F34BE4.5000907@vodafone.de>
In-Reply-To: <53F34BE4.5000907@vodafone.de>
Cc: "dri-devel@lists.freedesktop.org" <dri-devel@lists.freedesktop.org>
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
	<dri-devel.lists.freedesktop.org>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
X-Spam-Status: No, score=-4.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED,
	RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Hey,

I found another bug related to gpu reset, one that seems to trigger more reliable with the delayed work changes.

Op 19-08-14 om 15:06 schreef Christian König:
> Am 19.08.2014 um 14:22 schrieb Maarten Lankhorst:
>> This is needed for the next commit, because the lockup detection
>> will need the read lock to run.
>>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>> ---
>> radeon_pm_compute_clocks already checks if dpm is enabled, so no need to check a second time.
>>
>> Because of locking and waiting stuff the radeon_pm_compute_clocks and resume_force_mode calls
>> have to be done with read lock held.
>>
>> Seems to survive on my radeon when catting /sys/kernel/debug/dri/0/radeon_gpu_reset although
>> uvd fails to reset, and that ring gets disabled as a result.
>
> Depending on what hardware you have it's normal that UVD doesn't reset properly. I still haven't figured out the correct sequence in which I need to disable/enable the different UVD blocks on all hardware generations.
>
> It seems to work fine on my Cayman, but doesn't for example on Turks (which at least theoretically should have the same UVD block). It should be fine as long as the engines gets properly disabled when the IB test fails after an reset.
>
> Another common source of reset instability is DPM, while it now seems to be stable on NI and BTC I can't get a single reset to work once I use it.
>
> Regarding the patch it looks good now, but I still want to test it a bit,
> Christian. 

It seems that if UVD fails a second lockup recovery could in theory be
attempted because the uvd ring is locked up, and the delayed work doesn't
check if the ring is still ready or not.

The changes in "drm/radeon: handle lockup in delayed work, v3" expose 
this bug reliably, because it forces hangups to always be checked.

This results in repeated lockup recovery occuring.

Does this alternate patch look better?

I've also attached a patch to this mail test for lockups on all rings, and if they have
any saved ring data the radeon_gpu_reset function will be retried.

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index e13e408832e5..9253e1253780 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1735,8 +1735,38 @@ int radeon_gpu_reset(struct radeon_device *rdev)
 		radeon_pm_compute_clocks(rdev);
 
 	if (!r) {
-		r = radeon_ib_ring_tests(rdev);
-		if (r && ring_data[RADEON_RING_TYPE_GFX_INDEX])
+		bool retry = false;
+
+		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+			struct radeon_ring *ring = &rdev->ring[i];
+
+			if (!ring->ready)
+				continue;
+
+			r = radeon_ib_test(rdev, i, ring);
+			if (!r)
+				continue;
+
+			ring->ready = false;
+			dev_err(rdev->dev, "failed testing IB on ring %d (%d)\n", i, r);
+
+			if (ring_data[i]) {
+				retry = true;
+				continue;
+			} else {
+				radeon_fence_driver_force_completion(rdev, i);
+				rdev->needs_reset = false;
+				r = 0;
+			}
+
+			if (i == RADEON_RING_TYPE_GFX_INDEX) {
+				dev_err(rdev->dev, "disabling acceleration\n");
+				rdev->accel_working = false;
+				break;
+			}
+		}
+
+		if (retry)
 			r = -EAGAIN;
 	}