From patchwork Tue Jun 11 23:23:14 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andy Lutomirski X-Patchwork-Id: 2708021 Return-Path: X-Original-To: patchwork-dri-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id DD56B9F8E1 for ; Wed, 12 Jun 2013 05:41:49 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 34A5E201A8 for ; Wed, 12 Jun 2013 05:41:45 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by mail.kernel.org (Postfix) with ESMTP id 3B5C9201A7 for ; Wed, 12 Jun 2013 05:41:44 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 3AFAEE5FF4 for ; Tue, 11 Jun 2013 22:41:44 -0700 (PDT) X-Original-To: dri-devel@lists.freedesktop.org Delivered-To: dri-devel@lists.freedesktop.org Received: from mail-pa0-f42.google.com (mail-pa0-f42.google.com [209.85.220.42]) by gabe.freedesktop.org (Postfix) with ESMTP id E2F3DE617D for ; Tue, 11 Jun 2013 16:23:19 -0700 (PDT) Received: by mail-pa0-f42.google.com with SMTP id rl6so6006055pac.1 for ; Tue, 11 Jun 2013 16:23:19 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=from:to:cc:subject:date:message-id:x-mailer:in-reply-to:references :x-gm-message-state; bh=1MJWykNZBuTLK16WLRwr0bMxtxd/yKjMsML8B+rS00o=; b=eID2BgFL8yQC+yWPYmXvxx5+nqGiamUbd3kqLiU90QRDcpEqIYX6kYxQkvHBjebMOh acutTlQCnGullfXFuWXQ+5ce0/y5c/I7QZ3k1nkpEctP0XGRCEGGhxhEfgV5r/CuwM8I 1bswd7rhSDBLvG96QE+H2bmWuMB47J3nbh6jYq9KQ5xOWySAmpsaCUPXWQoPTBhcyKTq stc0baa1qdiLDn51iyUSUy72a6f/1U9IVHapJ+436yLEfWd1h8hLkO7MgxINXhgGYXpU /3Ki/EdQaNv2cXpoE/HNyWu1NPZ1guvVHhFMDjpzF8GFY5t9eDNZyRRnQjjpc4f5aUNk osCA== X-Received: by 10.68.178.33 with SMTP id cv1mr16709908pbc.209.1370992999652; Tue, 11 Jun 2013 16:23:19 -0700 (PDT) Received: from localhost (50-76-60-73-ip-static.hfc.comcastbusiness.net. [50.76.60.73]) by mx.google.com with ESMTPSA id eq4sm21970245pad.16.2013.06.11.16.23.17 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Tue, 11 Jun 2013 16:23:18 -0700 (PDT) From: Andy Lutomirski To: Alex Deucher , dri-devel@lists.freedesktop.org Subject: [PATCH] radeon: Fix a false positive lockup after 10s of inactivity Date: Tue, 11 Jun 2013 16:23:14 -0700 Message-Id: X-Mailer: git-send-email 1.8.1.4 In-Reply-To: References: X-Gm-Message-State: ALoCoQkBvoJ9vGQ9x29EIETjqwAEbr2mvz5+fr7eFShdFhxdZeoKZZaZlXYRhKwoyunecohIoeKE X-Mailman-Approved-At: Tue, 11 Jun 2013 22:29:11 -0700 Cc: xorg-driver-ati@lists.x.org, Andy Lutomirski X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: dri-devel-bounces+patchwork-dri-devel=patchwork.kernel.org@lists.freedesktop.org Errors-To: dri-devel-bounces+patchwork-dri-devel=patchwork.kernel.org@lists.freedesktop.org X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP If the device is idle for over ten seconds, then the next attempt to do anything can race with the lockup detector and cause a bogus lockup to be detected. Oddly, the situation is well-described in the lockup detector's comments and a fix is even described. This patch implements that fix (and corrects some typos in the description). My system has been stable for about a week running this code. Without this, my screen would go blank every now and then and, when it came back, everything would be remarkably slow (the latter is a separate bug). Signed-off-by: Andy Lutomirski Tested-by: Andy Lutomirski --- This may be -stable material. drivers/gpu/drm/radeon/radeon.h | 1 + drivers/gpu/drm/radeon/radeon_ring.c | 23 ++++++++++++++++------- 2 files changed, 17 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 8263af3..9de5778 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -652,6 +652,7 @@ struct radeon_ring { unsigned ring_free_dw; int count_dw; unsigned long last_activity; + unsigned long last_test_lockup; unsigned last_rptr; uint64_t gpu_addr; uint32_t align_mask; diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 1ef5eaa..fb7b3ea 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -547,12 +547,12 @@ void radeon_ring_lockup_update(struct radeon_ring *ring) * have CP rptr to a different value of jiffies wrap around which will force * initialization of the lockup tracking informations. * - * A possible false positivie is if we get call after while and last_cp_rptr == - * the current CP rptr, even if it's unlikely it might happen. To avoid this - * if the elapsed time since last call is bigger than 2 second than we return - * false and update the tracking information. Due to this the caller must call - * radeon_ring_test_lockup several time in less than 2sec for lockup to be reported - * the fencing code should be cautious about that. + * A possible false positive is if we get called after a while and + * last_cp_rptr == the current CP rptr, even if it's unlikely it might + * happen. To avoid this if the elapsed time since the last call is bigger + * than 2 second then we return false and update the tracking + * information. Due to this the caller must call radeon_ring_test_lockup + * more frequently than once every 2s when waiting. * * Caller should write to the ring to force CP to do something so we don't get * false positive when CP is just gived nothing to do. @@ -560,10 +560,14 @@ void radeon_ring_lockup_update(struct radeon_ring *ring) **/ bool radeon_ring_test_lockup(struct radeon_device *rdev, struct radeon_ring *ring) { - unsigned long cjiffies, elapsed; + unsigned long cjiffies, elapsed, last_test; uint32_t rptr; cjiffies = jiffies; + + last_test = ring->last_test_lockup; + ring->last_test_lockup = cjiffies; + if (!time_after(cjiffies, ring->last_activity)) { /* likely a wrap around */ radeon_ring_lockup_update(ring); @@ -576,6 +580,11 @@ bool radeon_ring_test_lockup(struct radeon_device *rdev, struct radeon_ring *rin radeon_ring_lockup_update(ring); return false; } + if (cjiffies - last_test > 2 * HZ) { + /* Possible race -- see comment above */ + radeon_ring_lockup_update(ring); + return false; + } elapsed = jiffies_to_msecs(cjiffies - ring->last_activity); if (radeon_lockup_timeout && elapsed >= radeon_lockup_timeout) { dev_err(rdev->dev, "GPU lockup CP stall for more than %lumsec\n", elapsed);