From patchwork Tue May 12 08:59:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11542423 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BD2231668 for ; Tue, 12 May 2020 09:00:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8F354214DB for ; Tue, 12 May 2020 09:00:01 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="XMONIu5D" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729306AbgELI77 (ORCPT ); Tue, 12 May 2020 04:59:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42714 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729243AbgELI77 (ORCPT ); Tue, 12 May 2020 04:59:59 -0400 Received: from mail-wm1-x344.google.com (mail-wm1-x344.google.com [IPv6:2a00:1450:4864:20::344]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ACF3CC061A0F for ; Tue, 12 May 2020 01:59:58 -0700 (PDT) Received: by mail-wm1-x344.google.com with SMTP id h4so20843736wmb.4 for ; Tue, 12 May 2020 01:59:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=kLx974NIZMqcTtktEjSZjvXzGFT/sN3zaOFqis6ynJ4=; b=XMONIu5DP+HMrApNuuxLBjrSt037rXZt2ZOk8MGJnrPXZwZxjQNgepQNY5FaUlHKzp NnpLCK27EuKk1i4zc5dJBZgXUUO+sBRUwyP2QJzh8l+TxbAyY+2UG2eYiOM+AqnUuE2h cBEWsoEqk///Vn/WtO2wLLNhQxn15H9TRBZ1g= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=kLx974NIZMqcTtktEjSZjvXzGFT/sN3zaOFqis6ynJ4=; b=hblhT9CKZwys/dy956wVadx6yi4Ixarn3TA/VHo2XlFODXvSH20gXyxZgI8SSsmLVX owb38HMbwrnWlN8PEYNSYvg6SeeYCycD3vIW3X+Aoovf6WRHomuw+1F/1/VZ3YdKHLQX cuaKieXGMDjodhJPL0XiuoTy2e97J7Tr6sF+U9pKCyoURiwpm0ovG8NZTuZ0FXWcK5Xt ERIZWGXfDHM6ZHqk8v8fSTi8Lr29BXx2B3ApV6jS/aqJ44IJdBIQrPDl2gYJyaRzO6MK xZuBxQwx6oEEIWfIy2ALPSNGcLK+shjQ7TFtBqfuQJdJvwdhYR68FbdTRf3cZa05b1nI OJkA== X-Gm-Message-State: AGi0PuYqyLRrpVUuZyLHxfJUnvnV9zcJwVHn2JoEV7U99geiu1GjxPku LOa8feeWQZuaqfSATFITa+qbaA== X-Google-Smtp-Source: APiQypLaGju2WZXVsNW4k3kb0cNDedDZi8kpgkf+bDmSvA+Y9bm9G0CROCbMCdeWzLKRHchbjRrBkQ== X-Received: by 2002:a05:600c:2dda:: with SMTP id e26mr27240272wmh.42.1589273997304; Tue, 12 May 2020 01:59:57 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.01.59.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 01:59:56 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_?= =?utf-8?q?K=C3=B6nig?= , Daniel Vetter Subject: [RFC 01/17] dma-fence: add might_sleep annotation to _wait() Date: Tue, 12 May 2020 10:59:28 +0200 Message-Id: <20200512085944.222637-2-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org But only for non-zero timeout, to avoid false positives. One question here is whether the might_sleep should be unconditional, or only for real timeouts. I'm not sure, so went with the more defensive option. But in the interest of locking down the cross-driver dma_fence rules we might want to be more aggressive. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/dma-buf/dma-fence.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 052a41e2451c..6802125349fb 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -208,6 +208,9 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout) if (WARN_ON(timeout < 0)) return -EINVAL; + if (timeout > 0) + might_sleep(); + trace_dma_fence_wait_start(fence); if (fence->ops->wait) ret = fence->ops->wait(fence, intr, timeout); From patchwork Tue May 12 08:59:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11542671 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B668C14C0 for ; Tue, 12 May 2020 09:04:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9D3452184D for ; Tue, 12 May 2020 09:04:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="G3K7uswl" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729381AbgELJD5 (ORCPT ); Tue, 12 May 2020 05:03:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42720 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729275AbgELJAB (ORCPT ); Tue, 12 May 2020 05:00:01 -0400 Received: from mail-wm1-x344.google.com (mail-wm1-x344.google.com [IPv6:2a00:1450:4864:20::344]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BA3A9C05BD0A for ; Tue, 12 May 2020 01:59:59 -0700 (PDT) Received: by mail-wm1-x344.google.com with SMTP id k12so20802817wmj.3 for ; Tue, 12 May 2020 01:59:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=I6zGk0sRXoiLMVlOrHdYKmGYdqgb8HIUHeWjfHU8LZ8=; b=G3K7uswlmhW+rpia5CreMV57ZkKdKlwZ6FOlSyfm3Oe7LuGmiEiRHD/PML6gjh4GhM OdvDdQt4u51YIl+RwDLbW7P8kjQMLKpQZcHKzE4dlQdQnOJqX7Su2t5BwmDsqLUnYibz JMBsPiQPy7q4aI4KfXGjLvCIMnBefB60TFyyI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=I6zGk0sRXoiLMVlOrHdYKmGYdqgb8HIUHeWjfHU8LZ8=; b=EMEXAO377RQBHIzDUPOtjtO/axQG8USzmlmg32N0+9zNGrGkJ5zwdpYooyQLt05cSI 7IhnRp2mQDDRuEivtHvHbzZBUdCmOTsto2L+iFeOzP+NuldXfzZO5//DE+ds0/1I9XEN pdnW8wEoSPhaXBR4jLPlJTOuXPaxzbzGwNAuE9OKYvwjIdnJ79623QFvgSrUK4Sx5I4c 1Tzt2mqBx//H9J+Y1xuC63DvzMS6VxLryyY9qjrhh9MidWvrzjneOWh+DY6ONenY2e3r Ex6eFyB2lYqSoiSdd0bPib9k696l5AHfu4TE2S/D3dqumWULjH9Otd9x612m5EeewDJK 8Icw== X-Gm-Message-State: AGi0PubtPdsFX5ogD8bf7j8TRpB8TZbaADmdE/qj5Ufz8dm0Sz3ge70w o/dS3jk5Ji4dhTjiiV08MiYuZg== X-Google-Smtp-Source: APiQypLCYf7Wgh0Wtz0pUmQn/xpulX/EPJ0KyLgnUHWq+1CmCns3LkuSYY1ZCdFMCm+vHgIBfsEl0Q== X-Received: by 2002:a1c:3bc5:: with SMTP id i188mr25545127wma.90.1589273998427; Tue, 12 May 2020 01:59:58 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.01.59.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 01:59:57 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_?= =?utf-8?q?K=C3=B6nig?= , Daniel Vetter Subject: [RFC 02/17] dma-fence: basic lockdep annotations Date: Tue, 12 May 2020 10:59:29 +0200 Message-Id: <20200512085944.222637-3-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Design is similar to the lockdep annotations for workers, but with some twists: - We use a read-lock for the execution/worker/completion side, so that this explicit annotation can be more liberally sprinkled around. With read locks lockdep isn't going to complain if the read-side isn't nested the same way under all circumstances, so ABBA deadlocks are ok. Which they are, since this is an annotation only. - We're using non-recursive lockdep read lock mode, since in recursive read lock mode lockdep does not catch read side hazards. And we _very_ much want read side hazards to be caught. For full details of this limitation see commit e91498589746065e3ae95d9a00b068e525eec34f Author: Peter Zijlstra Date: Wed Aug 23 13:13:11 2017 +0200 locking/lockdep/selftests: Add mixed read-write ABBA tests - To allow nesting of the read-side explicit annotations we explicitly keep track of the nesting. lock_is_held() allows us to do that. - The wait-side annotation is a write lock, and entirely done within dma_fence_wait() for everyone by default. - To be able to freely annotate helper functions I want to make it ok to call dma_fence_begin/end_signalling from soft/hardirq context. First attempt was using the hardirq locking context for the write side in lockdep, but this forces all normal spinlocks nested within dma_fence_begin/end_signalling to be spinlocks. That bollocks. The approach now is to simple check in_atomic(), and for these cases entirely rely on the might_sleep() check in dma_fence_wait(). That will catch any wrong nesting against spinlocks from soft/hardirq contexts. The idea here is that every code path that's critical for eventually signalling a dma_fence should be annotated with dma_fence_begin/end_signalling. The annotation ideally starts right after a dma_fence is published (added to a dma_resv, exposed as a sync_file fd, attached to a drm_syncobj fd, or anything else that makes the dma_fence visible to other kernel threads), up to and including the dma_fence_wait(). Examples are irq handlers, the scheduler rt threads, the tail of execbuf (after the corresponding fences are visible), any workers that end up signalling dma_fences and really anything else. Not annotated should be code paths that only complete fences opportunistically as the gpu progresses, like e.g. shrinker/eviction code. The main class of deadlocks this is supposed to catch are: Thread A: mutex_lock(A); mutex_unlock(A); dma_fence_signal(); Thread B: mutex_lock(A); dma_fence_wait(); mutex_unlock(A); Thread B is blocked on A signalling the fence, but A never gets around to that because it cannot acquire the lock A. Note that dma_fence_wait() is allowed to be nested within dma_fence_begin/end_signalling sections. To allow this to happen the read lock needs to be upgraded to a write lock, which means that any other lock is acquired between the dma_fence_begin_signalling() call and the call to dma_fence_wait(), and still held, this will result in an immediate lockdep complaint. The only other option would be to not annotate such calls, defeating the point. Therefore these annotations cannot be sprinkled over the code entirely mindless to avoid false positives. v2: handle soft/hardirq ctx better against write side and dont forget EXPORT_SYMBOL, drivers can't use this otherwise. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter Reviewed-by: Maarten Lankhorst Reviewed-by: Thomas Hellstrom --- drivers/dma-buf/dma-fence.c | 53 +++++++++++++++++++++++++++++++++++++ include/linux/dma-fence.h | 12 +++++++++ 2 files changed, 65 insertions(+) diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 6802125349fb..d5c0fd2efc70 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -110,6 +110,52 @@ u64 dma_fence_context_alloc(unsigned num) } EXPORT_SYMBOL(dma_fence_context_alloc); +#ifdef CONFIG_LOCKDEP +struct lockdep_map dma_fence_lockdep_map = { + .name = "dma_fence_map" +}; + +bool dma_fence_begin_signalling(void) +{ + /* explicitly nesting ... */ + if (lock_is_held_type(&dma_fence_lockdep_map, 1)) + return true; + + /* rely on might_sleep check for soft/hardirq locks */ + if (in_atomic()) + return true; + + /* ... and non-recursive readlock */ + lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _RET_IP_); + + return false; +} +EXPORT_SYMBOL(dma_fence_begin_signalling); + +void dma_fence_end_signalling(bool cookie) +{ + if (cookie) + return; + + lock_release(&dma_fence_lockdep_map, _RET_IP_); +} +EXPORT_SYMBOL(dma_fence_end_signalling); + +void __dma_fence_might_wait(void) +{ + bool tmp; + + tmp = lock_is_held_type(&dma_fence_lockdep_map, 1); + if (tmp) + lock_release(&dma_fence_lockdep_map, _THIS_IP_); + lock_map_acquire(&dma_fence_lockdep_map); + lock_map_release(&dma_fence_lockdep_map); + if (tmp) + lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _THIS_IP_); +} +#endif + + /** * dma_fence_signal_locked - signal completion of a fence * @fence: the fence to signal @@ -170,14 +216,19 @@ int dma_fence_signal(struct dma_fence *fence) { unsigned long flags; int ret; + bool tmp; if (!fence) return -EINVAL; + tmp = dma_fence_begin_signalling(); + spin_lock_irqsave(fence->lock, flags); ret = dma_fence_signal_locked(fence); spin_unlock_irqrestore(fence->lock, flags); + dma_fence_end_signalling(tmp); + return ret; } EXPORT_SYMBOL(dma_fence_signal); @@ -211,6 +262,8 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout) if (timeout > 0) might_sleep(); + __dma_fence_might_wait(); + trace_dma_fence_wait_start(fence); if (fence->ops->wait) ret = fence->ops->wait(fence, intr, timeout); diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index 3347c54f3a87..3f288f7db2ef 100644 --- a/include/linux/dma-fence.h +++ b/include/linux/dma-fence.h @@ -357,6 +357,18 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep) } while (1); } +#ifdef CONFIG_LOCKDEP +bool dma_fence_begin_signalling(void); +void dma_fence_end_signalling(bool cookie); +#else +static inline bool dma_fence_begin_signalling(void) +{ + return true; +} +static inline void dma_fence_end_signalling(bool cookie) {} +static inline void __dma_fence_might_wait(void) {} +#endif + int dma_fence_signal(struct dma_fence *fence); int dma_fence_signal_locked(struct dma_fence *fence); signed long dma_fence_default_wait(struct dma_fence *fence, From patchwork Tue May 12 08:59:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11542663 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3A9B1912 for ; Tue, 12 May 2020 09:03:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1E8F520714 for ; Tue, 12 May 2020 09:03:55 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="CLBPuhz5" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729243AbgELJAD (ORCPT ); Tue, 12 May 2020 05:00:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42716 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729316AbgELJAB (ORCPT ); Tue, 12 May 2020 05:00:01 -0400 Received: from mail-wm1-x344.google.com (mail-wm1-x344.google.com [IPv6:2a00:1450:4864:20::344]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E064EC05BD0B for ; Tue, 12 May 2020 02:00:00 -0700 (PDT) Received: by mail-wm1-x344.google.com with SMTP id d207so5888917wmd.0 for ; Tue, 12 May 2020 02:00:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=NZNungkfzjrpX3wvUKHqphES4AQA2BjvgdkrK2cjuiE=; b=CLBPuhz5g03xplqY3Lp/3c6jCI7cBpGlK2yI20bsQjZlk+Y0DSRAAzM7Uk0yJ/74fT qODjSJnOacCEhx0BQuKhrp9xCJljtl7WlTp/enOv9Yf5zmkcwWAQwPHdsOTS6iInjswc E2mZfpyZhXG3pjjGV03WLfeasYqdUTvXYUfyk= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=NZNungkfzjrpX3wvUKHqphES4AQA2BjvgdkrK2cjuiE=; b=k/YrjwgZGmnHklApoRvABqaQgF/lbrqLfyWVYca6u5niUZ6/RNrP8qVVmTO4E/wVt4 GIOcx5zryzFcxepFYpkWHeSCCvG7pkboS9r49mBCHZKURYq02UaGabQAyReAYGnUzvg9 E1JveXNWZ7RIL8VsHd/Y6bmG+KA4QrpWQrz8TN3F9yuOndrH6LV7rOOu9TYTorxQLN5u cm95TehyJZIPCTjXu4BBRxCAmyGNpqb+d6gN1MNspF+xN4oYOyfbXOMk09N5Su6MZGS4 SHTRDQoiOROOfZCxk1Tkl7PP8dz3qIRmc+sTRbHz6kquWJ1I7zU8duNOiodulqSUFY3A 658A== X-Gm-Message-State: AGi0PublZllFwVcd4t3qd6jOdI5QeQYJz0BJbOkU/cDNW4jAQkJLCfwQ L1uGiNj22Pz2PBKy3i4tN3Ln4RE8ppc= X-Google-Smtp-Source: APiQypKft08OoGrIYtbiSYRYQwvgFtVy2fvk5tEE2SUvyg8qKyVKerp+e40UsbcgCmUWXpSeXbccUQ== X-Received: by 2002:a05:600c:2055:: with SMTP id p21mr13223418wmg.127.1589273999661; Tue, 12 May 2020 01:59:59 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.01.59.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 01:59:59 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_?= =?utf-8?q?K=C3=B6nig?= , Daniel Vetter Subject: [RFC 03/17] dma-fence: prime lockdep annotations Date: Tue, 12 May 2020 10:59:30 +0200 Message-Id: <20200512085944.222637-4-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Two in one go: - it is allowed to call dma_fence_wait() while holding a dma_resv_lock(). This is fundamental to how eviction works with ttm, so required. - it is allowed to call dma_fence_wait() from memory reclaim contexts, specifically from shrinker callbacks (which i915 does), and from mmu notifier callbacks (which amdgpu does, and which i915 sometimes also does, and probably always should, but that's kinda a debate). Also for stuff like HMM we really need to be able to do this, or things get real dicey. Consequence is that any critical path necessary to get to a dma_fence_signal for a fence must never a) call dma_resv_lock nor b) allocate memory with GFP_KERNEL. Also by implication of dma_resv_lock(), no userspace faulting allowed. That's some supremely obnoxious limitations, which is why we need to sprinkle the right annotations to all relevant paths. The one big locking context we're leaving out here is mmu notifiers, added in commit 23b68395c7c78a764e8963fc15a7cfd318bf187f Author: Daniel Vetter Date: Mon Aug 26 22:14:21 2019 +0200 mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end that one covers a lot of other callsites, and it's also allowed to wait on dma-fences from mmu notifiers. But there's no ready-made functions exposed to prime this, so I've left it out for now. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/dma-buf/dma-resv.c | 1 + include/linux/dma-fence.h | 1 + 2 files changed, 2 insertions(+) diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 99c0a33c918d..392c336f0732 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -115,6 +115,7 @@ static int __init dma_resv_lockdep(void) if (ret == -EDEADLK) dma_resv_lock_slow(&obj, &ctx); fs_reclaim_acquire(GFP_KERNEL); + __dma_fence_might_wait(); fs_reclaim_release(GFP_KERNEL); ww_mutex_unlock(&obj.lock); ww_acquire_fini(&ctx); diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index 3f288f7db2ef..09e23adb351d 100644 --- a/include/linux/dma-fence.h +++ b/include/linux/dma-fence.h @@ -360,6 +360,7 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep) #ifdef CONFIG_LOCKDEP bool dma_fence_begin_signalling(void); void dma_fence_end_signalling(bool cookie); +void __dma_fence_might_wait(void); #else static inline bool dma_fence_begin_signalling(void) { From patchwork Tue May 12 08:59:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11542667 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A85AF139F for ; Tue, 12 May 2020 09:03:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8FE2520714 for ; Tue, 12 May 2020 09:03:57 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="e347cRrT" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729357AbgELJD4 (ORCPT ); Tue, 12 May 2020 05:03:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42720 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729246AbgELJAC (ORCPT ); Tue, 12 May 2020 05:00:02 -0400 Received: from mail-wm1-x343.google.com (mail-wm1-x343.google.com [IPv6:2a00:1450:4864:20::343]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 18C92C061A0C for ; Tue, 12 May 2020 02:00:02 -0700 (PDT) Received: by mail-wm1-x343.google.com with SMTP id z72so12749057wmc.2 for ; Tue, 12 May 2020 02:00:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Z86jDMPjYosEVmpvz8g87itNg+tEFmiX7DLzwmjjruY=; b=e347cRrTqkHcsmJyCgB3TVco449DVfioti3xt++37x7bJTFMigGg5EWlMaf/MjZpmR eYVFJlx9g9Vm+NEIhg3iXwKg86toJqtzlt+iCudluN0BiyTQqaBnCZJLIa0zEEFILrRP R3mU5I3uQHXrIj7OqeDSyBNZDBZCv72rxHs1Q= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Z86jDMPjYosEVmpvz8g87itNg+tEFmiX7DLzwmjjruY=; b=RJ5dy+ppht6XUjjrytCt7cseMzQZWHVEqRKCsMb3YpoQbjixhO4J3wPyAZCRR34I+z 805dZ9jHvKmXYGlzw2PltnOl+NFi7pIFrWdc9/e/QuJEqkYkQUW34m/eNovjkti5FNy4 hAOwx6awwd7GiX2yAr6ytq3Wczksrt1ff+SzHzBDX8i89ls/lshfMZ82QyifJ3cMCwfa ntwaceR8zecW18nh3mIeWzYgLsrDlReV8GLakMcOBwCJNVHGwqWBHtDMKIn7fn9WfH9p JfGIOOXIiow0ucd3Mnb2LvXjIM+Xf5kwlBA+9FEWrNxzZ/RhCzn/cxoUjSKJyNpfFFeM cYTQ== X-Gm-Message-State: AGi0PuZiNYcuMrGxEq536QRuXmf+vHmmPRhGhvB7dGyvZ0p00SzZHH6y AlWwML7rjzjuvqY7J0JUwipbhA== X-Google-Smtp-Source: APiQypJT+3YqWFL2wx/F/qRbovKxB7nonvtIP8MsamotNvxRkHjM8KCOP8ZK10mJhOIGdoSiUWqbbA== X-Received: by 2002:a1c:6344:: with SMTP id x65mr19610953wmb.51.1589274000857; Tue, 12 May 2020 02:00:00 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.01.59.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:00 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_?= =?utf-8?q?K=C3=B6nig?= , Daniel Vetter , Rodrigo Siqueira , Haneen Mohammed , Daniel Vetter Subject: [RFC 04/17] drm/vkms: Annotate vblank timer Date: Tue, 12 May 2020 10:59:31 +0200 Message-Id: <20200512085944.222637-5-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org This is needed to signal the fences from page flips, annotate it accordingly. We need to annotate entire timer callback since if we get stuck anywhere in there, then the timer stops, and hence fences stop. Just annotating the top part that does the vblank handling isn't enough. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter Cc: Rodrigo Siqueira Cc: Haneen Mohammed Cc: Daniel Vetter --- drivers/gpu/drm/vkms/vkms_crtc.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c index ac85e17428f8..a53a40848a72 100644 --- a/drivers/gpu/drm/vkms/vkms_crtc.c +++ b/drivers/gpu/drm/vkms/vkms_crtc.c @@ -1,5 +1,7 @@ // SPDX-License-Identifier: GPL-2.0+ +#include + #include #include #include @@ -14,7 +16,9 @@ static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) struct drm_crtc *crtc = &output->crtc; struct vkms_crtc_state *state; u64 ret_overrun; - bool ret; + bool ret, fence_cookie; + + fence_cookie = dma_fence_begin_signalling(); ret_overrun = hrtimer_forward_now(&output->vblank_hrtimer, output->period_ns); @@ -49,6 +53,8 @@ static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) DRM_DEBUG_DRIVER("Composer worker already queued\n"); } + dma_fence_end_signalling(fence_cookie); + return HRTIMER_RESTART; } From patchwork Tue May 12 08:59:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11542547 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A8D33912 for ; Tue, 12 May 2020 09:01:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 91466207FF for ; Tue, 12 May 2020 09:01:10 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="Tba/CBQc" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729374AbgELJBJ (ORCPT ); Tue, 12 May 2020 05:01:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42718 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729352AbgELJAD (ORCPT ); Tue, 12 May 2020 05:00:03 -0400 Received: from mail-wr1-x443.google.com (mail-wr1-x443.google.com [IPv6:2a00:1450:4864:20::443]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 43A4FC061A0F for ; Tue, 12 May 2020 02:00:03 -0700 (PDT) Received: by mail-wr1-x443.google.com with SMTP id l17so793379wrr.4 for ; Tue, 12 May 2020 02:00:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=k3OodW5bIt6jtahEoyPvdqvd8L7bfSaZtDSgdgQ0KSE=; b=Tba/CBQcoFuTPwNcOK0uHe+q5z5Fq6i1EP5lEPO6yQnGb0xZi3OTwd+ZAJUMrRox80 JQxMbr1Hy5zgQtio1imHYMaQf8uOuY/D0v6qpXbtodzS2aPiBGDFHO2m+fo/NUnJQtIa 9QatpvOy7g08efja4D6G7OF9WYNAibebHkhiY= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=k3OodW5bIt6jtahEoyPvdqvd8L7bfSaZtDSgdgQ0KSE=; b=rkEWY9oWRmBB9BfyAapqdWVBIzJrdxSbDbFB2X8GbgoVg/t+GeJZFZ19Rm4OkL50YP FvUTeMbO/gfpJopsEI3pnb2YbMg3mK8c3XoWywvL6YXJi81hv1uTCu4We/W7CTM852z1 0yAs5eaEFqy2gLS2liCsXGveFOO5NF6Gc8d64fZrQc/4iYcEcBh3AtNgsXDWpPmvfEXl o3hqnW8j2fyk1/vLeRZ81laSoZO7Yk1sbba7NJW4/63mOwhGSs3rtKMQrR3awDj0/KB2 IQLRryDW3Z4CFz3tg5YY9KSUfIVgQzoBCD9tKkf3UIGOH/eH8FPz0UYYG2IYSl3poa4I f5sQ== X-Gm-Message-State: AGi0Puao2BFw6qCXvw3OGixTjDT08SYb0jR/w3L4Ul9tnYn6/rIxk0d+ 6vPjgbRTjdowbeg8s3Ukqz4SBQ== X-Google-Smtp-Source: APiQypLIE0/1kWTahFNGDnQcsBGsaNy02ojJZXXlJ0KfLdzMgemSxmLWkBaf5jP2NOQfTl/RTuSVRw== X-Received: by 2002:adf:ca0e:: with SMTP id o14mr25265512wrh.254.1589274001957; Tue, 12 May 2020 02:00:01 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:01 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_?= =?utf-8?q?K=C3=B6nig?= , Daniel Vetter Subject: [RFC 05/17] drm/vblank: Annotate with dma-fence signalling section Date: Tue, 12 May 2020 10:59:32 +0200 Message-Id: <20200512085944.222637-6-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org This is rather overkill since currently all drivers call this from hardirq (or at least timers). But maybe in the future we're going to have thread irq handlers and what not, doesn't hurt to be prepared. Plus this is an easy start for sprinkling these fence annotations into shared code. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/drm_vblank.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c index 758bf74e1cab..125ef0c0f9a1 100644 --- a/drivers/gpu/drm/drm_vblank.c +++ b/drivers/gpu/drm/drm_vblank.c @@ -24,6 +24,7 @@ * OTHER DEALINGS IN THE SOFTWARE. */ +#include #include #include @@ -1895,7 +1896,7 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe) { struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; unsigned long irqflags; - bool disable_irq; + bool disable_irq, fence_cookie; if (WARN_ON_ONCE(!dev->num_crtcs)) return false; @@ -1903,6 +1904,8 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe) if (WARN_ON(pipe >= dev->num_crtcs)) return false; + fence_cookie = dma_fence_begin_signalling(); + spin_lock_irqsave(&dev->event_lock, irqflags); /* Need timestamp lock to prevent concurrent execution with @@ -1915,6 +1918,7 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe) if (!vblank->enabled) { spin_unlock(&dev->vblank_time_lock); spin_unlock_irqrestore(&dev->event_lock, irqflags); + dma_fence_end_signalling(fence_cookie); return false; } @@ -1940,6 +1944,8 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe) if (disable_irq) vblank_disable_fn(&vblank->disable_timer); + dma_fence_end_signalling(fence_cookie); + return true; } EXPORT_SYMBOL(drm_handle_vblank); From patchwork Tue May 12 08:59:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11542541 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4FBA2912 for ; Tue, 12 May 2020 09:01:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 389E3207FF for ; Tue, 12 May 2020 09:01:08 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="igssb8Qu" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729373AbgELJAF (ORCPT ); Tue, 12 May 2020 05:00:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42750 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729360AbgELJAE (ORCPT ); Tue, 12 May 2020 05:00:04 -0400 Received: from mail-wm1-x342.google.com (mail-wm1-x342.google.com [IPv6:2a00:1450:4864:20::342]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 530AEC061A0F for ; Tue, 12 May 2020 02:00:04 -0700 (PDT) Received: by mail-wm1-x342.google.com with SMTP id z72so12749195wmc.2 for ; Tue, 12 May 2020 02:00:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=z24jafCl7nBxqPpBfTBC2qyOs+r7mA5F/0neHUey0fs=; b=igssb8QuHGML04iloHis5tnnY04ZjIaINqB9XCnUpZujWX95vQOG2XooNZWED5UlM7 +JukOp1TRMGSKFm0zADXJslDBeE5IaWWucPzHRfjbM3UTCO1U9stqeZEwQUg59RkX3V1 2pMZi9aEONM7pA2SIsda8GES3xD/wunlcJq3w= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=z24jafCl7nBxqPpBfTBC2qyOs+r7mA5F/0neHUey0fs=; b=UMAQbM+RQ2DDErjTxVR3THFwjisR5czHETh02qj8Nvwn2GWTJL6k9doK0j5JaDaYoH HGyj6hpCobbmiVoXKATFP+D+LHhnlI/IQk1eR+D+ekagK/6hQDBE6VcvgZNg0tG5h9ka ULfb8tyK9Yq6l7YQiMax2mkMCPbHGc3lTnyfkQzeXma5o9IfMn1zUhOxDykY7uRHfl21 isv2nPHj19RU5Xgnso22pgvafzZQCS1/It/d1rsnfO9tJq3MDcQm5Xp0MlZ768uNBxTz 3ty5ZHI8c0P6lniuDopjKA8jqMvISKBZxPkeCvL8cOGhul4qTw2TASMcVNz4Hs365fkK 3vKw== X-Gm-Message-State: AGi0PuZGpfcz8RbEz8S5qId/RrxX2xNIyEjDHNPRgJ0MPm9M2G7uILV4 c4jpQeoNZF0KyzGl7uBG3VlQ1A== X-Google-Smtp-Source: APiQypIrt7GeKZxO1LJQxL9WakNz+jGvu0kVexVIvKusNxir+f0KPfTbonTVhKB0O4boRqtxdkOUNQ== X-Received: by 2002:a1c:3bc5:: with SMTP id i188mr25545486wma.90.1589274003053; Tue, 12 May 2020 02:00:03 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:02 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_?= =?utf-8?q?K=C3=B6nig?= , Daniel Vetter Subject: [RFC 06/17] drm/atomic-helper: Add dma-fence annotations Date: Tue, 12 May 2020 10:59:33 +0200 Message-Id: <20200512085944.222637-7-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org This is a bit disappointing since we need to split the annotations over all the different parts. I was considering just leaking the critical section into the ->atomic_commit_tail callback of each driver. But that would mean we need to pass the fence_cookie into each driver (there's a total of 13 implementations of this hook right now), so bad flag day. And also a bit leaky abstraction. Hence just do it function-by-function. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/drm_atomic_helper.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index 8ac3aa068261..0a541368246e 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1549,6 +1549,7 @@ EXPORT_SYMBOL(drm_atomic_helper_wait_for_flip_done); void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state) { struct drm_device *dev = old_state->dev; + bool fence_cookie = dma_fence_begin_signalling(); drm_atomic_helper_commit_modeset_disables(dev, old_state); @@ -1560,6 +1561,8 @@ void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state) drm_atomic_helper_commit_hw_done(old_state); + dma_fence_end_signalling(fence_cookie); + drm_atomic_helper_wait_for_vblanks(dev, old_state); drm_atomic_helper_cleanup_planes(dev, old_state); @@ -1579,6 +1582,7 @@ EXPORT_SYMBOL(drm_atomic_helper_commit_tail); void drm_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state) { struct drm_device *dev = old_state->dev; + bool fence_cookie = dma_fence_begin_signalling(); drm_atomic_helper_commit_modeset_disables(dev, old_state); @@ -1591,6 +1595,8 @@ void drm_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state) drm_atomic_helper_commit_hw_done(old_state); + dma_fence_end_signalling(fence_cookie); + drm_atomic_helper_wait_for_vblanks(dev, old_state); drm_atomic_helper_cleanup_planes(dev, old_state); @@ -1606,6 +1612,9 @@ static void commit_tail(struct drm_atomic_state *old_state) ktime_t start; s64 commit_time_ms; unsigned int i, new_self_refresh_mask = 0; + bool fence_cookie; + + fence_cookie = dma_fence_begin_signalling(); funcs = dev->mode_config.helper_private; @@ -1634,6 +1643,8 @@ static void commit_tail(struct drm_atomic_state *old_state) if (new_crtc_state->self_refresh_active) new_self_refresh_mask |= BIT(i); + dma_fence_end_signalling(fence_cookie); + if (funcs && funcs->atomic_commit_tail) funcs->atomic_commit_tail(old_state); else @@ -1789,6 +1800,7 @@ int drm_atomic_helper_commit(struct drm_device *dev, bool nonblock) { int ret; + bool fence_cookie; if (state->async_update) { ret = drm_atomic_helper_prepare_planes(dev, state); @@ -1811,6 +1823,8 @@ int drm_atomic_helper_commit(struct drm_device *dev, if (ret) return ret; + fence_cookie = dma_fence_begin_signalling(); + if (!nonblock) { ret = drm_atomic_helper_wait_for_fences(dev, state, true); if (ret) @@ -1848,6 +1862,7 @@ int drm_atomic_helper_commit(struct drm_device *dev, */ drm_atomic_state_get(state); + dma_fence_end_signalling(fence_cookie); if (nonblock) queue_work(system_unbound_wq, &state->commit_work); else @@ -1856,6 +1871,7 @@ int drm_atomic_helper_commit(struct drm_device *dev, return 0; err: + dma_fence_end_signalling(fence_cookie); drm_atomic_helper_cleanup_planes(dev, state); return ret; } From patchwork Tue May 12 08:59:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11542543 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ED9D114C0 for ; Tue, 12 May 2020 09:01:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D6B07207FF for ; Tue, 12 May 2020 09:01:08 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="bjQj5zUo" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729404AbgELJBH (ORCPT ); Tue, 12 May 2020 05:01:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42754 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729370AbgELJAF (ORCPT ); Tue, 12 May 2020 05:00:05 -0400 Received: from mail-wr1-x444.google.com (mail-wr1-x444.google.com [IPv6:2a00:1450:4864:20::444]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 77718C061A0F for ; Tue, 12 May 2020 02:00:05 -0700 (PDT) Received: by mail-wr1-x444.google.com with SMTP id j5so14362024wrq.2 for ; Tue, 12 May 2020 02:00:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=9rtZPjL4qCIHg+K9tvyoRbu8glhfJrL+U886kKa9MgQ=; b=bjQj5zUoHFxr/SGM4dy27f04NAwPFLqit1mjhUpVM48tJB0DvrtAI/QUt4Uq4fHY11 5zIrzzFoaYD8bqvQgrOwaiqaU/6WtYEiZX2CakVoxiDZFsrz5WjYusAXBDM7dzkZzpGs nwDoH4Vjs2gY1QwHAO2mv8pc5dm8xpRypRhzw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=9rtZPjL4qCIHg+K9tvyoRbu8glhfJrL+U886kKa9MgQ=; b=Kerkgdvry+CDF+jY4Ij8h6FQjjtbYP3stxDqG0ANxzAlLzZbAL4s6BF24u9M+/ND+f kpgFYX0zfd9J4/ER/nWvZD1WQp+el6+WPOKOeUCnMZ0iaFb5xJO/5duxtYA5MIPrabj3 itcuF79Rf9xNmgQBGrTj6cT9Bipr7xmQBWXn6lQinl2zZ9qiH06+Jk7ceNKF9iQQupYG nR3A/g7HBXdVCsB8noiXNttcu57CScXgMiMANAY+PJuWtE5+OMNyGkBJWBxXEewXZ5c+ uBNAie6GZy+m8KFFvOkf4chNoKeHpxR6IIrtRa/nbU8xa83L4JUaPNLMmvq5j8ZRmJeq k6ag== X-Gm-Message-State: AGi0PuaYiwMHMgaBdWYkYsDDiyK5wzQWy23R9Xr8eeFs8UjC/QoihuZg EVTIKlQ7VFoSWL49nWx4fRXTDw== X-Google-Smtp-Source: APiQypJ4sp6ae2VxawQXYuyWMb/LII0NAOyrZR4XJi9uK+LfraVsh0bXQKwL7UmDORgQR8hkYHBnZA== X-Received: by 2002:adf:ec88:: with SMTP id z8mr22394542wrn.44.1589274004220; Tue, 12 May 2020 02:00:04 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:03 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_?= =?utf-8?q?K=C3=B6nig?= , Daniel Vetter Subject: [RFC 07/17] drm/amdgpu: add dma-fence annotations to atomic commit path Date: Tue, 12 May 2020 10:59:34 +0200 Message-Id: <20200512085944.222637-8-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org I need a canary in a ttm-based atomic driver to make sure the dma_fence_begin/end_signalling annotations actually work. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index ea0e039a667a..4469a8c96b08 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -57,6 +57,7 @@ #include "ivsrcid/ivsrcid_vislands30.h" +#include #include #include #include @@ -7109,6 +7110,9 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state) struct drm_connector_state *old_con_state, *new_con_state; struct dm_crtc_state *dm_old_crtc_state, *dm_new_crtc_state; int crtc_disable_count = 0; + bool fence_cookie; + + fence_cookie = dma_fence_begin_signalling(); drm_atomic_helper_update_legacy_modeset_state(dev, state); @@ -7389,6 +7393,8 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state) /* Signal HW programming completion */ drm_atomic_helper_commit_hw_done(state); + dma_fence_end_signalling(fence_cookie); + if (wait_for_vblank) drm_atomic_helper_wait_for_flip_done(dev, state); From patchwork Tue May 12 08:59:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11542537 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F07E4139F for ; Tue, 12 May 2020 09:01:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D8BBB2082E for ; Tue, 12 May 2020 09:01:06 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="QOpkeUs0" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729247AbgELJBF (ORCPT ); Tue, 12 May 2020 05:01:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42764 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729393AbgELJAI (ORCPT ); Tue, 12 May 2020 05:00:08 -0400 Received: from mail-wr1-x442.google.com (mail-wr1-x442.google.com [IPv6:2a00:1450:4864:20::442]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7FD7AC061A0C for ; Tue, 12 May 2020 02:00:06 -0700 (PDT) Received: by mail-wr1-x442.google.com with SMTP id l17so793673wrr.4 for ; Tue, 12 May 2020 02:00:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=QpRcEBLBFti7+uwIPhFpaG5e28opFzcZsYVebGatC0I=; b=QOpkeUs0h0wN4sti4X0g6ThD3SrxXZsFWkczPFMfaK9K848FuZY/LBbwUozpO63nQf eTJs+eMU1PYoj07tdTFH4MaPU0yMQAi0jxHqRvYdOuXMTkIN/Mt01aR+HFippq1ZJM2r G6xGmiSojmbfOZiM1YCpKK0rOM2Epr1zxLZfQ= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=QpRcEBLBFti7+uwIPhFpaG5e28opFzcZsYVebGatC0I=; b=N8NQkhldrF96J1NPqDfOjAAG6JoXgkDg1YebcE38NbymxPaMW1EKbVqs5unUr0Mm5c 4YP3afwJrwwQcflpAaWMl/z0pTbw3oOaZeTRqEEVD11mTuU/WVBWbbDusSulsHMZrWCv m3p+tLeA39WkoozSEo3oiLC9+7dgxdh1FlceWDHNg+10zLJLMc3BEp43E9mVdHme1pug J6cuUIGM/Jqg7/59Q2OhNt/8Iv40n7kswhQp1LmW1tW6r87ZJuXB3ETRdRja+uun7cGe rB+tQWOpufK/qNGBMSmceI9QUSm3cCtIzISxHHmiSmYXK60V6A9DoKPL00tf7K2Z2f+V M8Ig== X-Gm-Message-State: AGi0Pub/SljD1y906vqCOkuk5m9CUVLpUF/1t10J1VrDZSIAfOsunXIv M5jpnGPCI8eQaypz/jsUO+amzA== X-Google-Smtp-Source: APiQypKd1E6Oeu9FhxQgfCRXfBgs5PqqzFNBcX4uC9QIYdSJsh2y79L6Is6aCs3MeuVyGs7xUaybiQ== X-Received: by 2002:adf:8302:: with SMTP id 2mr24564990wrd.114.1589274005258; Tue, 12 May 2020 02:00:05 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:04 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_?= =?utf-8?q?K=C3=B6nig?= , Daniel Vetter Subject: [RFC 08/17] drm/scheduler: use dma-fence annotations in main thread Date: Tue, 12 May 2020 10:59:35 +0200 Message-Id: <20200512085944.222637-9-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org If the scheduler rt thread gets stuck on a mutex that we're holding while waiting for gpu workloads to complete, we have a problem. Add dma-fence annotations so that lockdep can check this for us. I've tried to quite carefully review this, and I think it's at the right spot. But obviosly no expert on drm scheduler. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/scheduler/sched_main.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 2f319102ae9f..06a736e506ad 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -763,9 +763,12 @@ static int drm_sched_main(void *param) struct sched_param sparam = {.sched_priority = 1}; struct drm_gpu_scheduler *sched = (struct drm_gpu_scheduler *)param; int r; + bool fence_cookie; sched_setscheduler(current, SCHED_FIFO, &sparam); + fence_cookie = dma_fence_begin_signalling(); + while (!kthread_should_stop()) { struct drm_sched_entity *entity = NULL; struct drm_sched_fence *s_fence; @@ -823,6 +826,9 @@ static int drm_sched_main(void *param) wake_up(&sched->job_scheduled); } + + dma_fence_end_signalling(fence_cookie); + return 0; } From patchwork Tue May 12 08:59:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11542525 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E437214C0 for ; Tue, 12 May 2020 09:00:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CD27C214DB for ; Tue, 12 May 2020 09:00:57 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="SDwGwQpJ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729419AbgELJAJ (ORCPT ); Tue, 12 May 2020 05:00:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42768 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729395AbgELJAI (ORCPT ); Tue, 12 May 2020 05:00:08 -0400 Received: from mail-wr1-x443.google.com (mail-wr1-x443.google.com [IPv6:2a00:1450:4864:20::443]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 81579C061A0E for ; Tue, 12 May 2020 02:00:08 -0700 (PDT) Received: by mail-wr1-x443.google.com with SMTP id l11so8435858wru.0 for ; Tue, 12 May 2020 02:00:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=3/9uzJ3fevUELKKtrW0wppzXu6P75uy22NGdQSlmDz8=; b=SDwGwQpJaRBMJHLwj1aebHZrNlfPB3lM8R5Nxnw2iursO6/Tq0w7BeIxMzSmD391GY iPFvU8Hqv0qaq3/SShkh7frLZZM3Zpz71K/gsOA/w0EdvszMgINIghxumAUMRmpCHVZ6 PNjddJYrJBWaP/HRgZfmAEs0NKLQNEFW8NDqo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=3/9uzJ3fevUELKKtrW0wppzXu6P75uy22NGdQSlmDz8=; b=hokriPQdR56/PgpZJmlpRQqiXcZzKqMedLM/gQsyCG6krAqmdQKfgpwu/51OOhVer4 gPWlVDsnPmuxCbineQn4QO4p40OWNqtk5Otzq5CKC2XfOOpYqDe6lz9cMLb1BjFPUorL f8rSBLAdZzCVvxy+yPGyQO7wFemFg/OtQBkmoxjX/4Dqkw+X7H53GmpqUxCs+KaAvh2b tTYalNQ1zeGPVTQQvpJsrBEBw8mEUEnFupUPVstkKgYl8TXLwABq8gwqaE2YhWAK0FeV +V+K3kghUIJttfW38yfo+jgCa0Xl5MzUhtvI4AW4BVvHTT3rJdCy3BKC8S3uZ4fsMxRr Fdug== X-Gm-Message-State: AGi0PuYsofn7E1M8pzSXZByS/Ic7OsqiWWyUfDMbGvRCGWuIKXxE08L2 lOrncjtV0RUgKv7mD016VYzJjA== X-Google-Smtp-Source: APiQypJNdraV8h80zz8iXZE8tE3/9vYAqexMkf4hJDDcWF5MN2cVkPdcNZnVM3rSm/bq3B4xJ2e1wQ== X-Received: by 2002:a5d:6702:: with SMTP id o2mr17490653wru.231.1589274006358; Tue, 12 May 2020 02:00:06 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:05 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_?= =?utf-8?q?K=C3=B6nig?= , Daniel Vetter Subject: [RFC 09/17] drm/amdgpu: use dma-fence annotations in cs_submit() Date: Tue, 12 May 2020 10:59:36 +0200 Message-Id: <20200512085944.222637-10-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org This is a bit tricky, since ->notifier_lock is held while calling dma_fence_wait we must ensure that also the read side (i.e. dma_fence_begin_signalling) is on the same side. If we mix this up lockdep complaints, and that's again why we want to have these annotations. A nice side effect of this is that because of the fs_reclaim priming for dma_fence_enable lockdep now automatically checks for us that nothing in here allocates memory, without even running any userptr workloads. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 7653f62b1b2d..6db3f3c629b0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1213,6 +1213,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, struct amdgpu_job *job; uint64_t seq; int r; + bool fence_cookie; job = p->job; p->job = NULL; @@ -1227,6 +1228,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, */ mutex_lock(&p->adev->notifier_lock); + fence_cookie = dma_fence_begin_signalling(); + /* If userptr are invalidated after amdgpu_cs_parser_bos(), return * -EAGAIN, drmIoctl in libdrm will restart the amdgpu_cs_ioctl. */ @@ -1264,12 +1267,14 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm); ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence); + dma_fence_end_signalling(fence_cookie); mutex_unlock(&p->adev->notifier_lock); return 0; error_abort: drm_sched_job_cleanup(&job->base); + dma_fence_end_signalling(fence_cookie); mutex_unlock(&p->adev->notifier_lock); error_unlock: From patchwork Tue May 12 08:59:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11542521 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 39B9214C0 for ; Tue, 12 May 2020 09:00:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1D57724957 for ; Tue, 12 May 2020 09:00:57 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="GdZJjNq2" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729261AbgELJA4 (ORCPT ); Tue, 12 May 2020 05:00:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42778 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729422AbgELJAJ (ORCPT ); Tue, 12 May 2020 05:00:09 -0400 Received: from mail-wm1-x342.google.com (mail-wm1-x342.google.com [IPv6:2a00:1450:4864:20::342]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BE12FC05BD0C for ; Tue, 12 May 2020 02:00:08 -0700 (PDT) Received: by mail-wm1-x342.google.com with SMTP id m12so15932742wmc.0 for ; Tue, 12 May 2020 02:00:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=r5PRcG/JtprEwCOUf2/ZPMLqKS9RbW1APrGVc2im3Nw=; b=GdZJjNq28F9KcRJyjlTagS8E3fkL1j9MVdiS32+c3J3oz4My0gpMGd7IrjrVwziyZj 3INpjpPYeyhbD4+NwUOkIx7M0B6lMTqDeQsZGgAtFaTLfzA0CcV9W86y8Gzd1CxGRleL BXUlHw6StLiPmKsmxJLvaFWoCXKd6BPsYXKG0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=r5PRcG/JtprEwCOUf2/ZPMLqKS9RbW1APrGVc2im3Nw=; b=obD5sDULM/NB3tiUWV/RCGioX5XDCRgAWs6uEfqLmHBOCOO2HIcG7L+/xUelB4JSJ5 facVq+JppJD4gxvQNCVv9lDuJJvb+F8tMqUD+mogRG6EQKZjuH8IfWDfW7I82Fz/DpX9 H/uh6AQHmkX3aYPhQkT0we4uwUzjOcBciJen9/s74yvBcSaKnfwAwJ9/kEl+64qi1dvS 3bHrzbHjDmZ0s4ErtoIwQpwDR4J0Al4sTmJLu+0ATXgArZiJ7xBNyQVwTBTqKRda4bLE oM70N9AXgn0zdarxQqNI4EU20/WfJRvk+/nR9Hv3RjvpMGpEuqi+hzGYZRCiPVPh0zr8 dsxA== X-Gm-Message-State: AGi0Pubv2s3eQg/k39IA/4UAMHaPDcsq8ejSEwxH+rr03+A6NUwI7xui vk8L5Fy3ncVqoI7GIEM8YGJcmw== X-Google-Smtp-Source: APiQypLn0gzMJwxb1G5NuhzTlxFQ4Fcx0V6C0P/n0rDCyjAZoEVClTf3ENM81ku64jmXd4+qf9UzyQ== X-Received: by 2002:a05:600c:2055:: with SMTP id p21mr13224168wmg.127.1589274007383; Tue, 12 May 2020 02:00:07 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:06 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_?= =?utf-8?q?K=C3=B6nig?= , Daniel Vetter Subject: [RFC 10/17] drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code Date: Tue, 12 May 2020 10:59:37 +0200 Message-Id: <20200512085944.222637-11-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org My dma-fence lockdep annotations caught an inversion because we allocate memory where we really shouldn't: kmem_cache_alloc+0x2b/0x6d0 amdgpu_fence_emit+0x30/0x330 [amdgpu] amdgpu_ib_schedule+0x306/0x550 [amdgpu] amdgpu_job_run+0x10f/0x260 [amdgpu] drm_sched_main+0x1b9/0x490 [gpu_sched] kthread+0x12e/0x150 Trouble right now is that lockdep only validates against GFP_FS, which would be good enough for shrinkers. But for mmu_notifiers we actually need !GFP_ATOMIC, since they can be called from any page laundering, even if GFP_NOFS or GFP_NOIO are set. I guess we should improve the lockdep annotations for fs_reclaim_acquire/release. Ofc real fix is to properly preallocate this fence and stuff it into the amdgpu job structure. But GFP_ATOMIC gets the lockdep splat out of the way. v2: Two more allocations in scheduler paths. Frist one: __kmalloc+0x58/0x720 amdgpu_vmid_grab+0x100/0xca0 [amdgpu] amdgpu_job_dependency+0xf9/0x120 [amdgpu] drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched] drm_sched_main+0xf9/0x490 [gpu_sched] Second one: kmem_cache_alloc+0x2b/0x6d0 amdgpu_sync_fence+0x7e/0x110 [amdgpu] amdgpu_vmid_grab+0x86b/0xca0 [amdgpu] amdgpu_job_dependency+0xf9/0x120 [amdgpu] drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched] drm_sched_main+0xf9/0x490 [gpu_sched] Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index d878fe7fee51..055b47241bb1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, uint32_t seq; int r; - fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL); + fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC); if (fence == NULL) return -ENOMEM; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c index fe92dcd94d4a..fdcd6659f5ad 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c @@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm, if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait)) return amdgpu_sync_fence(sync, ring->vmid_wait, false); - fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL); + fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC); if (!fences) return -ENOMEM; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c index b87ca171986a..330476cc0c86 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c @@ -168,7 +168,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f, if (amdgpu_sync_add_later(sync, f, explicit)) return 0; - e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL); + e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC); if (!e) return -ENOMEM; From patchwork Tue May 12 08:59:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11542517 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5801D1668 for ; Tue, 12 May 2020 09:00:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4080C207FF for ; Tue, 12 May 2020 09:00:56 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="h3/mYRp1" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729306AbgELJAy (ORCPT ); Tue, 12 May 2020 05:00:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42768 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729425AbgELJAK (ORCPT ); Tue, 12 May 2020 05:00:10 -0400 Received: from mail-wr1-x444.google.com (mail-wr1-x444.google.com [IPv6:2a00:1450:4864:20::444]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C689AC061A0C for ; Tue, 12 May 2020 02:00:09 -0700 (PDT) Received: by mail-wr1-x444.google.com with SMTP id e1so416507wrt.5 for ; Tue, 12 May 2020 02:00:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=jrOYK5IswDhDcmGB8LsPi+9Nw1bfVsPgV33DAHzX+ls=; b=h3/mYRp1OcqYZQlQX+Gw+Nr2f74xD8wrG/8uWoDrU5GGp6Ub/5RiOXKsHrW2i30PcM rJObZIMZcRT/PeeXQkroNNnNFRFiAD5IX8OIVMx7wsr+QVbeILzn/npECakJZp/sCWwy NqAHegH3rJ0pwC/jz0jj1m+H5qlwdREu7tJoQ= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=jrOYK5IswDhDcmGB8LsPi+9Nw1bfVsPgV33DAHzX+ls=; b=hkJLeSZJcxntqEwJDV5Nzqwv+8OjfhTts3HLSO4HSHotDOzwVEEZJ/A8PxNXfiPkwH o5qJlqaIKkPD5Gcq8EGKSXvLQgwF/rOm74NtEzQcZkN8RUgnUHSODF384Zs+iWsn98KR 16GKBa1acOKEqfdT80k5AlwaDY2fdF7OcH4f0mf3dPomgA5XdXyE/T+k7I49/NlrtrXQ /j/ZoEPGEj4Ujc7bMxqGlOOqwIWFR3Y8KaRl44mEHiVrwnW3Jr9pLQz6RUEPug5WU0cb u2DIvrdNOyWmdZYpwK8tycfg8+oxTSvbcK0YlvjJOI4smCJ1X3w+a+c7rJaBevo6Qh9X KEkg== X-Gm-Message-State: AGi0PuYskwKOf2TVqoPFm3CgS02OPbPuk1CRIf6JWrq1fJMMvY8hVTbE xrB7rFC499fYNxPjiMT+r6Yx3g== X-Google-Smtp-Source: APiQypLaPNHMZXOD+W5pUCFXhyzl8NdTydYhe1gyIbn3OV0HN9J6C0ZeHve71lUzcmTo+4IIHmsQng== X-Received: by 2002:a5d:6283:: with SMTP id k3mr23346958wru.62.1589274008569; Tue, 12 May 2020 02:00:08 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:08 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_?= =?utf-8?q?K=C3=B6nig?= , Daniel Vetter Subject: [RFC 11/17] drm/amdgpu: DC also loves to allocate stuff where it shouldn't Date: Tue, 12 May 2020 10:59:38 +0200 Message-Id: <20200512085944.222637-12-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Not going to bother with a complete&pretty commit message, just offending backtrace: kvmalloc_node+0x47/0x80 dc_create_state+0x1f/0x60 [amdgpu] dc_commit_state+0xcb/0x9b0 [amdgpu] amdgpu_dm_atomic_commit_tail+0xd31/0x2010 [amdgpu] commit_tail+0xa4/0x140 [drm_kms_helper] drm_atomic_helper_commit+0x152/0x180 [drm_kms_helper] drm_client_modeset_commit_atomic+0x1ea/0x250 [drm] drm_client_modeset_commit_locked+0x55/0x190 [drm] drm_client_modeset_commit+0x24/0x40 [drm] v2: Found more in DC code, I'm just going to pile them all up. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/amd/amdgpu/atom.c | 2 +- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +- drivers/gpu/drm/amd/display/dc/core/dc.c | 4 +++- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/atom.c b/drivers/gpu/drm/amd/amdgpu/atom.c index 4cfc786699c7..1b0c674fab25 100644 --- a/drivers/gpu/drm/amd/amdgpu/atom.c +++ b/drivers/gpu/drm/amd/amdgpu/atom.c @@ -1226,7 +1226,7 @@ static int amdgpu_atom_execute_table_locked(struct atom_context *ctx, int index, ectx.abort = false; ectx.last_jump = 0; if (ws) - ectx.ws = kcalloc(4, ws, GFP_KERNEL); + ectx.ws = kcalloc(4, ws, GFP_ATOMIC); else ectx.ws = NULL; diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 4469a8c96b08..9bfaa4cad483 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -6622,7 +6622,7 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, struct dc_stream_update stream_update; } *bundle; - bundle = kzalloc(sizeof(*bundle), GFP_KERNEL); + bundle = kzalloc(sizeof(*bundle), GFP_ATOMIC); if (!bundle) { dm_error("Failed to allocate update bundle\n"); diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index 401d1c66a411..a37a32442a5a 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -1416,8 +1416,10 @@ bool dc_post_update_surfaces_to_stream(struct dc *dc) struct dc_state *dc_create_state(struct dc *dc) { + /* No you really cant allocate random crap here this late in + * atomic_commit_tail. */ struct dc_state *context = kvzalloc(sizeof(struct dc_state), - GFP_KERNEL); + GFP_ATOMIC); if (!context) return NULL; From patchwork Tue May 12 08:59:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11542499 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 10C9414B4 for ; Tue, 12 May 2020 09:00:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id ED0EF206A3 for ; Tue, 12 May 2020 09:00:51 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="PnuV7xfl" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729443AbgELJAu (ORCPT ); Tue, 12 May 2020 05:00:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42792 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729437AbgELJAM (ORCPT ); Tue, 12 May 2020 05:00:12 -0400 Received: from mail-wr1-x441.google.com (mail-wr1-x441.google.com [IPv6:2a00:1450:4864:20::441]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E56A8C05BD09 for ; Tue, 12 May 2020 02:00:10 -0700 (PDT) Received: by mail-wr1-x441.google.com with SMTP id l11so8436011wru.0 for ; Tue, 12 May 2020 02:00:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=U6eMxQ6gGsP+PGOv8CvoIWRRS5TJUyI8UxMjiqc47Mo=; b=PnuV7xflEjOlvfaUhZ+KpylEZ6B+nAuxnCL/fVvuzcQE1nkl8ut88s1ez31P+4ljF4 iSgH127tca2DU8l3HJWyuwwLmSRVfbD3QBzO1HejgP3sc52tGIzmsdHjyMsMiHc5wDaC KKhsrvU1VK5jPM+f5/epbuW0a2Bm2UKTZqW0g= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=U6eMxQ6gGsP+PGOv8CvoIWRRS5TJUyI8UxMjiqc47Mo=; b=RZ4sZ98QTIopyNtAnqoiDRyYODQEii9zdzVuhgtbUdWOFAhL1bI4Dm6uPqWaYPlnUs qatSztXWN/MvKSWoPr/UQ9mObWOOU/W3TfaszIqIdL3rM1rm7BGunSvXSScFJOkYSTN9 zW7Dzz1T9Uit6/D4jhbJiL+mgh9TrG7FYoLvf/d68OGSCkM8abcbVA3wFgSs4oxWxgB4 s6dzKBWp69bLPKKEwljohTUKcT+HwyuNQ353pvjxk5258TLa03MEjQwNswbhWzUaNPV9 t9wF9o7ZzxoV/11n/exhQrCahyTejPiUiQOK1QjiI47xnYjVPWl3zkvnAEMpXw2iyZnH jEpQ== X-Gm-Message-State: AGi0PuaR34YM2N9oFc5rNbDwnDAyPF0DPD41TcCuiMYFAmlRJRVdSWDH b7bxc9OtRPv4bQmvOYisiWUZPg== X-Google-Smtp-Source: APiQypLQNaARhwh4BlYnWAU5WfsoUDeU0pdENYNXKn3KrAt4Qe7dKmRwhwnGgQvcs+0sUHauBpwL7w== X-Received: by 2002:adf:e751:: with SMTP id c17mr25218165wrn.351.1589274009683; Tue, 12 May 2020 02:00:09 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:09 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_?= =?utf-8?q?K=C3=B6nig?= , Daniel Vetter Subject: [RFC 12/17] drm/amdgpu/dc: Stop dma_resv_lock inversion in commit_tail Date: Tue, 12 May 2020 10:59:39 +0200 Message-Id: <20200512085944.222637-13-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Trying to grab dma_resv_lock while in commit_tail before we've done all the code that leads to the eventual signalling of the vblank event (which can be a dma_fence) is deadlock-y. Don't do that. Here the solution is easy because just grabbing locks to read something races anyway. We don't need to bother, READ_ONCE is equivalent. And avoids the locking issue. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 9bfaa4cad483..28e1af9f823c 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -6699,7 +6699,11 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, * explicitly on fences instead * and in general should be called for * blocking commit to as per framework helpers + * + * Yes, this deadlocks, since you're calling dma_resv_lock in a + * path that leads to a dma_fence_signal(). Don't do that. */ +#if 0 r = amdgpu_bo_reserve(abo, true); if (unlikely(r != 0)) DRM_ERROR("failed to reserve buffer before flip\n"); @@ -6709,6 +6713,12 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, tmz_surface = amdgpu_bo_encrypted(abo); amdgpu_bo_unreserve(abo); +#endif + /* + * this races anyway, so READ_ONCE isn't any better or worse + * than the stuff above. Except the stuff above can deadlock. + */ + tiling_flags = READ_ONCE(abo->tiling_flags); fill_dc_plane_info_and_addr( dm->adev, new_plane_state, tiling_flags, From patchwork Tue May 12 08:59:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11542505 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BBEEE14B4 for ; Tue, 12 May 2020 09:00:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A0A8A214DB for ; Tue, 12 May 2020 09:00:52 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="LgQftZug" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729144AbgELJAv (ORCPT ); Tue, 12 May 2020 05:00:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42800 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729434AbgELJAM (ORCPT ); Tue, 12 May 2020 05:00:12 -0400 Received: from mail-wr1-x441.google.com (mail-wr1-x441.google.com [IPv6:2a00:1450:4864:20::441]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 47FC7C061A0F for ; Tue, 12 May 2020 02:00:12 -0700 (PDT) Received: by mail-wr1-x441.google.com with SMTP id j5so14362577wrq.2 for ; Tue, 12 May 2020 02:00:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=cSDqH5ij1BPhN02yjRk7X7Ch94EKjTTeY1hANr+YPIg=; b=LgQftZug1+s6r/6sfV0w6yEj/NMgDnXPjQ9rogWtkgMYDejoFBo0bPnwGW56hu1d7I qs3FiIgJnTmbmBqRQhBg/d/9mf6F8tXHWnk5q6Y4hrcpMMBBXw0ZHjxKx/PXQDwpH8gi /lG4giH5AqDaIZXE6JQNuQUTWIUktkeQZkN8E= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=cSDqH5ij1BPhN02yjRk7X7Ch94EKjTTeY1hANr+YPIg=; b=UP+z6oyxz1ah1HnVgEWF+7uP738PcnJ/ygzy3C4rlHpclH5v7whf9QteGWglmwnMI/ ktzuxcRNRtyMjEUQAWIaGE2TNajF4/VCZCZhdytmKwEeBs5/5O1Uj20EYCpg10tIJAAa HGQ4sSHAmQlMVaj30NyA0bUTm8/3HLpOljF81FHEXYULLnarpN8rE4pIi/x/1PhHpqqI DjwL/t/Bke863kz1XpXiAXRHWzGivSqbUtKtxf4afcujGgjWVxrAHCcj7YpPQj2hj0qA 4ocSC8H08dyZqy704k6nFht71BAhRVNOBlJ+MD4YgdniyiiWlpsC1Z2VAKIO3qV6jFYm eF6w== X-Gm-Message-State: AGi0PuZcA8Qz3vwMhzMqvTrh5SgoUUUkMggD9mwLHnzVTgeoII6tJm10 t3V0BMKUFGxTJLODURx3Kp+czQ== X-Google-Smtp-Source: APiQypIHDkKnk39vr0THs9MDd2lPRfxpCvBWLiPU8TBM9Ly4fBvHufkGUrgX+LdKysku7drbWjQ/Mg== X-Received: by 2002:a5d:68c7:: with SMTP id p7mr25405016wrw.29.1589274010831; Tue, 12 May 2020 02:00:10 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:10 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_?= =?utf-8?q?K=C3=B6nig?= , Daniel Vetter Subject: [RFC 13/17] drm/scheduler: use dma-fence annotations in tdr work Date: Tue, 12 May 2020 10:59:40 +0200 Message-Id: <20200512085944.222637-14-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org In the face of unpriviledged userspace being able to submit bogus gpu workloads the kernel needs gpu timeout and reset (tdr) to guarantee that dma_fences actually complete. Annotate this worker to make sure we don't have any accidental locking inversions or other problems lurking. Originally this was part of the overall scheduler annotation patch. But amdgpu has some glorious inversions here: - grabs console_lock - does a full modeset, which grabs all kinds of locks (drm_modeset_lock, dma_resv_lock) which can deadlock with dma_fence_wait held inside them. - almost minor at that point, but the modeset code also allocates memory These all look like they'll be very hard to fix properly, the hardware seems to require a full display reset with any gpu recovery. Hence split out as a seperate patch. Since amdgpu isn't the only hardware driver that needs to reset the display (at least gen2/3 on intel have the same problem) we need a generic solution for this. There's two tricks we could still from drm/i915 and lift to dma-fence: - The big whack, aka force-complete all fences. i915 does this for all pending jobs if the reset is somehow stuck. Trouble is we'd need to do this for all fences in the entire system, and just the book-keeping for that will be fun. Plus lots of drivers use fences for all kinds of internal stuff like memory management, so unconditionally resetting all of them doesn't work. I'm also hoping that with these fence annotations we could enlist lockdep in finding the last offenders causing deadlocks, and we could remove this get-out-of-jail trick. - The more feasible approach (across drivers at least as part of the dma_fence contract) is what drm/i915 does for gen2/3: When we need to reset the display we wake up all dma_fence_wait_interruptible calls, or well at least the equivalent of those in i915 internally. Relying on ioctl restart we force all other threads to release their locks, which means the tdr thread is guaranteed to be able to get them. I think we could implement this at the dma_fence level, including proper lockdep annotations. dma_fence_begin_tdr(): - must be nested within a dma_fence_begin/end_signalling section - will wake up all interruptible (but not the non-interruptible) dma_fence_wait() calls and force them to complete with a -ERESTARTSYS errno code. All new interrupitble calls to dma_fence_wait() will immeidately fail with the same error code. dma_fence_end_trdr(): - this will convert dma_fence_wait() calls back to normal. Of course interrupting dma_fence_wait is only ok if the caller specified that, which means we need to split the annotations into interruptible and non-interruptible version. If we then make sure that we only use interruptible dma_fence_wait() calls while holding drm_modeset_lock we can grab them in tdr code, and allow display resets. Doing the same for dma_resv_lock might be a lot harder, so buffer updates must be avoided. What's worse, we're not going to be able to make the dma_fence_wait calls in mmu-notifiers interruptible, that doesn't work. So allocating memory still wont' be allowed, even in tdr sections. Plus obviously we can use this trick only in tdr, it is rather intrusive. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/scheduler/sched_main.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 06a736e506ad..e34a44376e87 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -279,9 +279,12 @@ static void drm_sched_job_timedout(struct work_struct *work) { struct drm_gpu_scheduler *sched; struct drm_sched_job *job; + bool fence_cookie; sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work); + fence_cookie = dma_fence_begin_signalling(); + /* Protects against concurrent deletion in drm_sched_get_cleanup_job */ spin_lock(&sched->job_list_lock); job = list_first_entry_or_null(&sched->ring_mirror_list, @@ -313,6 +316,8 @@ static void drm_sched_job_timedout(struct work_struct *work) spin_lock(&sched->job_list_lock); drm_sched_start_timeout(sched); spin_unlock(&sched->job_list_lock); + + dma_fence_end_signalling(fence_cookie); } /** From patchwork Tue May 12 08:59:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11542485 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8816115AB for ; Tue, 12 May 2020 09:00:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7191624953 for ; Tue, 12 May 2020 09:00:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="eo6XCFnG" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729518AbgELJAn (ORCPT ); Tue, 12 May 2020 05:00:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42812 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729452AbgELJAN (ORCPT ); Tue, 12 May 2020 05:00:13 -0400 Received: from mail-wm1-x343.google.com (mail-wm1-x343.google.com [IPv6:2a00:1450:4864:20::343]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 816F0C05BD09 for ; Tue, 12 May 2020 02:00:13 -0700 (PDT) Received: by mail-wm1-x343.google.com with SMTP id e26so20839184wmk.5 for ; Tue, 12 May 2020 02:00:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=u5gM1ohPbNVORxsODI5ngB8Z3LljD1/NzUrdRFpQw6k=; b=eo6XCFnGqS3UTbV+XFLkZTkDoh8UHvgOyRRDESLFJ7MP+5Cj3/IxjzAl127GN5O9rY VXDXGt6n7g7rCnvzywPZOWCjrMfNywQXd3+W+fn3un22WolNWV/n9WI+YAZlftq1Ve6F gC89CnJ6agJOr1dSjp+3V/HsmrsJiwJEQFsVU= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=u5gM1ohPbNVORxsODI5ngB8Z3LljD1/NzUrdRFpQw6k=; b=RtYkfExgkDfSpUjp0t3OK9sAgVKrFlOSSBnZFi10FGxcCojxBWL44g05Q5M+NIRiC6 auAZQ3EwgqygK1c0ArX2z+NsNHP0dukarChFEu1F3DLHsalXAV8apIUg91OWkHNBuArZ zAIpGVzmGjWSvuC8vzi2f0zl3UKgD/SJ5CFZKyzkosIJRozOKv1d4TsGMHjMZI3nLItX g/bQuARgLZ0FMwai1yYM2DG/niXYOmDJFrAR7bg7gKer3e24/ViOTQ0DjNuj48s9c24j s/vXah7R1udLuH+PJTpT/eP1DLlrO74GFVLQQA6bzGmAiOpn9MaNF9gVlgaZgig7Iaxn M+oQ== X-Gm-Message-State: AGi0PubY2rpLvaxYDr8TXp3GKLiekWh3mB6p0AUmS+srmj+4Gnx3Jcv1 TKnXvDzEEIA8L7nLJDIF2Gsshg== X-Google-Smtp-Source: APiQypLTNXsQ8HlHPlHfp2Mll6IbSc871xkl7AkjDoENjAiQEoiFF4u2nMFQth6vdD3XxA2bFR9BoQ== X-Received: by 2002:a1c:9e52:: with SMTP id h79mr35953921wme.84.1589274012210; Tue, 12 May 2020 02:00:12 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:11 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_?= =?utf-8?q?K=C3=B6nig?= Subject: [RFC 14/17] drm/amdgpu: use dma-fence annotations for gpu reset code Date: Tue, 12 May 2020 10:59:41 +0200 Message-Id: <20200512085944.222637-15-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org To improve coverage also annotate the gpu reset code itself, since that's called from other places than drm/scheduler (which is already annotated). Annotations nests, so this doesn't break anything, and allows easier testing. Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index b038ddbb2ece..5560d045b2e0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4140,6 +4140,9 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, bool use_baco = (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) ? true : false; + bool fence_cookie; + + fence_cookie = dma_fence_begin_signalling(); /* * Flush RAM to disk so that after reboot @@ -4168,6 +4171,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", job ? job->base.id : -1, hive->hive_id); mutex_unlock(&hive->hive_lock); + dma_fence_end_signalling(fence_cookie); return 0; } @@ -4178,8 +4182,10 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, */ INIT_LIST_HEAD(&device_list); if (adev->gmc.xgmi.num_physical_nodes > 1) { - if (!hive) + if (!hive) { + dma_fence_end_signalling(fence_cookie); return -ENODEV; + } if (!list_is_first(&adev->gmc.xgmi.head, &hive->device_list)) list_rotate_to_front(&adev->gmc.xgmi.head, &hive->device_list); device_list_handle = &hive->device_list; @@ -4194,6 +4200,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress", job ? job->base.id : -1); mutex_unlock(&hive->hive_lock); + dma_fence_end_signalling(fence_cookie); return 0; } @@ -4319,6 +4326,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, if (r) dev_info(adev->dev, "GPU reset end with ret = %d\n", r); + dma_fence_end_signalling(fence_cookie); return r; } From patchwork Tue May 12 08:59:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11542467 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3429C14B4 for ; Tue, 12 May 2020 09:00:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 17F9A20A8B for ; Tue, 12 May 2020 09:00:37 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="kRhMNR3h" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729229AbgELJAf (ORCPT ); Tue, 12 May 2020 05:00:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42820 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729408AbgELJAP (ORCPT ); Tue, 12 May 2020 05:00:15 -0400 Received: from mail-wr1-x444.google.com (mail-wr1-x444.google.com [IPv6:2a00:1450:4864:20::444]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 977AAC061A0C for ; Tue, 12 May 2020 02:00:14 -0700 (PDT) Received: by mail-wr1-x444.google.com with SMTP id v12so14306231wrp.12 for ; Tue, 12 May 2020 02:00:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=VL9TtbPRMRPCTQiRlmMi8z5JITNnf6io7xX7MRKl+Ko=; b=kRhMNR3hP3pKPVHZhZ9cT25ln+EAm07SNjcy31Y4dhe3XkzIe07JM4Ya995O4xRy86 +saK6s4RC0NG7qbyo1lVxn1eZUImoXrpCgVi47HZi5titk7nGYe9UjbrS8UkF6O3hPRR ovYttYkktjgExaN6BUOrX19W0nVFlKxDNLAKE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=VL9TtbPRMRPCTQiRlmMi8z5JITNnf6io7xX7MRKl+Ko=; b=oWwlPhGWRyqU/oexyBxAD7qY8DaLnkNFRrrkMtZ3hPU/2lPm139nqhujOuckpq9+K0 rIoMROxQrPVKpemZNgGSPw5L8wcXjNYgtSBLnj3C8g7nADwrBHhyxMguTT4jgjeBfqZX P/qqcr487ypvfRDfqPAZPYXayBMGFjuSgXHhn8i5hRpJYpzd6YG2f/ZhZD222EIKHYnI 0LuutFe8MOcNz8cW4G/byanlwgAA6/NNXPcHeqYXRboJujDlmmoodqhBO63ReZqthRFv QeDkE16lehrAL3WPoOawOiauA8EpzUFs7QD+RLq0TC6s3F2jACxdgpRmJGg+PqlsMthc ZSRQ== X-Gm-Message-State: AGi0PuYgZxIma7gcariEXxUSJBw7jJuDwqS0Rq2Dfb2lyETOxT8PXhRp v6NXQDrvRD7ZcTBoutKAGIE+rA== X-Google-Smtp-Source: APiQypISnc9hbO9Grrz8ir5cKtb2Z6Km73KFVxCF0gD5VBDJDeUiMEfX/2s/6rU4lnpOUPSzhy7kNw== X-Received: by 2002:a5d:4e0a:: with SMTP id p10mr23345391wrt.215.1589274013291; Tue, 12 May 2020 02:00:13 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:12 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_?= =?utf-8?q?K=C3=B6nig?= , Daniel Vetter Subject: [RFC 15/17] Revert "drm/amdgpu: add fbdev suspend/resume on gpu reset" Date: Tue, 12 May 2020 10:59:42 +0200 Message-Id: <20200512085944.222637-16-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org This is one from the department of "maybe play lottery if you hit this, karma compensation might work". Or at least lockdep ftw! This reverts commit 565d1941557756a584ac357d945bc374d5fcd1d0. It's not quite as low-risk as the commit message claims, because this grabs console_lock, which might be held when we allocate memory, which might never happen because the dma_fence_wait() is stuck waiting on our gpu reset: [ 136.763714] ====================================================== [ 136.763714] WARNING: possible circular locking dependency detected [ 136.763715] 5.7.0-rc3+ #346 Tainted: G W [ 136.763716] ------------------------------------------------------ [ 136.763716] kworker/2:3/682 is trying to acquire lock: [ 136.763716] ffffffff8226f140 (console_lock){+.+.}-{0:0}, at: drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763723] but task is already holding lock: [ 136.763724] ffffffff82318c80 (dma_fence_map){++++}-{0:0}, at: drm_sched_job_timedout+0x25/0xf0 [gpu_sched] [ 136.763726] which lock already depends on the new lock. [ 136.763726] the existing dependency chain (in reverse order) is: [ 136.763727] -> #2 (dma_fence_map){++++}-{0:0}: [ 136.763730] __dma_fence_might_wait+0x41/0xb0 [ 136.763732] dma_resv_lockdep+0x171/0x202 [ 136.763734] do_one_initcall+0x5d/0x2f0 [ 136.763736] kernel_init_freeable+0x20d/0x26d [ 136.763738] kernel_init+0xa/0xfb [ 136.763740] ret_from_fork+0x27/0x50 [ 136.763740] -> #1 (fs_reclaim){+.+.}-{0:0}: [ 136.763743] fs_reclaim_acquire.part.0+0x25/0x30 [ 136.763745] kmem_cache_alloc_trace+0x2e/0x6e0 [ 136.763747] device_create_groups_vargs+0x52/0xf0 [ 136.763747] device_create+0x49/0x60 [ 136.763749] fb_console_init+0x25/0x145 [ 136.763750] fbmem_init+0xcc/0xe2 [ 136.763750] do_one_initcall+0x5d/0x2f0 [ 136.763751] kernel_init_freeable+0x20d/0x26d [ 136.763752] kernel_init+0xa/0xfb [ 136.763753] ret_from_fork+0x27/0x50 [ 136.763753] -> #0 (console_lock){+.+.}-{0:0}: [ 136.763755] __lock_acquire+0x1241/0x23f0 [ 136.763756] lock_acquire+0xad/0x370 [ 136.763757] console_lock+0x47/0x70 [ 136.763761] drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763809] amdgpu_device_gpu_recover.cold+0x21e/0xe7b [amdgpu] [ 136.763850] amdgpu_job_timedout+0xfb/0x150 [amdgpu] [ 136.763851] drm_sched_job_timedout+0x8a/0xf0 [gpu_sched] [ 136.763852] process_one_work+0x23c/0x580 [ 136.763853] worker_thread+0x50/0x3b0 [ 136.763854] kthread+0x12e/0x150 [ 136.763855] ret_from_fork+0x27/0x50 [ 136.763855] other info that might help us debug this: [ 136.763856] Chain exists of: console_lock --> fs_reclaim --> dma_fence_map [ 136.763857] Possible unsafe locking scenario: [ 136.763857] CPU0 CPU1 [ 136.763857] ---- ---- [ 136.763857] lock(dma_fence_map); [ 136.763858] lock(fs_reclaim); [ 136.763858] lock(dma_fence_map); [ 136.763858] lock(console_lock); [ 136.763859] *** DEADLOCK *** [ 136.763860] 4 locks held by kworker/2:3/682: [ 136.763860] #0: ffff8887fb81c938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x580 [ 136.763862] #1: ffffc90000cafe58 ((work_completion)(&(&sched->work_tdr)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x580 [ 136.763863] #2: ffffffff82318c80 (dma_fence_map){++++}-{0:0}, at: drm_sched_job_timedout+0x25/0xf0 [gpu_sched] [ 136.763865] #3: ffff8887ab621748 (&adev->lock_reset){+.+.}-{3:3}, at: amdgpu_device_gpu_recover.cold+0x5ab/0xe7b [amdgpu] [ 136.763914] stack backtrace: [ 136.763915] CPU: 2 PID: 682 Comm: kworker/2:3 Tainted: G W 5.7.0-rc3+ #346 [ 136.763916] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 4011 04/19/2018 [ 136.763918] Workqueue: events drm_sched_job_timedout [gpu_sched] [ 136.763919] Call Trace: [ 136.763922] dump_stack+0x8f/0xd0 [ 136.763924] check_noncircular+0x162/0x180 [ 136.763926] __lock_acquire+0x1241/0x23f0 [ 136.763927] lock_acquire+0xad/0x370 [ 136.763932] ? drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763933] ? mark_held_locks+0x2d/0x80 [ 136.763934] ? _raw_spin_unlock_irqrestore+0x46/0x60 [ 136.763936] console_lock+0x47/0x70 [ 136.763940] ? drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763944] drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763993] amdgpu_device_gpu_recover.cold+0x21e/0xe7b [amdgpu] [ 136.764036] amdgpu_job_timedout+0xfb/0x150 [amdgpu] [ 136.764038] drm_sched_job_timedout+0x8a/0xf0 [gpu_sched] [ 136.764040] process_one_work+0x23c/0x580 [ 136.764041] worker_thread+0x50/0x3b0 [ 136.764042] ? process_one_work+0x580/0x580 [ 136.764044] kthread+0x12e/0x150 [ 136.764045] ? kthread_create_worker_on_cpu+0x70/0x70 [ 136.764046] ret_from_fork+0x27/0x50 Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 5560d045b2e0..3584e29323c0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4047,8 +4047,6 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, if (r) goto out; - amdgpu_fbdev_set_suspend(tmp_adev, 0); - /* must succeed. */ amdgpu_ras_resume(tmp_adev); @@ -4217,8 +4215,6 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, */ amdgpu_unregister_gpu_instance(tmp_adev); - amdgpu_fbdev_set_suspend(tmp_adev, 1); - /* disable ras on ALL IPs */ if (!(in_ras_intr && !use_baco) && amdgpu_device_ip_need_full_reset(tmp_adev)) From patchwork Tue May 12 08:59:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11542451 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 185E61668 for ; Tue, 12 May 2020 09:00:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id ECA2B24955 for ; Tue, 12 May 2020 09:00:31 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="i9sr1b15" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729475AbgELJAU (ORCPT ); Tue, 12 May 2020 05:00:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42826 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729470AbgELJAQ (ORCPT ); Tue, 12 May 2020 05:00:16 -0400 Received: from mail-wm1-x341.google.com (mail-wm1-x341.google.com [IPv6:2a00:1450:4864:20::341]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 14720C061A0C for ; Tue, 12 May 2020 02:00:16 -0700 (PDT) Received: by mail-wm1-x341.google.com with SMTP id d207so5889618wmd.0 for ; Tue, 12 May 2020 02:00:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=SfgLaCvxfD7XNn5XcBekzeLLwvBoLk08cBBrX2rnnjo=; b=i9sr1b15hbiGaOUTudiYdX3zH08R5W0n+iiQBJ4jzvHs74gg+UjUiBZvQJSouDtJEs U8ZHUVFajgAOIXV46QqfuBEq50M9oQlamOF5wQdCRZXIwzGeHVJqbdbx5a26kN6Nt22r FqdwM2qxFwhdtXPbYwpqsk/7a6mCzD3lqiM4A= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=SfgLaCvxfD7XNn5XcBekzeLLwvBoLk08cBBrX2rnnjo=; b=Veye4uxa4ZVAU574mXFkcpHSrIOCuCertmHBd8ZIh9tYVia/IUmPIKnRho0gBn2eiz ATxrX4BlWwhBAI+tAt1fPhbt4m6VWD6p+JMoWSfPWg0JiZjKnXWswx7WQTLWJSU6NnHy XqqVkuxZwnFr1bkiY3ePCgtFX9dV+NsKPzwOoxuX+yplF6oZ9H9lAgcu2hxJguQE2pGO 25ItQzlblEfjLgLCz+/gEfpXTFPv5FBddI3McVdD3hvM9kvzWFHC39ie+Lh9b64ZTPr0 w6qNdbNr9hFFz7W3vnUFnmN5TtZXjZkkvBDnRJZgkMvsTuFGecwb0lFnaNsGSGntT0nY OAOw== X-Gm-Message-State: AGi0Pubc72lrCQas3SNKE6xZwGHuom4XKAy7FBbbmxfO6PY+fKPKS/sE vcQWPcfmAD5Jm0xbOkAOGwTv+g== X-Google-Smtp-Source: APiQypK/giXJoKHEmIIgvZJ7mM3tDZU5ICCASqmWo22Iyt5Tb6Wt3spXNWAx+Pb9At2o6/v8/Np1tg== X-Received: by 2002:a05:600c:40d:: with SMTP id q13mr26312967wmb.69.1589274014662; Tue, 12 May 2020 02:00:14 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:14 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_?= =?utf-8?q?K=C3=B6nig?= , Daniel Vetter Subject: [RFC 16/17] drm/amdgpu: gpu recovery does full modesets Date: Tue, 12 May 2020 10:59:43 +0200 Message-Id: <20200512085944.222637-17-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org ... I think it's time to stop this little exercise. The lockdep splat, for the record: [ 132.583381] ====================================================== [ 132.584091] WARNING: possible circular locking dependency detected [ 132.584775] 5.7.0-rc3+ #346 Tainted: G W [ 132.585461] ------------------------------------------------------ [ 132.586184] kworker/2:3/865 is trying to acquire lock: [ 132.586857] ffffc90000677c70 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper] [ 132.587569] but task is already holding lock: [ 132.589044] ffffffff82318c80 (dma_fence_map){++++}-{0:0}, at: drm_sched_job_timedout+0x25/0xf0 [gpu_sched] [ 132.589803] which lock already depends on the new lock. [ 132.592009] the existing dependency chain (in reverse order) is: [ 132.593507] -> #2 (dma_fence_map){++++}-{0:0}: [ 132.595019] dma_fence_begin_signalling+0x50/0x60 [ 132.595767] drm_atomic_helper_commit+0xa1/0x180 [drm_kms_helper] [ 132.596567] drm_client_modeset_commit_atomic+0x1ea/0x250 [drm] [ 132.597420] drm_client_modeset_commit_locked+0x55/0x190 [drm] [ 132.598178] drm_client_modeset_commit+0x24/0x40 [drm] [ 132.598948] drm_fb_helper_restore_fbdev_mode_unlocked+0x4b/0xa0 [drm_kms_helper] [ 132.599738] drm_fb_helper_set_par+0x30/0x40 [drm_kms_helper] [ 132.600539] fbcon_init+0x2e8/0x660 [ 132.601344] visual_init+0xce/0x130 [ 132.602156] do_bind_con_driver+0x1bc/0x2b0 [ 132.602970] do_take_over_console+0x115/0x180 [ 132.603763] do_fbcon_takeover+0x58/0xb0 [ 132.604564] register_framebuffer+0x1ee/0x300 [ 132.605369] __drm_fb_helper_initial_config_and_unlock+0x36e/0x520 [drm_kms_helper] [ 132.606187] amdgpu_fbdev_init+0xb3/0xf0 [amdgpu] [ 132.607032] amdgpu_device_init.cold+0xe90/0x1677 [amdgpu] [ 132.607862] amdgpu_driver_load_kms+0x5a/0x200 [amdgpu] [ 132.608697] amdgpu_pci_probe+0xf7/0x180 [amdgpu] [ 132.609511] local_pci_probe+0x42/0x80 [ 132.610324] pci_device_probe+0x104/0x1a0 [ 132.611130] really_probe+0x147/0x3c0 [ 132.611939] driver_probe_device+0xb6/0x100 [ 132.612766] device_driver_attach+0x53/0x60 [ 132.613593] __driver_attach+0x8c/0x150 [ 132.614419] bus_for_each_dev+0x7b/0xc0 [ 132.615249] bus_add_driver+0x14c/0x1f0 [ 132.616071] driver_register+0x6c/0xc0 [ 132.616902] do_one_initcall+0x5d/0x2f0 [ 132.617731] do_init_module+0x5c/0x230 [ 132.618560] load_module+0x2981/0x2bc0 [ 132.619391] __do_sys_finit_module+0xaa/0x110 [ 132.620228] do_syscall_64+0x5a/0x250 [ 132.621064] entry_SYSCALL_64_after_hwframe+0x49/0xb3 [ 132.621903] -> #1 (crtc_ww_class_mutex){+.+.}-{3:3}: [ 132.623587] __ww_mutex_lock.constprop.0+0xcc/0x10c0 [ 132.624448] ww_mutex_lock+0x43/0xb0 [ 132.625315] drm_modeset_lock+0x44/0x120 [drm] [ 132.626184] drmm_mode_config_init+0x2db/0x8b0 [drm] [ 132.627098] amdgpu_device_init.cold+0xbd1/0x1677 [amdgpu] [ 132.628007] amdgpu_driver_load_kms+0x5a/0x200 [amdgpu] [ 132.628920] amdgpu_pci_probe+0xf7/0x180 [amdgpu] [ 132.629804] local_pci_probe+0x42/0x80 [ 132.630690] pci_device_probe+0x104/0x1a0 [ 132.631583] really_probe+0x147/0x3c0 [ 132.632479] driver_probe_device+0xb6/0x100 [ 132.633379] device_driver_attach+0x53/0x60 [ 132.634275] __driver_attach+0x8c/0x150 [ 132.635170] bus_for_each_dev+0x7b/0xc0 [ 132.636069] bus_add_driver+0x14c/0x1f0 [ 132.636974] driver_register+0x6c/0xc0 [ 132.637870] do_one_initcall+0x5d/0x2f0 [ 132.638765] do_init_module+0x5c/0x230 [ 132.639654] load_module+0x2981/0x2bc0 [ 132.640522] __do_sys_finit_module+0xaa/0x110 [ 132.641372] do_syscall_64+0x5a/0x250 [ 132.642203] entry_SYSCALL_64_after_hwframe+0x49/0xb3 [ 132.643022] -> #0 (crtc_ww_class_acquire){+.+.}-{0:0}: [ 132.644643] __lock_acquire+0x1241/0x23f0 [ 132.645469] lock_acquire+0xad/0x370 [ 132.646274] drm_modeset_acquire_init+0xd2/0x100 [drm] [ 132.647071] drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper] [ 132.647902] dm_suspend+0x1c/0x60 [amdgpu] [ 132.648698] amdgpu_device_ip_suspend_phase1+0x83/0xe0 [amdgpu] [ 132.649498] amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu] [ 132.650300] amdgpu_device_gpu_recover.cold+0x4e6/0xe64 [amdgpu] [ 132.651084] amdgpu_job_timedout+0xfb/0x150 [amdgpu] [ 132.651825] drm_sched_job_timedout+0x8a/0xf0 [gpu_sched] [ 132.652594] process_one_work+0x23c/0x580 [ 132.653402] worker_thread+0x50/0x3b0 [ 132.654139] kthread+0x12e/0x150 [ 132.654868] ret_from_fork+0x27/0x50 [ 132.655598] other info that might help us debug this: [ 132.657739] Chain exists of: crtc_ww_class_acquire --> crtc_ww_class_mutex --> dma_fence_map [ 132.659877] Possible unsafe locking scenario: [ 132.661416] CPU0 CPU1 [ 132.662126] ---- ---- [ 132.662847] lock(dma_fence_map); [ 132.663574] lock(crtc_ww_class_mutex); [ 132.664319] lock(dma_fence_map); [ 132.665063] lock(crtc_ww_class_acquire); [ 132.665799] *** DEADLOCK *** [ 132.667965] 4 locks held by kworker/2:3/865: [ 132.668701] #0: ffff8887fb81c938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x580 [ 132.669462] #1: ffffc90000677e58 ((work_completion)(&(&sched->work_tdr)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x580 [ 132.670242] #2: ffffffff82318c80 (dma_fence_map){++++}-{0:0}, at: drm_sched_job_timedout+0x25/0xf0 [gpu_sched] [ 132.671039] #3: ffff8887b84a1748 (&adev->lock_reset){+.+.}-{3:3}, at: amdgpu_device_gpu_recover.cold+0x59e/0xe64 [amdgpu] [ 132.671902] stack backtrace: [ 132.673515] CPU: 2 PID: 865 Comm: kworker/2:3 Tainted: G W 5.7.0-rc3+ #346 [ 132.674347] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 4011 04/19/2018 [ 132.675194] Workqueue: events drm_sched_job_timedout [gpu_sched] [ 132.676046] Call Trace: [ 132.676897] dump_stack+0x8f/0xd0 [ 132.677748] check_noncircular+0x162/0x180 [ 132.678604] ? stack_trace_save+0x4b/0x70 [ 132.679459] __lock_acquire+0x1241/0x23f0 [ 132.680311] lock_acquire+0xad/0x370 [ 132.681163] ? drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper] [ 132.682021] ? cpumask_next+0x16/0x20 [ 132.682880] ? module_assert_mutex_or_preempt+0x14/0x40 [ 132.683737] ? __module_address+0x28/0xf0 [ 132.684601] drm_modeset_acquire_init+0xd2/0x100 [drm] [ 132.685466] ? drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper] [ 132.686335] drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper] [ 132.687255] dm_suspend+0x1c/0x60 [amdgpu] [ 132.688152] amdgpu_device_ip_suspend_phase1+0x83/0xe0 [amdgpu] [ 132.689057] ? amdgpu_fence_process+0x4c/0x150 [amdgpu] [ 132.689963] amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu] [ 132.690893] amdgpu_device_gpu_recover.cold+0x4e6/0xe64 [amdgpu] [ 132.691818] amdgpu_job_timedout+0xfb/0x150 [amdgpu] [ 132.692707] drm_sched_job_timedout+0x8a/0xf0 [gpu_sched] [ 132.693597] process_one_work+0x23c/0x580 [ 132.694487] worker_thread+0x50/0x3b0 [ 132.695373] ? process_one_work+0x580/0x580 [ 132.696264] kthread+0x12e/0x150 [ 132.697154] ? kthread_create_worker_on_cpu+0x70/0x70 [ 132.698057] ret_from_fork+0x27/0x50 Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 3584e29323c0..b3b84a0d3baf 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2415,6 +2415,14 @@ static int amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev) /* displays are handled separately */ if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_DCE) { /* XXX handle errors */ + + /* + * This is dm_suspend, which calls modeset locks, and + * that a pretty good inversion against dma_fence_signal + * which gpu recovery is supposed to guarantee. + * + * Dont ask me how to fix this. + */ r = adev->ip_blocks[i].version->funcs->suspend(adev); /* XXX handle errors */ if (r) { From patchwork Tue May 12 08:59:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11542437 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8527714B4 for ; Tue, 12 May 2020 09:00:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6DAB42137B for ; Tue, 12 May 2020 09:00:21 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="gVfBqrMK" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729135AbgELJAU (ORCPT ); Tue, 12 May 2020 05:00:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42838 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1729474AbgELJAR (ORCPT ); Tue, 12 May 2020 05:00:17 -0400 Received: from mail-wr1-x443.google.com (mail-wr1-x443.google.com [IPv6:2a00:1450:4864:20::443]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1ED07C05BD0A for ; Tue, 12 May 2020 02:00:17 -0700 (PDT) Received: by mail-wr1-x443.google.com with SMTP id e16so14354899wra.7 for ; Tue, 12 May 2020 02:00:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=tFPcYYVPEczCJNqu565ZZzRL31Ro7iqoJLud+V9/W6s=; b=gVfBqrMKYAzurJ/RstzwJY91qBLOOMNjuPIDPoBrD1wsTDF47Xk6LAJDfP1paQCHue hb7A0jpibe9rCFhP8vACHxcmJAL+/3WLLIF1ayojzTlGhgxI0AD/lpcfFzQsuaWtCNTF xdX3NuhPackoyWfVuA8lxXykjw+ve1/2YT27g= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=tFPcYYVPEczCJNqu565ZZzRL31Ro7iqoJLud+V9/W6s=; b=H8F+iDg8l87BEkkDeZKa/TfQwFX8sW1Cy6id1wSk89+KMNXEtTu2xG5GPs355n+gO+ 7Di96Twp/Q+fbsjOcTd7CwT+5MT7VJiGg78jVvmT09wxodS9UCFZqXaMxIfcdcJedcvj sLxnhIso4g4fQFKVJS8i4BSIu0vweJc4YE5/+vbdiC7MFq3rDG/c4Qdo8m/nEML8Gm5M 0wKKfR+y0AcFgsLNeQ7muXEDcxMiujGUvvV84weEEP8FI5oKeufwQcKKWUMPw3ApRivJ JUK64jylKQkAerx0OgwdwjmWt9Z8IEHGg4sDfx4oulY+wsJUMy1WWj0b2qtda+wrN/7X HsvQ== X-Gm-Message-State: AGi0PuZYFDSlKNV+wrLKAqf1jJznreFAbPhvWo4WvgZ3707nynXS6O/C ziQ+8eod+c3ODcWd2djE4ksYPg== X-Google-Smtp-Source: APiQypIbabYyHNtuDrEHDjujDbV60fZeSSVWGyjvdD4rzzfjplG/fVCJt6XewcV7pRKQdnM6iDktMg== X-Received: by 2002:adf:d4c6:: with SMTP id w6mr24833712wrk.92.1589274015818; Tue, 12 May 2020 02:00:15 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id y10sm18845457wrd.95.2020.05.12.02.00.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 May 2020 02:00:15 -0700 (PDT) From: Daniel Vetter To: DRI Development Cc: LKML , Daniel Vetter , linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, Chris Wilson , Maarten Lankhorst , =?utf-8?q?Christian_?= =?utf-8?q?K=C3=B6nig?= , Daniel Vetter Subject: [RFC 17/17] drm/i915: Annotate dma_fence_work Date: Tue, 12 May 2020 10:59:44 +0200 Message-Id: <20200512085944.222637-18-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200512085944.222637-1-daniel.vetter@ffwll.ch> References: <20200512085944.222637-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org i915 does tons of allocations from this worker, which lockdep catches. Also generic infrastructure like this with big potential for how dma_fence or other cross driver contracts work, really should be reviewed on dri-devel. Implementing custom wheels for everything within the driver is a classic case of "platform problem" [1]. Which in upstream we really shouldn't have. Since there's no quick way to solve these splats (dma_fence_work is used a bunch in basic buffer management and command submission) like for amdgpu, I'm giving up at this point here. Annotating i915 scheduler and gpu reset could would be interesting, but since lockdep is one-shot we can't see what surprises would lurk there. 1: https://lwn.net/Articles/443531/ Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/i915_sw_fence_work.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c index a3a81bb8f2c3..5b74acadaef5 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence_work.c +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c @@ -17,12 +17,15 @@ static void fence_work(struct work_struct *work) { struct dma_fence_work *f = container_of(work, typeof(*f), work); int err; + bool fence_cookie; + fence_cookie = dma_fence_begin_signalling(); err = f->ops->work(f); if (err) dma_fence_set_error(&f->dma, err); fence_complete(f); + dma_fence_end_signalling(fence_cookie); dma_fence_put(&f->dma); }