From patchwork Thu Jun 4 08:12:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Vetter X-Patchwork-Id: 11587429 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 01ABB90 for ; Thu, 4 Jun 2020 08:13:31 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D43EA2075B for ; Thu, 4 Jun 2020 08:13:30 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=ffwll.ch header.i=@ffwll.ch header.b="g2y6+38x" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D43EA2075B Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=ffwll.ch Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=dri-devel-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id C66C9899E9; Thu, 4 Jun 2020 08:12:55 +0000 (UTC) X-Original-To: dri-devel@lists.freedesktop.org Delivered-To: dri-devel@lists.freedesktop.org Received: from mail-wr1-x443.google.com (mail-wr1-x443.google.com [IPv6:2a00:1450:4864:20::443]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3AFED89933 for ; Thu, 4 Jun 2020 08:12:54 +0000 (UTC) Received: by mail-wr1-x443.google.com with SMTP id e1so5053556wrt.5 for ; Thu, 04 Jun 2020 01:12:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=PAzY52AYv0RSDf9knhitt5KJNjp0ADK9OcL/la6MhG4=; b=g2y6+38xV996zbWQlHRfYfwmdoYrQtMSB9AuhumQ8EuoFThzOqHcB0+rQlar3nhlyH mzJYMNcNs5v0zXm1taJdTlBWcRFGmIG8H58vFVYh2UJyXnrrsWGD/zJxugQTr50yWENI OrKBQ9/cJ2DiSb8hsPkl8FtzjA9HerNQ7h37o= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=PAzY52AYv0RSDf9knhitt5KJNjp0ADK9OcL/la6MhG4=; b=Y8mbNhjrBMN+7Ye5nmf7+qMNM/6XTTzuZUmDBziJJVqNzd4zxyauatwtkuMYgWNQgO xCTKJvhmZgKcKsCoydQF2OtGQnDAS/lloaTzix1TykhnNf68+tIbA3pJWNxGcJsg5pC9 zOtzRTkVR7eZZmYQzA6/28+Bf66kIz0EeS8TJFIlXIb91reIzIfrB0h+feyRekqj3PkA vzj33B87PYR7q2ZacuQkPEMCrOqBMhe6rJ4bUhgd+XdWTZ8C2Bp+EVYkC+EcVH2rMhw8 RdGThcAocukPq/hcfE44aJXsJsZfaRLxlIkgUa24kvWh0z2Pq7eJOJNyDLFWJ1gNiTuO 1smA== X-Gm-Message-State: AOAM531TgkPTrW+Xn7u9/3DIEJ/b4bt5Ej73IOr4ejViKj1FdWcYvp91 ZbTjdEU1HVWGVj+hkKpmQ6GNsLt2k1g= X-Google-Smtp-Source: ABdhPJws7BZsTbZqn46ygWPxsi69ZQ9EdQZlg8FdQb2SX5+xMmbZiTCisZQm8VfjwqondDqR2/ssFg== X-Received: by 2002:a5d:4d89:: with SMTP id b9mr3497599wru.210.1591258372493; Thu, 04 Jun 2020 01:12:52 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id f11sm6873305wrj.2.2020.06.04.01.12.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 04 Jun 2020 01:12:51 -0700 (PDT) From: Daniel Vetter To: DRI Development Subject: [PATCH 16/18] Revert "drm/amdgpu: add fbdev suspend/resume on gpu reset" Date: Thu, 4 Jun 2020 10:12:22 +0200 Message-Id: <20200604081224.863494-17-daniel.vetter@ffwll.ch> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200604081224.863494-1-daniel.vetter@ffwll.ch> References: <20200604081224.863494-1-daniel.vetter@ffwll.ch> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-rdma@vger.kernel.org, Daniel Vetter , Intel Graphics Development , LKML , amd-gfx@lists.freedesktop.org, Chris Wilson , linaro-mm-sig@lists.linaro.org, Daniel Vetter , =?utf-8?q?Christian_K=C3=B6nig?= , linux-media@vger.kernel.org Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" This is one from the department of "maybe play lottery if you hit this, karma compensation might work". Or at least lockdep ftw! This reverts commit 565d1941557756a584ac357d945bc374d5fcd1d0. It's not quite as low-risk as the commit message claims, because this grabs console_lock, which might be held when we allocate memory, which might never happen because the dma_fence_wait() is stuck waiting on our gpu reset: [ 136.763714] ====================================================== [ 136.763714] WARNING: possible circular locking dependency detected [ 136.763715] 5.7.0-rc3+ #346 Tainted: G W [ 136.763716] ------------------------------------------------------ [ 136.763716] kworker/2:3/682 is trying to acquire lock: [ 136.763716] ffffffff8226f140 (console_lock){+.+.}-{0:0}, at: drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763723] but task is already holding lock: [ 136.763724] ffffffff82318c80 (dma_fence_map){++++}-{0:0}, at: drm_sched_job_timedout+0x25/0xf0 [gpu_sched] [ 136.763726] which lock already depends on the new lock. [ 136.763726] the existing dependency chain (in reverse order) is: [ 136.763727] -> #2 (dma_fence_map){++++}-{0:0}: [ 136.763730] __dma_fence_might_wait+0x41/0xb0 [ 136.763732] dma_resv_lockdep+0x171/0x202 [ 136.763734] do_one_initcall+0x5d/0x2f0 [ 136.763736] kernel_init_freeable+0x20d/0x26d [ 136.763738] kernel_init+0xa/0xfb [ 136.763740] ret_from_fork+0x27/0x50 [ 136.763740] -> #1 (fs_reclaim){+.+.}-{0:0}: [ 136.763743] fs_reclaim_acquire.part.0+0x25/0x30 [ 136.763745] kmem_cache_alloc_trace+0x2e/0x6e0 [ 136.763747] device_create_groups_vargs+0x52/0xf0 [ 136.763747] device_create+0x49/0x60 [ 136.763749] fb_console_init+0x25/0x145 [ 136.763750] fbmem_init+0xcc/0xe2 [ 136.763750] do_one_initcall+0x5d/0x2f0 [ 136.763751] kernel_init_freeable+0x20d/0x26d [ 136.763752] kernel_init+0xa/0xfb [ 136.763753] ret_from_fork+0x27/0x50 [ 136.763753] -> #0 (console_lock){+.+.}-{0:0}: [ 136.763755] __lock_acquire+0x1241/0x23f0 [ 136.763756] lock_acquire+0xad/0x370 [ 136.763757] console_lock+0x47/0x70 [ 136.763761] drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763809] amdgpu_device_gpu_recover.cold+0x21e/0xe7b [amdgpu] [ 136.763850] amdgpu_job_timedout+0xfb/0x150 [amdgpu] [ 136.763851] drm_sched_job_timedout+0x8a/0xf0 [gpu_sched] [ 136.763852] process_one_work+0x23c/0x580 [ 136.763853] worker_thread+0x50/0x3b0 [ 136.763854] kthread+0x12e/0x150 [ 136.763855] ret_from_fork+0x27/0x50 [ 136.763855] other info that might help us debug this: [ 136.763856] Chain exists of: console_lock --> fs_reclaim --> dma_fence_map [ 136.763857] Possible unsafe locking scenario: [ 136.763857] CPU0 CPU1 [ 136.763857] ---- ---- [ 136.763857] lock(dma_fence_map); [ 136.763858] lock(fs_reclaim); [ 136.763858] lock(dma_fence_map); [ 136.763858] lock(console_lock); [ 136.763859] *** DEADLOCK *** [ 136.763860] 4 locks held by kworker/2:3/682: [ 136.763860] #0: ffff8887fb81c938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x580 [ 136.763862] #1: ffffc90000cafe58 ((work_completion)(&(&sched->work_tdr)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x580 [ 136.763863] #2: ffffffff82318c80 (dma_fence_map){++++}-{0:0}, at: drm_sched_job_timedout+0x25/0xf0 [gpu_sched] [ 136.763865] #3: ffff8887ab621748 (&adev->lock_reset){+.+.}-{3:3}, at: amdgpu_device_gpu_recover.cold+0x5ab/0xe7b [amdgpu] [ 136.763914] stack backtrace: [ 136.763915] CPU: 2 PID: 682 Comm: kworker/2:3 Tainted: G W 5.7.0-rc3+ #346 [ 136.763916] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 4011 04/19/2018 [ 136.763918] Workqueue: events drm_sched_job_timedout [gpu_sched] [ 136.763919] Call Trace: [ 136.763922] dump_stack+0x8f/0xd0 [ 136.763924] check_noncircular+0x162/0x180 [ 136.763926] __lock_acquire+0x1241/0x23f0 [ 136.763927] lock_acquire+0xad/0x370 [ 136.763932] ? drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763933] ? mark_held_locks+0x2d/0x80 [ 136.763934] ? _raw_spin_unlock_irqrestore+0x46/0x60 [ 136.763936] console_lock+0x47/0x70 [ 136.763940] ? drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763944] drm_fb_helper_set_suspend_unlocked+0x7b/0xa0 [drm_kms_helper] [ 136.763993] amdgpu_device_gpu_recover.cold+0x21e/0xe7b [amdgpu] [ 136.764036] amdgpu_job_timedout+0xfb/0x150 [amdgpu] [ 136.764038] drm_sched_job_timedout+0x8a/0xf0 [gpu_sched] [ 136.764040] process_one_work+0x23c/0x580 [ 136.764041] worker_thread+0x50/0x3b0 [ 136.764042] ? process_one_work+0x580/0x580 [ 136.764044] kthread+0x12e/0x150 [ 136.764045] ? kthread_create_worker_on_cpu+0x70/0x70 [ 136.764046] ret_from_fork+0x27/0x50 Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-rdma@vger.kernel.org Cc: amd-gfx@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: Chris Wilson Cc: Maarten Lankhorst Cc: Christian König Signed-off-by: Daniel Vetter --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index ac0286a5f2fc..4c4492de670c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4063,8 +4063,6 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, if (r) goto out; - amdgpu_fbdev_set_suspend(tmp_adev, 0); - /* must succeed. */ amdgpu_ras_resume(tmp_adev); @@ -4305,8 +4303,6 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, */ amdgpu_unregister_gpu_instance(tmp_adev); - amdgpu_fbdev_set_suspend(tmp_adev, 1); - /* disable ras on ALL IPs */ if (!(in_ras_intr && !use_baco) && amdgpu_device_ip_need_full_reset(tmp_adev))