From: Christian König
To: dakr@kernel.org, pstanner@redhat.com, dri-devel@lists.freedesktop.org, ltuikov89@gmail.com
Subject: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini
Date: Wed, 18 Sep 2024 15:39:55 +0200
Message-Id: <20240918133956.26557-1-christian.koenig@amd.com>

Tearing down the scheduler with jobs still on the pending list can lead to
use-after-free issues. Add a warning if drivers try to destroy a scheduler
which still has work pushed to the HW.

When there are still entities with jobs, the situation is even worse: since
the dma_fences for those jobs can never signal, we can only choose between
potentially locking up core memory management and random memory corruption.
When drivers mess it up that badly, let them run into a BUG_ON().

Signed-off-by: Christian König
---
 drivers/gpu/drm/scheduler/sched_main.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index f093616fe53c..8a46fab5cdc8 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1333,17 +1333,34 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 
 	drm_sched_wqueue_stop(sched);
 
+	/*
+	 * Tearing down the scheduler while there are still unprocessed jobs can
+	 * lead to use after free issues in the scheduler fence.
+	 */
+	WARN_ON(!list_empty(&sched->pending_list));
+
 	for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
 		struct drm_sched_rq *rq = sched->sched_rq[i];
 
 		spin_lock(&rq->lock);
-		list_for_each_entry(s_entity, &rq->entities, list)
+		list_for_each_entry(s_entity, &rq->entities, list) {
+			/*
+			 * The justification for this BUG_ON() is that tearing
+			 * down the scheduler while jobs are pending leaves
+			 * dma_fences unsignaled. Since we have dependencies
+			 * from the core memory management to eventually signal
+			 * dma_fences this can trivially lead to a system wide
+			 * stop because of a locked up memory management.
+			 */
+			BUG_ON(spsc_queue_count(&s_entity->job_queue));
+
 			/*
 			 * Prevents reinsertion and marks job_queue as idle,
 			 * it will removed from rq in drm_sched_entity_fini
 			 * eventually
 			 */
 			s_entity->stopped = true;
+		}
 		spin_unlock(&rq->lock);
 		kfree(sched->sched_rq[i]);
 	}
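The two checks added to drm_sched_fini() can be exercised outside the kernel with a small model. This is a minimal userspace sketch under stated assumptions: `model_sched`, `model_entity`, `model_sched_fini()` and `enum teardown_verdict` are hypothetical names that only stand in for the scheduler, its entities and the pending list, and WARN_ON()/BUG_ON() are replaced by return values so the ordering of the checks can be tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the checks added to drm_sched_fini().
 * None of these types or functions exist in the kernel.
 */
enum teardown_verdict {
	TEARDOWN_OK,    /* nothing outstanding, teardown is safe */
	TEARDOWN_WARN,  /* jobs still on the pending list: the WARN_ON() case */
	TEARDOWN_FATAL, /* an entity still has queued jobs: the BUG_ON() case */
};

struct model_entity {
	size_t queued_jobs; /* stands in for spsc_queue_count(&entity->job_queue) */
	bool stopped;       /* stands in for entity->stopped */
};

struct model_sched {
	size_t pending_jobs;           /* non-zero models !list_empty(&sched->pending_list) */
	struct model_entity *entities; /* all entities across all run queues */
	size_t num_entities;
};

static enum teardown_verdict model_sched_fini(struct model_sched *sched)
{
	enum teardown_verdict verdict = TEARDOWN_OK;

	assert(sched != NULL);

	/*
	 * Jobs already pushed to the HW: their scheduler fences may still be
	 * accessed after the scheduler is gone (use-after-free risk).
	 */
	if (sched->pending_jobs != 0)
		verdict = TEARDOWN_WARN;

	for (size_t i = 0; i < sched->num_entities; i++) {
		struct model_entity *e = &sched->entities[i];

		/*
		 * Jobs that never reached the HW: their dma_fences can never
		 * signal. BUG_ON() does not return, so the model stops here.
		 */
		if (e->queued_jobs != 0)
			return TEARDOWN_FATAL;

		/* Prevent reinsertion, as the real code does. */
		e->stopped = true;
	}

	return verdict;
}
```

The split mirrors the reasoning in the commit message: leftover pending jobs are survivable but suspicious, so they only warn, while entities with queued jobs mean fences that will never signal, which the patch treats as unrecoverable.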

From: Christian König
To: dakr@kernel.org, pstanner@redhat.com, dri-devel@lists.freedesktop.org, ltuikov89@gmail.com
Subject: [PATCH 2/2] drm/sched: clarify the documentation on drm_sched_entity_error
Date: Wed, 18 Sep 2024 15:39:56 +0200
Message-Id: <20240918133956.26557-2-christian.koenig@amd.com>
In-Reply-To: <20240918133956.26557-1-christian.koenig@amd.com>
References: <20240918133956.26557-1-christian.koenig@amd.com>

Sima requested this in a discussion, so just copy & paste my explanation
from that mail.

Signed-off-by: Christian König
---
 drivers/gpu/drm/scheduler/sched_entity.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 58c8161289fe..571e2f2365a1 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -166,8 +166,21 @@ bool drm_sched_entity_is_ready(struct drm_sched_entity *entity)
  * drm_sched_entity_error - return error of last scheduled job
  * @entity: scheduler entity to check
  *
- * Opportunistically return the error of the last scheduled job. Result can
- * change any time when new jobs are pushed to the hw.
+ * Drivers should use this function in two ways:
+ *
+ * 1. In their prepare callback, so that when one submission fails, all
+ * following submissions from the same ctx are marked with an error as well.
+ *
+ * This is intentionally done in a driver callback so that the driver decides
+ * whether subsequent submissions should fail. That can be helpful, for
+ * example, for in-kernel paging queues where submissions don't depend on each
+ * other and a failed submission shouldn't cancel all following ones.
+ *
+ * 2. In their submission IOCTL, to reject new submissions and inform
+ * userspace that it needs to kick off some error handling.
+ *
+ * Returns the error of the last scheduled job. The result can change at any
+ * time when new jobs are pushed to the hw.
  */
 int drm_sched_entity_error(struct drm_sched_entity *entity)
 {