From patchwork Fri Sep 27 08:41:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Christian_K=C3=B6nig?= X-Patchwork-Id: 13814019 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A1192CEBF84 for ; Fri, 27 Sep 2024 08:42:01 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A8E4710EC7A; Fri, 27 Sep 2024 08:42:00 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="izGtNxyr"; dkim-atps=neutral Received: from mail-wm1-f52.google.com (mail-wm1-f52.google.com [209.85.128.52]) by gabe.freedesktop.org (Postfix) with ESMTPS id 7722C10EC7A for ; Fri, 27 Sep 2024 08:41:59 +0000 (UTC) Received: by mail-wm1-f52.google.com with SMTP id 5b1f17b1804b1-42cb6f3a5bcso22440125e9.2 for ; Fri, 27 Sep 2024 01:41:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727426518; x=1728031318; darn=lists.freedesktop.org; h=content-transfer-encoding:mime-version:message-id:date:subject:to :from:from:to:cc:subject:date:message-id:reply-to; bh=yfL963GX4hn+9etH+Dw3PZf4XacBtknkoH+Zc75J+9U=; b=izGtNxyrP1F5lHrc7g9hr/+mpJ6X9HxebE8Mkbur6rS2mFJ3FusbahWDoCLdP1S9fj 1+Sad6iDHhqsEWvTl725mwFxiVi5qKst7ST5wjysfr467uX3tfvGAdYo67g7tcIYfFJH tKNhmh8sFcdYT4igbeEQ/MihUcRd/snUOi9k4O5Sj+tlVXMBLKCLSnOeTYz2HshdglUa Mv9Uj965i9imxbUAEAdXGY4KTG1KiRILgihfq1Gm4+sUrXRK+nGIkD8rxg1guLGpCaGn Dh22G2eWwwImHPOj5ijygW+zsyZIlQ1An9zJ/WD4aEJilOyZJAXAfQ/G1oAnUKbk1bqm ETMA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727426518; x=1728031318; h=content-transfer-encoding:mime-version:message-id:date:subject:to :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=yfL963GX4hn+9etH+Dw3PZf4XacBtknkoH+Zc75J+9U=; b=svDv1t8n/foplgtyU5b+SS1AkeaeBaKbyuX5R7FM93BxOkJzO+E8WT8Jf+OJGMKvE0 b1KPCuKKPxgiIj8n3dtBZ+V1myJ3d1DDKco3TQiclhjzgOirIkzzfVGweN5Cs5AX4+jb vkKsqSarqIQFE57a/tb8+YTOOHnxuHh4iITFKYKgXZNc6MiR/l5oKopgrHkz3enLdv2m z0wijcBw3NdQV2FqKx5HVrB2Io1kg4SRrqMOSCxSfR4PH3EFWYrOPcuAfhixheF0pDYj nGICtntIOUmINOORotjf7VpQlki/bSsUvavzQDDnfU3LiNelhESDnF4lCZ2gyCGL2I/n 7aQA== X-Forwarded-Encrypted: i=1; AJvYcCXr7PZmStJUAmWWFE5g1+q6kljch+xXCQHfRJchAWnSBRenagAkOhZJCmQLXSJ6tAAQYpumhb91f3Q=@lists.freedesktop.org X-Gm-Message-State: AOJu0YyXqkjz7hNIPhFeMvLltdGINafIZm6PxMmoJqclDy00kuZjkPgb +swpuAfTwRbpf6rLp5v+r+Y50rnO05JLlk0ucOckLrvRkMTYltlQ X-Google-Smtp-Source: AGHT+IE9J4Yz7rR1vqk4wpKDwrfqBMd5vYI8n8qK6UMnVF+1gzG+Bh6ctExNxaZTBHkPz8EsPHSssA== X-Received: by 2002:a05:600c:5117:b0:42c:a8cb:6a5a with SMTP id 5b1f17b1804b1-42f5844b9dcmr21342025e9.15.1727426517326; Fri, 27 Sep 2024 01:41:57 -0700 (PDT) Received: from able.fritz.box ([2a00:e180:1515:2900:6181:3f3e:a0cc:be1a]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-42f57de2e46sm19363415e9.19.2024.09.27.01.41.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 27 Sep 2024 01:41:56 -0700 (PDT) From: " =?utf-8?q?Christian_K=C3=B6nig?= " X-Google-Original-From: =?utf-8?q?Christian_K=C3=B6nig?= To: pstanner@redhat.com, dakr@kernel.org, ltuikov89@gmail.com, simona.vetter@ffwll.ch, dri-devel@lists.freedesktop.org Subject: [PATCH 1/2] drm/sched: document drm_sched_fini requirements Date: Fri, 27 Sep 2024 10:41:52 +0200 Message-Id: <20240927084153.18583-1-christian.koenig@amd.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Document the necessary steps which needs to be done before calling drm_sched_fini(). Tearing down the scheduler with jobs still on the pending list can lead to use after free issues. Add a warning if drivers try to destroy a scheduler which still has work pushed to the HW. When there are still entities with jobs the situation is even worse since the dma_fences for those jobs can never signal we can just choose between potentially locking up core memory management and random memory corruption. When drivers really mess it up that well let them run into a BUG() to prevent further damage. Signed-off-by: Christian König --- drivers/gpu/drm/scheduler/sched_main.c | 33 ++++++++++++++++++++++++-- 1 file changed, 31 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index b5e9f657d829..2082cacada69 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -1324,7 +1324,16 @@ EXPORT_SYMBOL(drm_sched_init); * * @sched: scheduler instance * - * Tears down and cleans up the scheduler. + * Before calling this function all entities which potentially used this + * scheduler instance should be forced idle using drm_sched_entity_flush() and + * detached from their scheduler using drm_entity_fini(). + * + * Special care must be taken if drivers allocate scheduler instances + * dynamically. Since the dma_fence signaling doesn't guarantee any processing + * order of callbacks it is possible that the scheduler is still cleaning up + * when fences have already signaled. The easiest way to avoid that is to keep a + * reference from the job to the scheduler and tear down the scheduler from a + * work item after the last job was cleaned up. */ void drm_sched_fini(struct drm_gpu_scheduler *sched) { @@ -1333,17 +1342,37 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched) drm_sched_wqueue_stop(sched); + /* + * Tearing down the scheduler wile there are still unprocessed jobs can + * lead to use after free issues in the scheduler fence. + */ + WARN_ON(!list_empty(&sched->pending_list)); + for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) { struct drm_sched_rq *rq = sched->sched_rq[i]; spin_lock(&rq->lock); - list_for_each_entry(s_entity, &rq->entities, list) + list_for_each_entry(s_entity, &rq->entities, list) { + /* + * The justification for this BUG() is that tearing + * down the scheduler while jobs are pending leaves + * dma_fences unsignaled. Since we have dependencies + * from the core memory management to eventually signal + * dma_fences this can lead to a system wide stop + * because of a locked up memory management. + */ + if (spsc_queue_count(&s_entity->job_queue)) + BUG(); + else + WARN(); + /* * Prevents reinsertion and marks job_queue as idle, * it will be removed from the rq in drm_sched_entity_fini() * eventually */ s_entity->stopped = true; + } spin_unlock(&rq->lock); kfree(sched->sched_rq[i]); } From patchwork Fri Sep 27 08:41:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Christian_K=C3=B6nig?= X-Patchwork-Id: 13814020 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3A672CEBF84 for ; Fri, 27 Sep 2024 08:42:05 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id AB74B10EC7B; Fri, 27 Sep 2024 08:42:04 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="i30ypwcy"; dkim-atps=neutral Received: from mail-lj1-f178.google.com (mail-lj1-f178.google.com [209.85.208.178]) by gabe.freedesktop.org (Postfix) with ESMTPS id 8A8BB10EC7A for ; Fri, 27 Sep 2024 08:42:00 +0000 (UTC) Received: by mail-lj1-f178.google.com with SMTP id 38308e7fff4ca-2f762de00fbso22243851fa.2 for ; Fri, 27 Sep 2024 01:42:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1727426519; x=1728031319; darn=lists.freedesktop.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=mDMgyKKew/IZsYrrEs1u6/IPCFvw+mgWRq2aYRNTWOU=; b=i30ypwcycACNTlif8odwFmQzjq+PTvLsCcnkQuKrSA18smKmSlfhAFDUq4PbSn1fWN cfXhSddaA3Zts8RA8hkPoxV+URBKahwcNRU4apy7qKLSd6p3JsMMw20YBPfs3WO9iVyD OgDHrwkncZ20y/i0D7c0gjBAw8s2Vp3IZhEr1P0u/IRY+DOIUp+LAAzOgioOMWcAq8x+ kLwBhWPkWr1dhaxfzXCL/Oa4cuOpLu0DWVlV4rZHntLm8khBzEe6gGaVuVaJRqOp5XFX jaWcf9sYEl+GDHCW0KNYA7ROf1ey7iUQfCPoHdbegQ4lv8xCBORMWverLSASGU01yFP9 1Hyw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727426519; x=1728031319; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=mDMgyKKew/IZsYrrEs1u6/IPCFvw+mgWRq2aYRNTWOU=; b=QTIbn3GN1/c9BBW4O3Hb2QYsoq8fJWtmGVhHTXI7/Mne6HFpYEH2NyZj2W9bbnli9S 915VSqJcgWfXTDjIKLX7/DLUIqXqVUAGh0Q5OHMLZBv7vXih0SgVrNTpuNO+XHM3tGod lmplGNaRG+h5IcA6IdNSW5cB11zuLmYpV9Cr5c8b6kLWbojrsMuIR7KCYaGx8kIKzyqU WdOD57m0BlvOv/2WeFi9ki/jO3ftgjY7i9mznIEEyB1be1BdEkmkqb5np97xG5Z/BcTi 3b7OOEte0LEwqgEBUOiW5JuVJBFukV4BU9Vdw3ZXR2wWfkQ24n8Xzs6OWK+05IN9LQta H51A== X-Forwarded-Encrypted: i=1; AJvYcCUjYx7d/QVszuGSGWYmMCAAXApON6Fgm7hxzMHNYqGTvJuR3m6lCzOvkoVRLreO/dyOjtkcWpMNEbs=@lists.freedesktop.org X-Gm-Message-State: AOJu0YyNVWX13t7wzlQn2G/a/dx9gQpWXC1fW14/Z36D1s/7wFofTBsu uFNSxoD4r5iVBuwPd2GgHIt88iGj08YNgHSeIJdZXl2sSJZ+wbtL X-Google-Smtp-Source: AGHT+IGZnZaQEdYuGQz5WHQTYA/v0g5kbvBONwZQM8n7Ehcx9ws/3Lls3We0mmno089vomnZrNVgtA== X-Received: by 2002:a05:651c:2104:b0:2f6:5492:dcaa with SMTP id 38308e7fff4ca-2f9d41ca8b1mr15839771fa.41.1727426518371; Fri, 27 Sep 2024 01:41:58 -0700 (PDT) Received: from able.fritz.box ([2a00:e180:1515:2900:6181:3f3e:a0cc:be1a]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-42f57de2e46sm19363415e9.19.2024.09.27.01.41.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 27 Sep 2024 01:41:57 -0700 (PDT) From: " =?utf-8?q?Christian_K=C3=B6nig?= " X-Google-Original-From: =?utf-8?q?Christian_K=C3=B6nig?= To: pstanner@redhat.com, dakr@kernel.org, ltuikov89@gmail.com, simona.vetter@ffwll.ch, dri-devel@lists.freedesktop.org Subject: [PATCH 2/2] drm/sched: clarify the documentation on drm_sched_entity_error Date: Fri, 27 Sep 2024 10:41:53 +0200 Message-Id: <20240927084153.18583-2-christian.koenig@amd.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240927084153.18583-1-christian.koenig@amd.com> References: <20240927084153.18583-1-christian.koenig@amd.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Sima requested that in a discussion, just copy&paste my explanation from the mail. Signed-off-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index b06c0696f970..90eba2ee3875 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -168,8 +168,21 @@ bool drm_sched_entity_is_ready(struct drm_sched_entity *entity) * drm_sched_entity_error - return error of last scheduled job * @entity: scheduler entity to check * - * Opportunistically return the error of the last scheduled job. Result can - * change any time when new jobs are pushed to the hw. + * Drivers should use this function in two ways: + * + * 1. In it's prepare callback so that when one submission fails all following + * from the same ctx are marked with an error number as well. + * + * This is intentionally done in a driver callback so that driver decides if + * they want subsequent submissions to fail or not. That can be helpful for + * example for in kernel paging queues where submissions don't depend on each + * other and a failed submission shouldn't cancel all following. + * + * 2. In it's submission IOCTL to reject new submissions and inform userspace + * that it needs to kick of some error handling. + * + * Returns the error of the last scheduled job. Result can change any time when + * new jobs are pushed to the hw. */ int drm_sched_entity_error(struct drm_sched_entity *entity) {