From patchwork Fri Sep 27 14:27:54 2024
X-Patchwork-Submitter: Christian König
X-Patchwork-Id: 13814373
From: Christian König
To: pstanner@redhat.com, dakr@kernel.org, ltuikov89@gmail.com, simona.vetter@ffwll.ch, dri-devel@lists.freedesktop.org
Subject: [PATCH 1/2] drm/sched: document drm_sched_fini requirements v2
Date: Fri, 27 Sep 2024 16:27:54 +0200
Message-Id: <20240927142755.103076-2-christian.koenig@amd.com>
In-Reply-To: <20240927142755.103076-1-christian.koenig@amd.com>
References: <20240927142755.103076-1-christian.koenig@amd.com>

Document the steps that need to be done before calling
drm_sched_fini().

Tearing down the scheduler with jobs still on the pending list can
lead to use after free issues. Add a warning if drivers try to
destroy a scheduler which still has work pushed to the HW.

When there are still entities with jobs the situation is even worse:
since the dma_fences for those jobs can never signal, we can only
choose between potentially locking up core memory management and
random memory corruption. When drivers really mess it up that badly,
let them run into a BUG() to prevent further damage.

v2: Use drm_WARN_ON() as suggested by Jani, add documentation on what
    drivers need to do before calling drm_sched_fini().

Signed-off-by: Christian König
---
 drivers/gpu/drm/scheduler/sched_main.c | 31 ++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index f093616fe53c..5510f04788d1 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1324,7 +1324,16 @@ EXPORT_SYMBOL(drm_sched_init);
  *
  * @sched: scheduler instance
  *
- * Tears down and cleans up the scheduler.
+ * Before calling this function all entities which potentially used this
+ * scheduler instance should be forced idle using drm_sched_entity_flush() and
+ * detached from their scheduler using drm_sched_entity_fini().
+ *
+ * Special care must be taken if drivers allocate scheduler instances
+ * dynamically. Since the dma_fence signaling doesn't guarantee any processing
+ * order of callbacks it is possible that the scheduler is still cleaning up
+ * when fences have already signaled. The easiest way to avoid that is to keep a
+ * reference from the job to the scheduler and tear down the scheduler from a
+ * work item after the last job was cleaned up.
  */
 void drm_sched_fini(struct drm_gpu_scheduler *sched)
 {
@@ -1333,17 +1342,35 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)

 	drm_sched_wqueue_stop(sched);

+	/*
+	 * Tearing down the scheduler while there are still unprocessed jobs can
+	 * lead to use after free issues in the scheduler fence.
+	 */
+	drm_WARN_ON(sched, !list_empty(&sched->pending_list));
+
 	for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
 		struct drm_sched_rq *rq = sched->sched_rq[i];

 		spin_lock(&rq->lock);
-		list_for_each_entry(s_entity, &rq->entities, list)
+		drm_WARN_ON(sched, !list_empty(&rq->entities));
+		list_for_each_entry(s_entity, &rq->entities, list) {
+			/*
+			 * The justification for this BUG_ON() is that tearing
+			 * down the scheduler while jobs are pending leaves
+			 * dma_fences unsignaled. Since we have dependencies
+			 * from the core memory management to eventually signal
+			 * dma_fences this can lead to a system wide stop
+			 * because of a locked up memory management.
+			 */
+			BUG_ON(spsc_queue_count(&s_entity->job_queue));
+
 			/*
 			 * Prevents reinsertion and marks job_queue as idle,
 			 * it will removed from rq in drm_sched_entity_fini
 			 * eventually
 			 */
 			s_entity->stopped = true;
+		}
 		spin_unlock(&rq->lock);
 		kfree(sched->sched_rq[i]);
 	}

From patchwork Fri Sep 27 14:27:55 2024
X-Patchwork-Submitter: Christian König
X-Patchwork-Id: 13814374
From: Christian König
To: pstanner@redhat.com, dakr@kernel.org, ltuikov89@gmail.com, simona.vetter@ffwll.ch, dri-devel@lists.freedesktop.org
Subject: [PATCH 2/2] drm/sched: clarify the documentation on drm_sched_entity_error
Date: Fri, 27 Sep 2024 16:27:55 +0200
Message-Id: <20240927142755.103076-3-christian.koenig@amd.com>
In-Reply-To: <20240927142755.103076-1-christian.koenig@amd.com>
References: <20240927142755.103076-1-christian.koenig@amd.com>

Sima requested this in a discussion, so just copy & paste my
explanation from the mail.

Signed-off-by: Christian König
---
 drivers/gpu/drm/scheduler/sched_entity.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 58c8161289fe..571e2f2365a1 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -166,8 +166,21 @@ bool drm_sched_entity_is_ready(struct drm_sched_entity *entity)
  * drm_sched_entity_error - return error of last scheduled job
  * @entity: scheduler entity to check
  *
- * Opportunistically return the error of the last scheduled job. Result can
- * change any time when new jobs are pushed to the hw.
+ * A driver should use this function in two ways:
+ *
+ * 1. In its prepare callback, so that when one submission fails all following
+ *    ones from the same ctx are marked with an error number as well.
+ *
+ *    This is intentionally done in a driver callback so that the driver can
+ *    decide whether subsequent submissions fail. That can be helpful, e.g.,
+ *    for in-kernel paging queues where submissions don't depend on each
+ *    other and a failed submission shouldn't cancel all following ones.
+ *
+ * 2. In its submission IOCTL, to reject new submissions and inform userspace
+ *    that it needs to kick off some error handling.
+ *
+ * Returns the error of the last scheduled job. Result can change any time when
+ * new jobs are pushed to the hw.
  */
 int drm_sched_entity_error(struct drm_sched_entity *entity)
 {
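
The two uses listed in the drm_sched_entity_error() comment can be modelled in plain userspace C. This is only a sketch under assumptions: every mock_* type and function below is a hypothetical stand-in invented for illustration, none of it is DRM scheduler API, and the real callback signatures differ. It shows an error on one job being inherited by later jobs of the same context, and a later submission being rejected.

```c
/*
 * Userspace model only. Every mock_* name is a hypothetical stand-in and
 * not part of the DRM scheduler API.
 */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct mock_entity {
	int last_error;		/* error of the last scheduled job, 0 if none */
};

struct mock_job {
	struct mock_entity *entity;
	int hw_result;		/* result the hardware would report if the job ran */
	int error;		/* error finally recorded for the job */
};

/* Stand-in for drm_sched_entity_error(): error of the last scheduled job. */
static int mock_entity_error(const struct mock_entity *entity)
{
	return entity->last_error;
}

/*
 * Use 1, prepare callback: the driver decides to fail this job because an
 * earlier job from the same context failed. A driver with independent
 * submissions (for example a paging queue) would simply skip this check.
 */
static int mock_prepare_job(struct mock_job *job)
{
	int r = mock_entity_error(job->entity);

	if (r)
		job->error = r;
	return r;
}

/* Models the scheduler picking a job: prepare, run, record the result. */
static void mock_schedule_job(struct mock_job *job)
{
	if (!mock_prepare_job(job))
		job->error = job->hw_result;
	job->entity->last_error = job->error;
}

/*
 * Use 2, submission IOCTL: reject new work so userspace can start its
 * error handling. Nothing is queued in this model.
 */
static int mock_submit_ioctl(struct mock_entity *entity)
{
	return mock_entity_error(entity);
}

/* Runs three jobs from one context; the second one fails on the hardware. */
static int mock_demo(void)
{
	struct mock_entity ctx = { .last_error = 0 };
	struct mock_job jobs[] = {
		{ .entity = &ctx, .hw_result = 0 },
		{ .entity = &ctx, .hw_result = -ETIME },
		{ .entity = &ctx, .hw_result = 0 },
	};
	size_t i;

	for (i = 0; i < sizeof(jobs) / sizeof(jobs[0]); i++)
		mock_schedule_job(&jobs[i]);

	assert(jobs[0].error == 0);
	assert(jobs[1].error == -ETIME);
	assert(jobs[2].error == -ETIME);	/* inherited, never reached the hw */

	return mock_submit_ioctl(&ctx);
}
```

Calling mock_demo() from a main() returns -ETIME: the third job inherits the error in the prepare step, and the submission path then refuses further work. The check lives in the driver-side prepare function rather than in the scheduling helper, which mirrors the point that the driver chooses whether failures cascade.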