From patchwork Tue Oct 29 15:29:10 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Brezillon X-Patchwork-Id: 13855106 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id F3721D3A66C for ; Tue, 29 Oct 2024 15:29:19 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 285A910E684; Tue, 29 Oct 2024 15:29:17 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=collabora.com header.i=@collabora.com header.b="YieOf0lQ"; dkim-atps=neutral Received: from bali.collaboradmins.com (bali.collaboradmins.com [148.251.105.195]) by gabe.freedesktop.org (Postfix) with ESMTPS id DF20510E683 for ; Tue, 29 Oct 2024 15:29:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1730215754; bh=9Vzc5cjseL09Uwc9IZorIUMr2qOPWNxYj54sxB69S/c=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=YieOf0lQggtc+9IgrUfGNFu7JJaaHVovtc0/jXpYMI5ddOLsi1L6/O3cxuSGZwZlJ HS01jiwXEzEJU42nRGptqPs/hDaUjVlGBs+9hlGJ62F2PwzfYB8aWzMAWWwPnLLKLr vScSfMVM3luiQG6tELwtOdDiFByEzvOAUKW9IHKFKgx1wkl9P3lDc/WpjIjLE7O95+ mA8lg7jl9QReQUe7Vr6qPnzozmYYWGydWAMzwd/KWh+Hj5c6UdVO1OI03SxwWi5j52 wCL4adQjMtUfXnsbKnq/FjUlnBSPzIy47PBMWsrugX/Bt/JYN6etY+uAhLHElwGBWl KnadCZeMaIO5g== Received: from localhost.localdomain (unknown [IPv6:2a01:e0a:2c:6930:5cf4:84a1:2763:fe0d]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: bbrezillon) by bali.collaboradmins.com (Postfix) with ESMTPSA id 34B3617E3661; Tue, 29 Oct 2024 16:29:14 +0100 (CET) From: Boris Brezillon To: Boris Brezillon , Steven Price , Liviu Dudau , =?utf-8?q?Adri=C3=A1n_Larumbe?= Cc: Christopher Healy , dri-devel@lists.freedesktop.org, kernel@collabora.com Subject: [PATCH v3 1/3] drm/panthor: Fail job creation when the group is dead Date: Tue, 29 Oct 2024 16:29:10 +0100 Message-ID: <20241029152912.270346-2-boris.brezillon@collabora.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241029152912.270346-1-boris.brezillon@collabora.com> References: <20241029152912.270346-1-boris.brezillon@collabora.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Userspace can use GROUP_SUBMIT errors as a trigger to check the group state and recreate the group if it became unusable. Make sure we report an error when the group became unusable. Changes in v3: - None Changes in v2: - Add R-bs Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") Signed-off-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: Liviu Dudau --- drivers/gpu/drm/panthor/panthor_sched.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c index eb9f6635cc12..eda8fbb276b3 100644 --- a/drivers/gpu/drm/panthor/panthor_sched.c +++ b/drivers/gpu/drm/panthor/panthor_sched.c @@ -3688,6 +3688,11 @@ panthor_job_create(struct panthor_file *pfile, goto err_put_job; } + if (!group_can_run(job->group)) { + ret = -EINVAL; + goto err_put_job; + } + if (job->queue_idx >= job->group->queue_count || !job->group->queues[job->queue_idx]) { ret = -EINVAL; From patchwork Tue Oct 29 15:29:11 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Brezillon X-Patchwork-Id: 13855107 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 85647D3A66B for ; Tue, 29 Oct 2024 15:29:21 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 25F0510E2A2; Tue, 29 Oct 2024 15:29:17 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=collabora.com header.i=@collabora.com header.b="Rc2Y9mWv"; dkim-atps=neutral Received: from bali.collaboradmins.com (bali.collaboradmins.com [148.251.105.195]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4186E10E2A2 for ; Tue, 29 Oct 2024 15:29:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1730215755; bh=FxOUJtCQ+n7DIYEzUNQj5cfDzP5GIdWINDU5tFcG694=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Rc2Y9mWvCe1m8+dSfpub2SRSYZFS+NUPIzMtrdwsN6WrWUv7PZKXwNKpJk8aSDpre +abAwFS/hymu37Rq8nMyb8THvSoOCcQJdDa/c189oM1ATWLzGTQekd39tXTPz70Uyn n5tJ/+wWo7iInqnhYYIAVTT/1eBbtOoqwvfjj4t+Oi7TX4Kl/hfM5nivG+sVLmDS6H xwwuo7u1iiimmFc4mDLko6sOBjqYucKabPAf8av6N1Eiz06GuIsMbVa3J9eoSXWfdu iY1BdCw1GqoNk5wKY1onNNX6qMNyKyCZsQGbgoh2Yk4mkBBk4BeslFvsqM+i4y6hO5 EoYuO3lty8yzw== Received: from localhost.localdomain (unknown [IPv6:2a01:e0a:2c:6930:5cf4:84a1:2763:fe0d]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: bbrezillon) by bali.collaboradmins.com (Postfix) with ESMTPSA id A7A5417E3663; Tue, 29 Oct 2024 16:29:14 +0100 (CET) From: Boris Brezillon To: Boris Brezillon , Steven Price , Liviu Dudau , =?utf-8?q?Adri=C3=A1n_Larumbe?= Cc: Christopher Healy , dri-devel@lists.freedesktop.org, kernel@collabora.com Subject: [PATCH v3 2/3] drm/panthor: Report group as timedout when we fail to properly suspend Date: Tue, 29 Oct 2024 16:29:11 +0100 Message-ID: <20241029152912.270346-3-boris.brezillon@collabora.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241029152912.270346-1-boris.brezillon@collabora.com> References: <20241029152912.270346-1-boris.brezillon@collabora.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" If we don't do that, the group is considered usable by userspace, but all further GROUP_SUBMIT will fail with -EINVAL. Changes in v3: - Add R-bs Changes in v2: - New patch Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") Signed-off-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: Liviu Dudau --- drivers/gpu/drm/panthor/panthor_sched.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c index eda8fbb276b3..ef4bec7ff9c7 100644 --- a/drivers/gpu/drm/panthor/panthor_sched.c +++ b/drivers/gpu/drm/panthor/panthor_sched.c @@ -602,10 +602,11 @@ struct panthor_group { * @timedout: True when a timeout occurred on any of the queues owned by * this group. * - * Timeouts can be reported by drm_sched or by the FW. In any case, any - * timeout situation is unrecoverable, and the group becomes useless. - * We simply wait for all references to be dropped so we can release the - * group object. + * Timeouts can be reported by drm_sched or by the FW. If a reset is required, + * and the group can't be suspended, this also leads to a timeout. In any case, + * any timeout situation is unrecoverable, and the group becomes useless. We + * simply wait for all references to be dropped so we can release the group + * object. */ bool timedout; @@ -2687,6 +2688,12 @@ void panthor_sched_suspend(struct panthor_device *ptdev) csgs_upd_ctx_init(&upd_ctx); while (slot_mask) { u32 csg_id = ffs(slot_mask) - 1; + struct panthor_csg_slot *csg_slot = &sched->csg_slots[csg_id]; + + /* We consider group suspension failures as fatal and flag the + * group as unusable by setting timedout=true. + */ + csg_slot->group->timedout = true; csgs_upd_ctx_queue_reqs(ptdev, &upd_ctx, csg_id, CSG_STATE_TERMINATE, From patchwork Tue Oct 29 15:29:12 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Brezillon X-Patchwork-Id: 13855108 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 55678D3A661 for ; Tue, 29 Oct 2024 15:29:28 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id C481D10E686; Tue, 29 Oct 2024 15:29:27 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=collabora.com header.i=@collabora.com header.b="paxQdUTP"; dkim-atps=neutral Received: from bali.collaboradmins.com (bali.collaboradmins.com [148.251.105.195]) by gabe.freedesktop.org (Postfix) with ESMTPS id A801D10E2A2 for ; Tue, 29 Oct 2024 15:29:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1730215755; bh=dXMMyOB60m9ALxzKyHFr3SJrapxIVt4hmhwLSW54Brs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=paxQdUTPbz54DDaTA4LT1VnInP7oeOG6wixhcDMiYkktNbwIC9igk9UNZToUFIvMM nwNK2YLmrByyEc9Y0wJOdHekT/TS89tDpeEFG8SOgGR8XH8SqIm1chJWyZs3IywpYv 6kE5pmA/Sbq4Fd8LkAi1XCARXmfAYSR4fAAtiMfds485/40D3FwyGUaE1CLC8Fn81C cnZDqSGo3OG2aaPdFXXi7+OwUo74wlyUBdz2PHtApvsq7lk4n7f1Y3Y3FDOLN6DT5d cDTQeJctgCItv4HkW6wZpSY5QKEcPf1Kzogu2qvAg2U7RYDfXQDnVXBPqwmeG88CBm eDwtj+sRHWWHg== Received: from localhost.localdomain (unknown [IPv6:2a01:e0a:2c:6930:5cf4:84a1:2763:fe0d]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: bbrezillon) by bali.collaboradmins.com (Postfix) with ESMTPSA id 2221D17E3676; Tue, 29 Oct 2024 16:29:15 +0100 (CET) From: Boris Brezillon To: Boris Brezillon , Steven Price , Liviu Dudau , =?utf-8?q?Adri=C3=A1n_Larumbe?= Cc: Christopher Healy , dri-devel@lists.freedesktop.org, kernel@collabora.com Subject: [PATCH v3 3/3] drm/panthor: Report innocent group kill Date: Tue, 29 Oct 2024 16:29:12 +0100 Message-ID: <20241029152912.270346-4-boris.brezillon@collabora.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241029152912.270346-1-boris.brezillon@collabora.com> References: <20241029152912.270346-1-boris.brezillon@collabora.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Groups can be killed during a reset even though they did nothing wrong. That usually happens when the FW is put in a bad state by other groups, resulting in group suspension failures when the reset happens. If we end up in that situation, flag the group innocent and report innocence through a new DRM_PANTHOR_GROUP_STATE flag. Bump the minor driver version to reflect the uAPI change. Changes in v3: - Actually report innocence to userspace Changes in v2: - New patch Signed-off-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panthor/panthor_drv.c | 2 +- drivers/gpu/drm/panthor/panthor_sched.c | 18 ++++++++++++++++++ include/uapi/drm/panthor_drm.h | 9 +++++++++ 3 files changed, 28 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index ac7e53f6e3f0..f1dff7e0173d 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -1507,7 +1507,7 @@ static const struct drm_driver panthor_drm_driver = { .desc = "Panthor DRM driver", .date = "20230801", .major = 1, - .minor = 2, + .minor = 3, .gem_create_object = panthor_gem_create_object, .gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table, diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c index ef4bec7ff9c7..97ed5fe5a191 100644 --- a/drivers/gpu/drm/panthor/panthor_sched.c +++ b/drivers/gpu/drm/panthor/panthor_sched.c @@ -610,6 +610,16 @@ struct panthor_group { */ bool timedout; + /** + * @innocent: True when the group becomes unusable because the group suspension + * failed during a reset. + * + * Sometimes the FW was put in a bad state by other groups, causing the group + * suspension happening in the reset path to fail. In that case, we consider the + * group innocent. + */ + bool innocent; + /** * @syncobjs: Pool of per-queue synchronization objects. * @@ -2690,6 +2700,12 @@ void panthor_sched_suspend(struct panthor_device *ptdev) u32 csg_id = ffs(slot_mask) - 1; struct panthor_csg_slot *csg_slot = &sched->csg_slots[csg_id]; + /* If the group was still usable before that point, we consider + * it innocent. + */ + if (group_can_run(csg_slot->group)) + csg_slot->group->innocent = true; + /* We consider group suspension failures as fatal and flag the * group as unusable by setting timedout=true. */ @@ -3570,6 +3586,8 @@ int panthor_group_get_state(struct panthor_file *pfile, get_state->state |= DRM_PANTHOR_GROUP_STATE_FATAL_FAULT; get_state->fatal_queues = group->fatal_queues; } + if (group->innocent) + get_state->state |= DRM_PANTHOR_GROUP_STATE_INNOCENT; mutex_unlock(&sched->lock); group_put(group); diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h index 87c9cb555dd1..b99763cbae48 100644 --- a/include/uapi/drm/panthor_drm.h +++ b/include/uapi/drm/panthor_drm.h @@ -923,6 +923,15 @@ enum drm_panthor_group_state_flags { * When a group ends up with this flag set, no jobs can be submitted to its queues. */ DRM_PANTHOR_GROUP_STATE_FATAL_FAULT = 1 << 1, + + /** + * @DRM_PANTHOR_GROUP_STATE_INNOCENT: Group was killed during a reset caused by other + * groups. + * + * This flag can only be set if DRM_PANTHOR_GROUP_STATE_TIMEDOUT is set and + * DRM_PANTHOR_GROUP_STATE_FATAL_FAULT is not. + */ + DRM_PANTHOR_GROUP_STATE_INNOCENT = 1 << 2, }; /**