From patchwork Mon Jun 12 12:07:32 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Oded Gabbay
X-Patchwork-Id: 13276413
From: Oded Gabbay
To: dri-devel@lists.freedesktop.org
Subject: [PATCH 2/3] accel/habanalabs: reset device if scrubbing failed
Date: Mon, 12 Jun 2023 15:07:32 +0300
Message-Id: <20230612120733.3079507-2-ogabbay@kernel.org>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230612120733.3079507-1-ogabbay@kernel.org>
References: <20230612120733.3079507-1-ogabbay@kernel.org>

If scrubbing memory after the user released the device has failed, the
device is in a bad state and should be reset.

Signed-off-by: Oded Gabbay
Reviewed-by: Ofir Bitton
---
 drivers/accel/habanalabs/common/device.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c
index 5e61761b8c11..d7d9198b2103 100644
--- a/drivers/accel/habanalabs/common/device.c
+++ b/drivers/accel/habanalabs/common/device.c
@@ -454,8 +454,10 @@ static void hpriv_release(struct kref *ref)
 		/* Scrubbing is handled within hl_device_reset(), so here need to do it directly */
 		int rc = hdev->asic_funcs->scrub_device_mem(hdev);
 
-		if (rc)
+		if (rc) {
 			dev_err(hdev->dev, "failed to scrub memory from hpriv release (%d)\n", rc);
+			hl_device_reset(hdev, HL_DRV_RESET_HARD);
+		}
 	}
 
 	/* Now we can mark the compute_ctx as not active. Even if a reset is running in a different
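
For readers who want to see the changed control flow in isolation, here is a minimal user-space sketch of the post-patch behavior: scrub on release, and if the scrub fails, log the error and request a hard reset. Everything in it is a hypothetical stand-in (struct mock_device, mock_device_reset(), MOCK_RESET_HARD, the scrub_ok/scrub_fail callbacks and the -5 error value); none of it is the real habanalabs code, and the real hl_device_reset() does far more than bump a counter.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the driver types; NOT the real habanalabs definitions. */
enum mock_reset_flags { MOCK_RESET_HARD = 1 };

struct mock_device {
	/* Stand-in for asic_funcs->scrub_device_mem(); returns 0 on success. */
	int (*scrub_device_mem)(struct mock_device *dev);
	/* Counts how many hard resets were requested. */
	int hard_resets;
};

/* Scrub callback that always succeeds. */
static int scrub_ok(struct mock_device *dev)
{
	(void)dev;
	return 0;
}

/* Scrub callback that always fails with an arbitrary negative error code. */
static int scrub_fail(struct mock_device *dev)
{
	(void)dev;
	return -5;
}

/* Stand-in for hl_device_reset(); it only records that a hard reset was asked for. */
static void mock_device_reset(struct mock_device *dev, enum mock_reset_flags flags)
{
	if (flags & MOCK_RESET_HARD)
		dev->hard_resets++;
}

/* Mirrors the patched branch: a failed scrub is logged and then escalated to a reset. */
static int mock_release(struct mock_device *dev)
{
	int rc = dev->scrub_device_mem(dev);

	if (rc) {
		fprintf(stderr, "failed to scrub memory from release (%d)\n", rc);
		mock_device_reset(dev, MOCK_RESET_HARD);
	}

	return rc;
}
```

Calling mock_release() on a device whose callback is scrub_ok leaves hard_resets at 0, while a device using scrub_fail ends up with hard_resets equal to 1. Before the patch, the failing case would only have printed the error.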