From patchwork Wed Dec 22 22:04:59 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Andrey Grodzovsky
X-Patchwork-Id: 12697346
From: Andrey Grodzovsky
Subject: [RFC v2 1/8] drm/amdgpu: Introduce reset domain
Date: Wed, 22 Dec 2021 17:04:59 -0500
Message-ID: <20211222220506.789133-2-andrey.grodzovsky@amd.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20211222220506.789133-1-andrey.grodzovsky@amd.com>
References: <20211222220506.789133-1-andrey.grodzovsky@amd.com>
List-Id: Direct Rendering Infrastructure - Development
Cc: Daniel Vetter, horace.chen@amd.com, Christian König, christian.koenig@amd.com, Monk.Liu@amd.com

Define a reset_domain struct so that all entities that go through reset
together are serialized against one another. Do this for both the
single-device and the XGMI hive cases.

Signed-off-by: Andrey Grodzovsky
Suggested-by: Daniel Vetter
Suggested-by: Christian König
Reviewed-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  7 +++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +++++++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  9 +++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  2 ++
 4 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 9f017663ac50..b5ff76aae7e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -812,6 +812,11 @@ struct amd_powerplay {
 
 #define AMDGPU_RESET_MAGIC_NUM 64
 #define AMDGPU_MAX_DF_PERFMONS 4
+
+struct amdgpu_reset_domain {
+	struct workqueue_struct *wq;
+};
+
 struct amdgpu_device {
 	struct device *dev;
 	struct pci_dev *pdev;
@@ -1096,6 +1101,8 @@ struct amdgpu_device {
 
 	struct amdgpu_reset_control *reset_cntl;
 	uint32_t ip_versions[HW_ID_MAX][HWIP_MAX_INSTANCE];
+
+	struct amdgpu_reset_domain reset_domain;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 90d22a376632..0f3e6c078f88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2391,9 +2391,27 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (r)
 		goto init_failed;
 
-	if (adev->gmc.xgmi.num_physical_nodes > 1)
+	if (adev->gmc.xgmi.num_physical_nodes > 1) {
+		struct amdgpu_hive_info *hive;
+
 		amdgpu_xgmi_add_device(adev);
 
+		hive = amdgpu_get_xgmi_hive(adev);
+		if (!hive || !hive->reset_domain.wq) {
+			DRM_ERROR("Failed to obtain reset domain info for XGMI hive:%llx\n", adev->gmc.xgmi.hive_id);
+			r = -EINVAL;
+			goto init_failed;
+		}
+
+		adev->reset_domain.wq = hive->reset_domain.wq;
+	} else {
+		adev->reset_domain.wq = alloc_ordered_workqueue("amdgpu-reset-dev", 0);
+		if (!adev->reset_domain.wq) {
+			r = -ENOMEM;
+			goto init_failed;
+		}
+	}
+
 	/* Don't init kfd if whole hive need to be reset during init */
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 567df2db23ac..a858e3457c5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -392,6 +392,14 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
 		goto pro_end;
 	}
 
+	hive->reset_domain.wq = alloc_ordered_workqueue("amdgpu-reset-hive", 0);
+	if (!hive->reset_domain.wq) {
+		dev_err(adev->dev, "XGMI: failed allocating wq for reset domain!\n");
+		kfree(hive);
+		hive = NULL;
+		goto pro_end;
+	}
+
 	hive->hive_id = adev->gmc.xgmi.hive_id;
 	INIT_LIST_HEAD(&hive->device_list);
 	INIT_LIST_HEAD(&hive->node);
@@ -401,6 +409,7 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
 	task_barrier_init(&hive->tb);
 	hive->pstate = AMDGPU_XGMI_PSTATE_UNKNOWN;
 	hive->hi_req_gpu = NULL;
+
 	/*
 	 * hive pstate on boot is high in vega20 so we have to go to low
 	 * pstate on after boot.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index d2189bf7d428..6121aaa292cb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -42,6 +42,8 @@ struct amdgpu_hive_info {
 		AMDGPU_XGMI_PSTATE_MAX_VEGA20,
 		AMDGPU_XGMI_PSTATE_UNKNOWN
 	} pstate;
+
+	struct amdgpu_reset_domain reset_domain;
 };
 
 struct amdgpu_pcs_ras_field {

From patchwork Wed Dec 22 22:05:00 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Andrey Grodzovsky
X-Patchwork-Id: 12697349
From: Andrey Grodzovsky
Subject: [RFC v2 2/8] drm/amdgpu: Move scheduler init to after XGMI is ready
Date: Wed, 22 Dec 2021 17:05:00 -0500
Message-ID: <20211222220506.789133-3-andrey.grodzovsky@amd.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20211222220506.789133-1-andrey.grodzovsky@amd.com>
References: <20211222220506.789133-1-andrey.grodzovsky@amd.com>
List-Id: Direct Rendering Infrastructure - Development
Cc: Monk.Liu@amd.com, horace.chen@amd.com, christian.koenig@amd.com

Before we initialize the schedulers we must know which reset domain we
are in. For a single device there is one reset domain per device, and so
a single wq per device. For XGMI the reset domain spans the entire XGMI
hive, and so the reset wq is per hive.

Signed-off-by: Andrey Grodzovsky
Reviewed-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 45 ++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 34 ++--------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  2 +
 3 files changed, 51 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0f3e6c078f88..7c063fd37389 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2284,6 +2284,47 @@ static int amdgpu_device_fw_loading(struct amdgpu_device *adev)
 	return r;
 }
 
+static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
+{
+	long timeout;
+	int r, i;
+
+	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+		struct amdgpu_ring *ring = adev->rings[i];
+
+		/* No need to setup the GPU scheduler for rings that don't need it */
+		if (!ring || ring->no_scheduler)
+			continue;
+
+		switch (ring->funcs->type) {
+		case AMDGPU_RING_TYPE_GFX:
+			timeout = adev->gfx_timeout;
+			break;
+		case AMDGPU_RING_TYPE_COMPUTE:
+			timeout = adev->compute_timeout;
+			break;
+		case AMDGPU_RING_TYPE_SDMA:
+			timeout = adev->sdma_timeout;
+			break;
+		default:
+			timeout = adev->video_timeout;
+			break;
+		}
+
+		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
+				   ring->num_hw_submission, amdgpu_job_hang_limit,
+				   timeout, adev->reset_domain.wq, ring->sched_score, ring->name);
+		if (r) {
+			DRM_ERROR("Failed to create scheduler on ring %s.\n",
+				  ring->name);
+			return r;
+		}
+	}
+
+	return 0;
+}
+
+
 /**
  * amdgpu_device_ip_init - run init for hardware IPs
  *
@@ -2412,6 +2453,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 		}
 	}
 
+	r = amdgpu_device_init_schedulers(adev);
+	if (r)
+		goto init_failed;
+
 	/* Don't init kfd if whole hive need to be reset during init */
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 3b7e86ea7167..5527c68c51de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -456,8 +456,6 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
 				  atomic_t *sched_score)
 {
 	struct amdgpu_device *adev = ring->adev;
-	long timeout;
-	int r;
 
 	if (!adev)
 		return -EINVAL;
@@ -477,36 +475,12 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
 	spin_lock_init(&ring->fence_drv.lock);
 	ring->fence_drv.fences = kcalloc(num_hw_submission * 2, sizeof(void *),
 					 GFP_KERNEL);
-	if (!ring->fence_drv.fences)
-		return -ENOMEM;
 
-	/* No need to setup the GPU scheduler for rings that don't need it */
-	if (ring->no_scheduler)
-		return 0;
+	ring->num_hw_submission = num_hw_submission;
+	ring->sched_score = sched_score;
 
-	switch (ring->funcs->type) {
-	case AMDGPU_RING_TYPE_GFX:
-		timeout = adev->gfx_timeout;
-		break;
-	case AMDGPU_RING_TYPE_COMPUTE:
-		timeout = adev->compute_timeout;
-		break;
-	case AMDGPU_RING_TYPE_SDMA:
-		timeout = adev->sdma_timeout;
-		break;
-	default:
-		timeout = adev->video_timeout;
-		break;
-	}
-
-	r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
-			   num_hw_submission, amdgpu_job_hang_limit,
-			   timeout, NULL, sched_score, ring->name);
-	if (r) {
-		DRM_ERROR("Failed to create scheduler on ring %s.\n",
-			  ring->name);
-		return r;
-	}
+	if (!ring->fence_drv.fences)
+		return -ENOMEM;
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 4d380e79752c..a4b8279e3011 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -253,6 +253,8 @@ struct amdgpu_ring {
 	bool has_compute_vm_bug;
 	bool no_scheduler;
 	int hw_prio;
+	unsigned num_hw_submission;
+	atomic_t *sched_score;
 };
 
 #define amdgpu_ring_parse_cs(r, p, ib) ((r)->funcs->parse_cs((p), (ib)))

From patchwork Wed Dec 22 22:05:01 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrey Grodzovsky
X-Patchwork-Id: 12697347
From: Andrey Grodzovsky
Subject: [RFC v2 3/8] drm/amdgpu: Fix crash on modprobe
Date: Wed, 22 Dec 2021 17:05:01 -0500
Message-ID: <20211222220506.789133-4-andrey.grodzovsky@amd.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20211222220506.789133-1-andrey.grodzovsky@amd.com>
References: <20211222220506.789133-1-andrey.grodzovsky@amd.com>
X-MS-Exchange-Transport-CrossTenantHeadersStamped: BYAPR12MB2776 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Monk.Liu@amd.com, horace.chen@amd.com, christian.koenig@amd.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Restrict jobs resubmission to suspend case only since schedulers not initialised yet on probe. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 5527c68c51de..8ebd954e06c6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev) if (!ring || !ring->fence_drv.initialized) continue; - if (!ring->no_scheduler) { + if (adev->in_suspend && !ring->no_scheduler) { drm_sched_resubmit_jobs(&ring->sched); drm_sched_start(&ring->sched, true); } From patchwork Wed Dec 22 22:05:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrey Grodzovsky X-Patchwork-Id: 12697348 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 39B84C433F5 for ; Wed, 22 Dec 2021 22:06:10 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 4550F10E36C; Wed, 22 Dec 2021 22:06:03 +0000 (UTC) Received: from NAM11-CO1-obe.outbound.protection.outlook.com 
(mail-co1nam11on2042.outbound.protection.outlook.com [40.107.220.42]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6EDCB10E358; Wed, 22 Dec 2021 22:06:00 +0000 (UTC) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=AouePaZimgzTkaA8wMYX1HFeho7B69G5Jz5Ob5zJCXBlQWl756BbSazDINPmL4RYL8m6BFK4mRWvqYaxa5R43FeOZKcLWLlQH5Sy6JxUGT58jmx6qtL0fQPV1t07GXtN3IAjxz2uRB5z1LeAUyvreu00xRmGQBSMDRhAepckg//ASqRiCL0pxTmDK4AhWMm/xqHXviwW8qIMIm7uJYWXSrqb1kMOPz17oDRDpzdqXlVduS7hyUzt0NL5jFSi9+3cfSPF6P6YTo0HHjPwU6x77GK/3VsxoYP04ZV5Shau6epCDzuN8Nis4mVSYu62nOiG+1hcVv1oex6xNQGv6uGBwg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=hyA1VG+BInqqmQxbYmju2Jwx5aQAVEWxUBLEsgB3INo=; b=SdUv/YKozBpwqdRl1hXzxU9j44Xr1gMKYxnKCtZUMzOLfFt9SUVGPLT6Q3G49fdQoEVdzjdOSKSoMr6UQGSfA3keibD4EeG0u4qhtb3OJbcwSw8/L0h91GEMNzC7Mhc+tGsqSHxYjMEdH83EmSQh7cdV84HqYaKduywUiEX0wIH9VixEDSr2bEk1hNxbdbxuiM2wbzlN5QiGiZLEoJsbY57MobmQqVcXPCKHo639Bmah2Y5qTjbosYy+oLgJ6R2s3HvzEw2uhm4jQTU7V9YXhOGPY3+GjIwdts5ixXUHXA/c3/8W4GCQxOtZBCTyL4FRo5uqSg8gqTneNTNs7ICVzA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=lists.freedesktop.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=hyA1VG+BInqqmQxbYmju2Jwx5aQAVEWxUBLEsgB3INo=; b=jVzN8dQwWpsdpTBsltQguy4VoDrJf86GuXo1C6GtBVdo+fOfKhyRpcUIFpZVLSJQrsD+Cb+q5AclukFCjidYPuD9wp+74n2/PBJc7MQ+uz6eIBeKIdOe3STLd4FnB6BSKlElVcQCEkkh+nQVmO0/S4j+wFK0Eo63qRmAC2mRIFA= Received: from CO2PR04CA0203.namprd04.prod.outlook.com 
(2603:10b6:104:5::33) by DM6PR12MB4484.namprd12.prod.outlook.com (2603:10b6:5:28f::24) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4823.19; Wed, 22 Dec 2021 22:05:58 +0000 Received: from CO1NAM11FT030.eop-nam11.prod.protection.outlook.com (2603:10b6:104:5:cafe::28) by CO2PR04CA0203.outlook.office365.com (2603:10b6:104:5::33) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4823.18 via Frontend Transport; Wed, 22 Dec 2021 22:05:57 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB03.amd.com; Received: from SATLEXMB03.amd.com (165.204.84.17) by CO1NAM11FT030.mail.protection.outlook.com (10.13.174.125) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.4823.18 via Frontend Transport; Wed, 22 Dec 2021 22:05:57 +0000 Received: from agrodzovsky-All-Series.amd.com (10.180.168.240) by SATLEXMB03.amd.com (10.181.40.144) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.17; Wed, 22 Dec 2021 16:05:56 -0600 From: Andrey Grodzovsky To: , Subject: [RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs Date: Wed, 22 Dec 2021 17:05:02 -0500 Message-ID: <20211222220506.789133-5-andrey.grodzovsky@amd.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20211222220506.789133-1-andrey.grodzovsky@amd.com> References: <20211222220506.789133-1-andrey.grodzovsky@amd.com> MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB03.amd.com (10.181.40.144) X-EOPAttributedMessage: 0 
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d; Ip=[165.204.84.17]; Helo=[SATLEXMB03.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CO1NAM11FT030.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM6PR12MB4484 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Monk.Liu@amd.com, horace.chen@amd.com, christian.koenig@amd.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Use the reset domain wq also for non-TDR GPU recovery triggers such as sysfs and RAS. We must serialize all possible GPU recoveries to guarantee no concurrency there. For TDR, call the original recovery function directly since it's already executed from within the wq. For others, just use a wrapper to queue work and wait for it to finish.
v2: Rename to amdgpu_recover_work_struct Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 33 +++++++++++++++++++++- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 +- 3 files changed, 35 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index b5ff76aae7e0..8e96b9a14452 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1296,6 +1296,8 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev); bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev); int amdgpu_device_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job* job); +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, + struct amdgpu_job *job); void amdgpu_device_pci_config_reset(struct amdgpu_device *adev); int amdgpu_device_pci_reset(struct amdgpu_device *adev); bool amdgpu_device_need_post(struct amdgpu_device *adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 7c063fd37389..258ec3c0b2af 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4979,7 +4979,7 @@ static void amdgpu_device_recheck_guilty_jobs( * Returns 0 for success or an error on failure. 
*/ -int amdgpu_device_gpu_recover(struct amdgpu_device *adev, +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, struct amdgpu_job *job) { struct list_head device_list, *device_list_handle = NULL; @@ -5237,6 +5237,37 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, return r; } +struct amdgpu_recover_work_struct { + struct work_struct base; + struct amdgpu_device *adev; + struct amdgpu_job *job; + int ret; +}; + +static void amdgpu_device_queue_gpu_recover_work(struct work_struct *work) +{ + struct amdgpu_recover_work_struct *recover_work = container_of(work, struct amdgpu_recover_work_struct, base); + + recover_work->ret = amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job); +} +/* + * Serialize gpu recover into reset domain single threaded wq + */ +int amdgpu_device_gpu_recover(struct amdgpu_device *adev, + struct amdgpu_job *job) +{ + struct amdgpu_recover_work_struct work = {.adev = adev, .job = job}; + + INIT_WORK(&work.base, amdgpu_device_queue_gpu_recover_work); + + if (!queue_work(adev->reset_domain.wq, &work.base)) + return -EAGAIN; + + flush_work(&work.base); + + return work.ret; +} + /** * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot * diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index bfc47bea23db..38c9fd7b7ad4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -63,7 +63,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job) ti.process_name, ti.tgid, ti.task_name, ti.pid); if (amdgpu_device_should_recover_gpu(ring->adev)) { - amdgpu_device_gpu_recover(ring->adev, job); + amdgpu_device_gpu_recover_imp(ring->adev, job); } else { drm_sched_suspend_timeout(&ring->sched); if (amdgpu_sriov_vf(adev)) From patchwork Wed Dec 22 22:13:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrey Grodzovsky 
X-Patchwork-Id: 12697350 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 24888C433F5 for ; Wed, 22 Dec 2021 22:14:27 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 7AD8210E37E; Wed, 22 Dec 2021 22:14:23 +0000 (UTC) Received: from NAM10-BN7-obe.outbound.protection.outlook.com (mail-bn7nam10on2082.outbound.protection.outlook.com [40.107.92.82]) by gabe.freedesktop.org (Postfix) with ESMTPS id 5353710E377; Wed, 22 Dec 2021 22:14:21 +0000 (UTC) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=iSueTYd6rd0rMufJXA9s8l6iB7B4hljY1E5WQrPrfT7zDUk5dAGPhnDvi/b9aUomEQCwNK8ba9X4O9WUf5zydNRdI0Sq72KkMDiyjExD+AXISk2tW209rtY3HI5/GuER1bFNqwIgs1qI4le7XOprnuwB7bEmjTM65IdEN4lkFW7ffU2Oe2Z/jH3FX+ihliu0MzYMfEJ9547dVABjNsw76qph7ETAOnDyXJCl3OhSJLT+EgXjLxssmPK3Xu+yQz1RuU93dDlrZg6DsFlQrshlMRs54SldcP0fwoB2bCSdS7I5K/+UCNkvPjwXoVmd80yOXtQhscbRVVt5hd6tGPeIsg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=qwXAtj3obP/VPC6vnQpjR/+tsahOlycfJQ/OVEBchKw=; b=EYwv3UoWJQP+2212svG5NpT1Ect1wpdnnEWp+2iYlgi2E9fVftlIleBhUBV7VO4D2bEV1pap05ZCzL5slYwCwaY6igdSJvo5zafB1MFtXEy3kisw4sAW7n7YiaGtGqdtzJnJWqxxkX+v2ndXPlRx4MnNjiLNaJVRt0U95mvZFgarU6BdhA0/5qTRTohDOjfy0gCh0J1SzaLlLqGvVlW7b9K+5S9FgBLo9nC8vbtMinsBGc95sN7bz22K7cT1oyRj71lNMTB5ZMTSB+yC/aGwI/hGQKd/YUsLG77U6SQiMxbXsW3xDDiERIu1LC0cA8NLxFe2HS1N1xzZ9sjq2GO5Vw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 
165.204.84.17) smtp.rcpttodomain=lists.freedesktop.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=qwXAtj3obP/VPC6vnQpjR/+tsahOlycfJQ/OVEBchKw=; b=oHsZdMXtZYbwLAemoG1G6DwvT7pfC3dLVk8mkE2tqi7JG1kp0xSnhjOqo0azpK61ARv1yQWamcCVUkTOoiK9T53qwDmPxOvIxAeT8kwp2AzxKaEaOgob1WaFU2Mdo9KaQsbswpD7F2OuYGLFiMTKj7PXg/W0SerCtqBycUuhD7Q= Received: from MWHPR18CA0059.namprd18.prod.outlook.com (2603:10b6:300:39::21) by DM6PR12MB4186.namprd12.prod.outlook.com (2603:10b6:5:21b::11) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4823.18; Wed, 22 Dec 2021 22:14:18 +0000 Received: from CO1NAM11FT057.eop-nam11.prod.protection.outlook.com (2603:10b6:300:39:cafe::9) by MWHPR18CA0059.outlook.office365.com (2603:10b6:300:39::21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4823.14 via Frontend Transport; Wed, 22 Dec 2021 22:14:18 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB03.amd.com; Received: from SATLEXMB03.amd.com (165.204.84.17) by CO1NAM11FT057.mail.protection.outlook.com (10.13.174.205) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.4823.18 via Frontend Transport; Wed, 22 Dec 2021 22:14:18 +0000 Received: from agrodzovsky-All-Series.amd.com (10.180.168.240) by SATLEXMB03.amd.com (10.181.40.144) with Microsoft SMTP Server (version=TLS1_2, 
cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.17; Wed, 22 Dec 2021 16:14:16 -0600 From: Andrey Grodzovsky To: , Subject: [RFC v2 5/8] drm/amd/virt: For SRIOV send GPU reset directly to TDR queue. Date: Wed, 22 Dec 2021 17:13:57 -0500 Message-ID: <20211222221400.790842-1-andrey.grodzovsky@amd.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20211222220506.789133-1-andrey.grodzovsky@amd.com> References: <20211222220506.789133-1-andrey.grodzovsky@amd.com> MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB03.amd.com (10.181.40.144) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 9f5e6450-e1e5-4d49-0f15-08d9c59865c0 X-MS-TrafficTypeDiagnostic: DM6PR12MB4186:EE_ X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:245; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: kI+koM2aev1wmvITzwPbZ6PSXo3dmjsr99FsyBocDCn03UJGS1yG8Z6oS1ROXhhfZOFs6vs3yFcxEv33Y05NX6/dmJkhGVqbcvl/wsquLcPJuZg9nh5URFEqJPn2cBP/D6qOSgR14DGVfaDqLKJ6UKVQlnhUnxC7SSdahklB9kOvc1+wkSZ3oJqL5Yh41LsUF9EtC2Rw88r0Nw8UgH+xfAB1Ott6ELzmO4I5NB4dqeF0VpBVVgRVHeDBFCTqCs5oWpC0WAta0OEZBYZ20JkhvwsdY3pRRm2gE9EM3HiFOvXogcFkNJOkytogIVTUVVyvosDbrmpOyzHyeh9JccNB2/EAbIhgwNVdhvzzyiqAHgWBFCU4QLNVjEF7AtxpTupgjBrayHrvKHIes/YbaDDt04Lq5UdY2itTtK7DzLKN40B5k+oWmdAVXQhGP6kL38JZe3WgL+y8p7IWah+2ieVK4oFrdZSMGTJ+QiApistM5l8ylfKozJ5r78etRmq000dfXUG+x8DVU5kfEc/Yt9A5ls0tWHHAVu7x3SrlsM7MojkL7JVKMQTgCQWpuL/WttrWig9WmlfHzuHSnsEzJHABGp8wEaoqp6V7gM99gXgG7OHcJ3lZxM6GACmqiZhi/S04OscvaaPZwuHqZCaSEBkabL5yTjKUDs7BcrNyFBMw73ULWW4aRESmkKGrwfsnHACMKcpDZIHvDo2TTqb+TlxmCH0r45TD+RH4RA8aLdBJMnkqy87lcqkFjatJFGdfayhfy4MXa8fbOdAMp6nqQaXqpChhbrRp9+yZXJLpabhi/Iw= X-Forefront-Antispam-Report: CIP:165.204.84.17; CTRY:US; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:SATLEXMB03.amd.com; PTR:InfoDomainNonexistent; CAT:NONE; 
SFS:(4636009)(46966006)(36840700001)(40470700002)(2906002)(8676002)(316002)(70586007)(8936002)(81166007)(110136005)(40460700001)(70206006)(36860700001)(47076005)(6666004)(54906003)(356005)(83380400001)(86362001)(82310400004)(44832011)(426003)(4326008)(16526019)(186003)(36756003)(2616005)(26005)(7696005)(508600001)(5660300002)(1076003)(336012)(36900700001); DIR:OUT; SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 22 Dec 2021 22:14:18.0859 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 9f5e6450-e1e5-4d49-0f15-08d9c59865c0 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d; Ip=[165.204.84.17]; Helo=[SATLEXMB03.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CO1NAM11FT057.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM6PR12MB4186 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: horace.chen@amd.com, christian.koenig@amd.com, Monk.Liu@amd.com, Liu Shaoyun Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" No need to trigger another work queue inside the work queue.
Suggested-by: Liu Shaoyun Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 7 +++++-- drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 7 +++++-- drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 7 +++++-- 3 files changed, 15 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c index 23b066bcffb2..487cd654b69e 100644 --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c @@ -276,7 +276,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work) if (amdgpu_device_should_recover_gpu(adev) && (!amdgpu_device_has_job_running(adev) || adev->sdma_timeout == MAX_SCHEDULE_TIMEOUT)) - amdgpu_device_gpu_recover(adev, NULL); + amdgpu_device_gpu_recover_imp(adev, NULL); } static int xgpu_ai_set_mailbox_rcv_irq(struct amdgpu_device *adev, @@ -302,7 +302,10 @@ static int xgpu_ai_mailbox_rcv_irq(struct amdgpu_device *adev, switch (event) { case IDH_FLR_NOTIFICATION: if (amdgpu_sriov_runtime(adev)) - schedule_work(&adev->virt.flr_work); + WARN_ONCE(!queue_work(adev->reset_domain.wq, + &adev->virt.flr_work), + "Failed to queue work! 
at %s", + __FUNCTION__ ); break; case IDH_QUERY_ALIVE: xgpu_ai_mailbox_send_ack(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c index a35e6d87e537..e3869067a31d 100644 --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c @@ -308,7 +308,7 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work) adev->gfx_timeout == MAX_SCHEDULE_TIMEOUT || adev->compute_timeout == MAX_SCHEDULE_TIMEOUT || adev->video_timeout == MAX_SCHEDULE_TIMEOUT)) - amdgpu_device_gpu_recover(adev, NULL); + amdgpu_device_gpu_recover_imp(adev, NULL); } static int xgpu_nv_set_mailbox_rcv_irq(struct amdgpu_device *adev, @@ -337,7 +337,10 @@ static int xgpu_nv_mailbox_rcv_irq(struct amdgpu_device *adev, switch (event) { case IDH_FLR_NOTIFICATION: if (amdgpu_sriov_runtime(adev)) - schedule_work(&adev->virt.flr_work); + WARN_ONCE(!queue_work(adev->reset_domain.wq, + &adev->virt.flr_work), + "Failed to queue work! at %s", + __FUNCTION__ ); break; /* READY_TO_ACCESS_GPU is fetched by kernel polling, IRQ can ignore * it byfar since that polling thread will handle it, diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c index aef9d059ae52..23e802cae2bb 100644 --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c @@ -521,7 +521,7 @@ static void xgpu_vi_mailbox_flr_work(struct work_struct *work) /* Trigger recovery due to world switch failure */ if (amdgpu_device_should_recover_gpu(adev)) - amdgpu_device_gpu_recover(adev, NULL); + amdgpu_device_gpu_recover_imp(adev, NULL); } static int xgpu_vi_set_mailbox_rcv_irq(struct amdgpu_device *adev, @@ -551,7 +551,10 @@ static int xgpu_vi_mailbox_rcv_irq(struct amdgpu_device *adev, /* only handle FLR_NOTIFY now */ if (!r) - schedule_work(&adev->virt.flr_work); + WARN_ONCE(!queue_work(adev->reset_domain.wq, + &adev->virt.flr_work), + "Failed to queue work! 
at %s", + __FUNCTION__ ); } return 0; From patchwork Wed Dec 22 22:13:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrey Grodzovsky X-Patchwork-Id: 12697352 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4A859C433EF for ; Wed, 22 Dec 2021 22:14:37 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5C8FE10E388; Wed, 22 Dec 2021 22:14:27 +0000 (UTC) Received: from NAM11-BN8-obe.outbound.protection.outlook.com (mail-bn8nam11on2087.outbound.protection.outlook.com [40.107.236.87]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4525510E37D; Wed, 22 Dec 2021 22:14:23 +0000 (UTC) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=jn/TLCthmY0pm1axmAVA9cA0UZ9Nf37tak2470+Sgd683Fe5FPukco6U0aHhf2kU0t58Z/Izl+DvtQTx2zu/n1deafeJFKMZXE6lstIzeCi+MzwLtofV3bskNRk42oNmKRlMJ/iGXgJO0IuDLSkTbVwO3BSNHJ3YCQLnJzmtGlksNIqG33wNlg1YkSRQc8Y+VjlfXly1ySXEotbQdocz1IVIU7uwFeozGsWQLUjGOcL2SpSOBEz1YWZwOy+dmmy89a93zp3UiQQwJK8GxKFgDSRjMg27R8WfYaNcUDIBKE9YoKCExtDz9jzG2C2aY70FdXY7A8YtMv81i4/O6p/wDA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=KhXyC0flfwv6bNeZ29uT/mET9BrHzj+KxClAUd2H4Vw=; 
b=KiJrO1SerdZXN8Qsgu1Tpfx2ju770+FRfrH1mMXvlxOFP+oPCc3g/t/vQS8jqIn8snSduRErLDiPzWClQgDr61NqKO41rJwViwD9XJ91uXld9tzNB2X5Koo2uqJPjV6u/SKOZArL4021UPQIN++HWhDdyLwurTJhxjwSCWMxM7EwkU84U+2HxPhOOzQxUrDTIvNB8S3VuoKvJ1rcyaBI2XkE3HUG0sQaZ0kNlW5l2kRPr1QdsLl2fjJSsOXovouQwYSCVD3APnPmFbNjkhATJmE22xfr/9PGJPvmZPDQbvVjkSidDrdLTVXK8r5C2javla2dUFdpI8qOp8Z8TXaqvA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=lists.freedesktop.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=KhXyC0flfwv6bNeZ29uT/mET9BrHzj+KxClAUd2H4Vw=; b=cnyGEqWebgo8yWz3uCqW+XEVzmLVnCjgReSvwM2DMKgf1EdPHFV0Jj6sWosMiiLUTdxyNMt7Zq4B3iU2UmCodgjHx7FlalNdqF9QM20fHuC/fO0UNPZj8KtPrQ23HtKNsG+xDtPnybSSxMq9zcISwy5zHCctLunDThLcWEuO+VI= Received: from MW2PR16CA0039.namprd16.prod.outlook.com (2603:10b6:907:1::16) by CY4PR12MB1446.namprd12.prod.outlook.com (2603:10b6:910:10::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4823.19; Wed, 22 Dec 2021 22:14:20 +0000 Received: from CO1NAM11FT060.eop-nam11.prod.protection.outlook.com (2603:10b6:907:1:cafe::4a) by MW2PR16CA0039.outlook.office365.com (2603:10b6:907:1::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4801.20 via Frontend Transport; Wed, 22 Dec 2021 22:14:20 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB03.amd.com; 
Received: from SATLEXMB03.amd.com (165.204.84.17) by CO1NAM11FT060.mail.protection.outlook.com (10.13.175.132) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.4823.18 via Frontend Transport; Wed, 22 Dec 2021 22:14:19 +0000 Received: from agrodzovsky-All-Series.amd.com (10.180.168.240) by SATLEXMB03.amd.com (10.181.40.144) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.17; Wed, 22 Dec 2021 16:14:18 -0600 From: Andrey Grodzovsky To: , Subject: [RFC v2 6/8] drm/amdgpu: Drop hive->in_reset Date: Wed, 22 Dec 2021 17:13:58 -0500 Message-ID: <20211222221400.790842-2-andrey.grodzovsky@amd.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20211222221400.790842-1-andrey.grodzovsky@amd.com> References: <20211222220506.789133-1-andrey.grodzovsky@amd.com> <20211222221400.790842-1-andrey.grodzovsky@amd.com> MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB03.amd.com (10.181.40.144) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 85c5a21a-89ce-48d2-60c5-08d9c59866cc X-MS-TrafficTypeDiagnostic: CY4PR12MB1446:EE_ X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:923; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 
TwZ5yqIIMkEoTT5LXuJGf14jozV/k6sYuhvkRkON6uduEFBLrj3jlI2oK52Iyfx9+fByYWroCvNRTDGJgieDJxnSbVZ6+00NM5d25v+tbBUjrVUViRJ1hUY5V5HJcU4x/TblUEjlymSxrF821PW8e/tXr5WhnEtVBI19m3ZVBBKEPN5BnV/EuCAbbYf/rU04eLJ3tYaBzAuge5B0TweMBuSX77FR0N+W+Hi/9PSyLGFgdIkvNsEdHy92Vrj6XuWsSnJL8gkYHAvfP114dS4SJEr/EV15k+Q5NHYTMjq1MGmaMEf5mbxJ9sqndHGKgCO7UrZBTzdk5MzgtGbSLlQZNGREkOdlRWYrbtI1YxHaRgVx+g1NWmWJDkB+n0jY7sARl8pFZtS6lKbS5LZCR2VcJxx5XBJCUoT3WmDB/10ph6TPKvYb4Bh41ejFlq7jGamzVQ6D7/hSfJOhuzDJMh2mnzUXjnLYILXCjYorQCMHB4rWwbmhCfCMr5/HtGgW9XkNYWUWE7iOSCAfZ/5TN+Py9yLcKMM1kfAoJZeS0aItaE84mROuNLJmZ8fxLh0H6Gre/RP90/DxFQST7ewJ6QurVOOv+1mv7V/mb6S/NQpVIchu7xZUvgH1AkT1y9YVWjsTKUIxDZo3aa6swYdoR5rdQjcgOnoUKFTrxDQKenFDmH+3X4DA/63rXlSpBp2dN8DEQYnVZCSN9wLULh/Y9rlC+maTAXw0lyVXw8knnnKM8iFg1QvHP4CYxRG9tzegjEHuGbikFR/j1NXl14cCB7iUCkuE3QUm0P9zsbnPxXY1S/E= X-Forefront-Antispam-Report: CIP:165.204.84.17; CTRY:US; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:SATLEXMB03.amd.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(4636009)(40470700002)(46966006)(36840700001)(54906003)(508600001)(81166007)(36756003)(7696005)(110136005)(6666004)(66574015)(40460700001)(82310400004)(356005)(83380400001)(4326008)(70586007)(336012)(426003)(44832011)(26005)(1076003)(86362001)(16526019)(36860700001)(5660300002)(70206006)(316002)(2906002)(8676002)(8936002)(186003)(2616005)(47076005)(36900700001); DIR:OUT; SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 22 Dec 2021 22:14:19.8444 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 85c5a21a-89ce-48d2-60c5-08d9c59866cc X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d; Ip=[165.204.84.17]; Helo=[SATLEXMB03.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CO1NAM11FT060.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem 
X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY4PR12MB1446 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Monk.Liu@amd.com, horace.chen@amd.com, christian.koenig@amd.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Since we serialize all resets, there is no need to protect against concurrent resets. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +------------------ drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 1 - 3 files changed, 1 insertion(+), 20 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 258ec3c0b2af..107a393ebbfd 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5013,25 +5013,9 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, dev_info(adev->dev, "GPU %s begin!\n", need_emergency_restart ? "jobs stop":"reset"); - /* - * Here we trylock to avoid chain of resets executing from - * either trigger by jobs on different adevs in XGMI hive or jobs on - * different schedulers for same device while this TO handler is running. - * We always reset all schedulers for device and all devices for XGMI - * hive so that should take care of them too. - */ hive = amdgpu_get_xgmi_hive(adev); - if (hive) { - if (atomic_cmpxchg(&hive->in_reset, 0, 1) != 0) { - DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", - job ?
job->base.id : -1, hive->hive_id); - amdgpu_put_xgmi_hive(hive); - if (job && job->vm) - drm_sched_increase_karma(&job->base); - return 0; - } + if (hive) mutex_lock(&hive->hive_lock); - } reset_context.method = AMD_RESET_METHOD_NONE; reset_context.reset_req_dev = adev; @@ -5227,7 +5211,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, skip_recovery: if (hive) { - atomic_set(&hive->in_reset, 0); mutex_unlock(&hive->hive_lock); amdgpu_put_xgmi_hive(hive); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c index a858e3457c5c..9ad742039ac9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c @@ -404,7 +404,6 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev) INIT_LIST_HEAD(&hive->device_list); INIT_LIST_HEAD(&hive->node); mutex_init(&hive->hive_lock); - atomic_set(&hive->in_reset, 0); atomic_set(&hive->number_devices, 0); task_barrier_init(&hive->tb); hive->pstate = AMDGPU_XGMI_PSTATE_UNKNOWN; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h index 6121aaa292cb..2f2ce53645a5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h @@ -33,7 +33,6 @@ struct amdgpu_hive_info { struct list_head node; atomic_t number_devices; struct mutex hive_lock; - atomic_t in_reset; int hi_req_count; struct amdgpu_device *hi_req_gpu; struct task_barrier tb; From patchwork Wed Dec 22 22:13:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrey Grodzovsky X-Patchwork-Id: 12697353 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
smtp.lore.kernel.org (Postfix) with ESMTPS id 35255C433FE for ; Wed, 22 Dec 2021 22:14:40 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 0975F10E381; Wed, 22 Dec 2021 22:14:36 +0000 (UTC) Received: from NAM12-DM6-obe.outbound.protection.outlook.com (mail-dm6nam12on2063.outbound.protection.outlook.com [40.107.243.63]) by gabe.freedesktop.org (Postfix) with ESMTPS id 81FBC10E3BC; Wed, 22 Dec 2021 22:14:28 +0000 (UTC) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=cgtDs10QlNkxw+uYsYIXCnB/bXUxO9htg1siijnvFlUvcSdjH4HyLwBrDhPPo2iRW2G8Cp3OJpAmckSSiZFUFrbxDzlfo0Tly1XwuRa/aboBaHpHIrHticE8opZMwNKjWInbhAD2tN5gCOGiSLLc7w0xpEOYY9PZOsk0faY5Yf21bye6sTjfmzchg0NE4joUTJabxDIa0nrm25D77DtZZZmyBMd5i771XT2vr4mNKtX/TuS+bXGry0QRrmuu7NT3wjtyYz/+lYUjM4+Nle3dRj7Cj2d4E+IWqJA/HcX84jnsIa1k+3egKihE1uwUdOq4ZT6RS+CeITiOf84oGQPtNg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=VrOtf4AlwuVvJ7fN764PzPVSU91mOZX6pPUozlHxkB4=; b=QJ4atWPi+VFj4IpA4YC7uFh0tyZ5aKASyJWAoePry3uHUCQcaryzPznbYtD8TDl8PJ6QGCN6m+oaS54lkWXp+5N+4pcq2LuP4H/a3TCWo7OMfkeC0/FQhPW3qde1one/mHnDPc4OvsOgQG/DsQkBBZc4m+2m3vYDV66UqEj/Zpq/eKvTUEOeovnYtmHAoD+jtcMkcXbh38qe/WeHl8BzSEwPF/1Vv7hUhFUiATmD/ey0I8wQ16a+uOGPSCeZ+1vC1qcakLOPoQ2bJMk55XfKV4WTfVGn4SB80hlc0YaDOI+7p407KcPwJSCWLdnyvOOL7qgnjokfQixkUNTAvhd+Gw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=lists.freedesktop.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; 
h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=VrOtf4AlwuVvJ7fN764PzPVSU91mOZX6pPUozlHxkB4=; b=RASdFNvN7pBUPsDW3tVt577hbQl4Hse3lvx1FmcyoasCGv79y8pe6dBNcgAH/Jbwzi1zSbfET6TskjO8qVOFxLM5u0qhZGJ1/ZaTEjZOOJaZzj9oXEXJ5V3USrnR3kagD+tPVNZn8TjtdXbdktih4K5PtjaNS+2fyjq3KGifLdg= Received: from MW2PR16CA0068.namprd16.prod.outlook.com (2603:10b6:907:1::45) by BN6PR12MB1940.namprd12.prod.outlook.com (2603:10b6:404:fd::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4801.17; Wed, 22 Dec 2021 22:14:21 +0000 Received: from CO1NAM11FT060.eop-nam11.prod.protection.outlook.com (2603:10b6:907:1:cafe::ee) by MW2PR16CA0068.outlook.office365.com (2603:10b6:907:1::45) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4801.20 via Frontend Transport; Wed, 22 Dec 2021 22:14:20 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB03.amd.com; Received: from SATLEXMB03.amd.com (165.204.84.17) by CO1NAM11FT060.mail.protection.outlook.com (10.13.175.132) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.4823.18 via Frontend Transport; Wed, 22 Dec 2021 22:14:20 +0000 Received: from agrodzovsky-All-Series.amd.com (10.180.168.240) by SATLEXMB03.amd.com (10.181.40.144) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.17; Wed, 22 Dec 2021 16:14:19 -0600 From: Andrey Grodzovsky To: , Subject: [RFC v2 7/8] drm/amdgpu: Drop concurrent GPU reset protection for device Date: Wed, 22 Dec 2021 17:13:59 -0500 Message-ID: 
<20211222221400.790842-3-andrey.grodzovsky@amd.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20211222221400.790842-1-andrey.grodzovsky@amd.com> References: <20211222220506.789133-1-andrey.grodzovsky@amd.com> <20211222221400.790842-1-andrey.grodzovsky@amd.com> MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB03.amd.com (10.181.40.144) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 865aa01c-4653-40d0-326f-08d9c5986741 X-MS-TrafficTypeDiagnostic: BN6PR12MB1940:EE_ X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:8882; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: XDG4CW37qWZ6PvzH+VYPOnAz0HFkvjvE3gXl/8kGO8O+wdPxI3UeIkjrH2aALrw7RpjlNETLWNJvqKOAeQMt9T5ud0LBiip2y3QxlBvmIBMsB0AZ0G/v0VPc2YE4v23IugZ+f+qq1P8YkkVvh782JwmfzOOKnHokBSkWfj5ppwIC4CanQy1JoDey+mDPZF0DhedLrOsfqZSjYvmqkawd0lZYQBWuLqOGr5sJSa9Y72u/VNpEYGaWHFdPLhaeKrZ6CU9x6LVCvnopbTBzxa0fdRXXPaRfhPc7B8xs95blxkfKI1GR/L0SseOdqH9exXgf6Jgxt0gKn6ZsroIyg02OSRDUutiRjWp4w3SxaopLfUxqdDezMEfXpIF39T1q1kJ+2KRAx7sBPka5GYFH7l+y7nwpP37LXvabtgWaZolLibhPIw+iDSrM701CQGHYE0lzMT1z+IwrE4Lo1v1ejVaRsfc2YAug8VOSiRFsQvbNaJbE9HHx9XrXmBOTe5ZLJR0zIzM4q4MSDwNR/6/Z7NCCdgEI620KBV4shlNrA1t4YxW6YaFg2zH9C4tjSEs8TDpRwFIef+mS4y+qDnXW/YraaBbKqdCN0Hq0X9Uf2/LABd1iM4XoQYfY4w5S9CvgjoY9D/uXdNQWL14NcaNv1r6wffhwCqj2zedyw39p46Q8K5ECtf8S8+6HRyQLi0eNj/+HHOLlcvGJj84J4kiRKUxoYaD9S24/CKUP0BPuvMum/6Fv1vf/h8bxhEwy/DUTsNwzqh1pWLbKUX0KmTVQTbAreq0Zsv5Xkqh0zsjaErAiuZ0= X-Forefront-Antispam-Report: CIP:165.204.84.17; CTRY:US; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:SATLEXMB03.amd.com; PTR:InfoDomainNonexistent; CAT:NONE; 
SFS:(4636009)(36840700001)(40470700002)(46966006)(508600001)(7696005)(2906002)(16526019)(356005)(83380400001)(81166007)(36860700001)(186003)(86362001)(40460700001)(26005)(8936002)(44832011)(66574015)(8676002)(54906003)(316002)(336012)(2616005)(426003)(82310400004)(110136005)(6666004)(36756003)(1076003)(70586007)(5660300002)(70206006)(4326008)(47076005)(36900700001); DIR:OUT; SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 22 Dec 2021 22:14:20.6100 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 865aa01c-4653-40d0-326f-08d9c5986741 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d; Ip=[165.204.84.17]; Helo=[SATLEXMB03.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CO1NAM11FT060.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BN6PR12MB1940 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Monk.Liu@amd.com, horace.chen@amd.com, christian.koenig@amd.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel"

Since all GPU resets are now serialized, there is no need for this
protection.
This patch also reverts 'drm/amdgpu: race issue when jobs on 2 ring timeout'

Signed-off-by: Andrey Grodzovsky
Reviewed-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 89 ++--------------------
 1 file changed, 7 insertions(+), 82 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 107a393ebbfd..fef952ca8db5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4763,11 +4763,10 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
 	return r;
 }
 
-static bool amdgpu_device_lock_adev(struct amdgpu_device *adev,
+static void amdgpu_device_lock_adev(struct amdgpu_device *adev,
 				struct amdgpu_hive_info *hive)
 {
-	if (atomic_cmpxchg(&adev->in_gpu_reset, 0, 1) != 0)
-		return false;
+	atomic_set(&adev->in_gpu_reset, 1);
 
 	if (hive) {
 		down_write_nest_lock(&adev->reset_sem, &hive->hive_lock);
@@ -4786,8 +4785,6 @@ static bool amdgpu_device_lock_adev(struct amdgpu_device *adev,
 		adev->mp1_state = PP_MP1_STATE_NONE;
 		break;
 	}
-
-	return true;
 }
 
 static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
@@ -4798,46 +4795,6 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
 	up_write(&adev->reset_sem);
 }
 
-/*
- * to lockup a list of amdgpu devices in a hive safely, if not a hive
- * with multiple nodes, it will be similar as amdgpu_device_lock_adev.
- *
- * unlock won't require roll back.
- */
-static int amdgpu_device_lock_hive_adev(struct amdgpu_device *adev, struct amdgpu_hive_info *hive)
-{
-	struct amdgpu_device *tmp_adev = NULL;
-
-	if (adev->gmc.xgmi.num_physical_nodes > 1) {
-		if (!hive) {
-			dev_err(adev->dev, "Hive is NULL while device has multiple xgmi nodes");
-			return -ENODEV;
-		}
-		list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
-			if (!amdgpu_device_lock_adev(tmp_adev, hive))
-				goto roll_back;
-		}
-	} else if (!amdgpu_device_lock_adev(adev, hive))
-		return -EAGAIN;
-
-	return 0;
-roll_back:
-	if (!list_is_first(&tmp_adev->gmc.xgmi.head, &hive->device_list)) {
-		/*
-		 * if the lockup iteration break in the middle of a hive,
-		 * it may means there may has a race issue,
-		 * or a hive device locked up independently.
-		 * we may be in trouble and may not, so will try to roll back
-		 * the lock and give out a warnning.
-		 */
-		dev_warn(tmp_adev->dev, "Hive lock iteration broke in the middle. Rolling back to unlock");
-		list_for_each_entry_continue_reverse(tmp_adev, &hive->device_list, gmc.xgmi.head) {
-			amdgpu_device_unlock_adev(tmp_adev);
-		}
-	}
-	return -EAGAIN;
-}
-
 static void amdgpu_device_resume_display_audio(struct amdgpu_device *adev)
 {
 	struct pci_dev *p = NULL;
@@ -5023,22 +4980,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 	reset_context.hive = hive;
 	clear_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags);
 
-	/*
-	 * lock the device before we try to operate the linked list
-	 * if didn't get the device lock, don't touch the linked list since
-	 * others may iterating it.
-	 */
-	r = amdgpu_device_lock_hive_adev(adev, hive);
-	if (r) {
-		dev_info(adev->dev, "Bailing on TDR for s_job:%llx, as another already in progress",
-					job ? job->base.id : -1);
-
-		/* even we skipped this reset, still need to set the job to guilty */
-		if (job && job->vm)
-			drm_sched_increase_karma(&job->base);
-		goto skip_recovery;
-	}
-
 	/*
 	 * Build list of devices to reset.
 	 * In case we are in XGMI hive mode, resort the device list
@@ -5058,6 +4999,9 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 
 	/* block all schedulers and reset given job's ring */
 	list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
+
+		amdgpu_device_lock_adev(tmp_adev, hive);
+
 		/*
 		 * Try to put the audio codec into suspend state
 		 * before gpu reset started.
@@ -5209,13 +5153,12 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 		amdgpu_device_unlock_adev(tmp_adev);
 	}
 
-skip_recovery:
 	if (hive) {
 		mutex_unlock(&hive->hive_lock);
 		amdgpu_put_xgmi_hive(hive);
 	}
 
-	if (r && r != -EAGAIN)
+	if (r)
 		dev_info(adev->dev, "GPU reset end with ret = %d\n", r);
 	return r;
 }
@@ -5438,20 +5381,6 @@ int amdgpu_device_baco_exit(struct drm_device *dev)
 	return 0;
 }
 
-static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
-{
-	int i;
-
-	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
-		struct amdgpu_ring *ring = adev->rings[i];
-
-		if (!ring || !ring->sched.thread)
-			continue;
-
-		cancel_delayed_work_sync(&ring->sched.work_tdr);
-	}
-}
-
 /**
  * amdgpu_pci_error_detected - Called when a PCI error is detected.
  * @pdev: PCI device struct
@@ -5482,14 +5411,10 @@ pci_ers_result_t amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
 	/* Fatal error, prepare for slot reset */
 	case pci_channel_io_frozen:
 		/*
-		 * Cancel and wait for all TDRs in progress if failing to
-		 * set adev->in_gpu_reset in amdgpu_device_lock_adev
-		 *
 		 * Locking adev->reset_sem will prevent any external access
 		 * to GPU during PCI error recovery
 		 */
-		while (!amdgpu_device_lock_adev(adev, NULL))
-			amdgpu_cancel_all_tdr(adev);
+		amdgpu_device_lock_adev(adev, NULL);
 
 		/*
 		 * Block any work scheduling as we do for regular GPU reset

From patchwork Wed Dec 22 22:14:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Andrey Grodzovsky X-Patchwork-Id: 12697351 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 76713C433F5 for ; Wed, 22 Dec 2021 22:14:34 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1035110E387; Wed, 22 Dec 2021 22:14:26 +0000 (UTC) Received: from NAM12-MW2-obe.outbound.protection.outlook.com (mail-mw2nam12on2065.outbound.protection.outlook.com [40.107.244.65]) by gabe.freedesktop.org (Postfix) with ESMTPS id 8DF3F10E37D; Wed, 22 Dec 2021 22:14:24 +0000 (UTC) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
b=RhaDuc953YReVDGSpG+l0P3XG+JUVOJNSh2a5rJ4dff3QUdeoGFOfNZY/NXggRWgsl2dZLSHppXsFuWUbqC9Dt7kU/rFUq5HDq3GNOYWfsBkCXYANdyIqxYl85GpIvcbv5YK6dA07ssEW8Y2VEWDUiehBQllL4HYxn9H5DnZdcM40Jbrb6ls+VHezGl/XhfJoD4dV47bvtTQqvsH6tosJLxYh33pocjNhkF723mEcHIAeuUO4RN/TRy0dx+e00BQlE5s/+ze2Q6d+MLA3E5ErutOYAMWI8e5q81ZSRts+1ciexqs2uhHVrHIcB0CYih4cXMAGnzywbDl4E7Q5dCMqA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=0L7bKImYxTZpxJq1PyrTqr8pgBHKUKiAZaILq3hHEo0=; b=gWdfuAzQRsoHUY6ly9jt9CwNeSNvuB63ua8VFqQqRTxaxXUSdeJGVaS+8SDBtms1ezqBT480sQO/lbxRVqCirZXHAQoZNy0LTySWcMOdAA17bhEAzhjRg5lmqp1osntBSELS6LHaDr7pZzkL6NZoqLINzzCYsTRM9TIH5esqmvb8Uhq8RgOFgGRggKsAwxFvFOzIOkzIEJayKs7HHZANeXOOOIGe/omWvZTXDj8qhMb2z9p04ZsVdGWsgl4cwyIwYDqD4AY/ruDUdPC+psIHilTbCF+xKjqkAxBz4oqopWlP6htmw8LB8wmhcYC1qL37/+/V6lrccjw75nhHqRuorQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=lists.freedesktop.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=0L7bKImYxTZpxJq1PyrTqr8pgBHKUKiAZaILq3hHEo0=; b=zYUY86WZ3LzyHNvgH7F0AZ4ShxIOQXtLzczqwlNkD9tuR7/wnd4eU/bm46yMW9igih/8fp8eKamliSVrdXtMwnAjm+iU6vyjAJK5NXJACAvNTjAdDk0ckEvbeWTUxpOXpyBW5En2q1HmBrdfj8twfU2gzfrJHssLtS2DnP5mLg4= Received: from MW2PR16CA0047.namprd16.prod.outlook.com (2603:10b6:907:1::24) by BL0PR12MB4946.namprd12.prod.outlook.com (2603:10b6:208:1c5::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4801.17; Wed, 22 Dec 2021 22:14:22 +0000 Received: from 
CO1NAM11FT060.eop-nam11.prod.protection.outlook.com (2603:10b6:907:1:cafe::c1) by MW2PR16CA0047.outlook.office365.com (2603:10b6:907:1::24) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4801.14 via Frontend Transport; Wed, 22 Dec 2021 22:14:21 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB03.amd.com; Received: from SATLEXMB03.amd.com (165.204.84.17) by CO1NAM11FT060.mail.protection.outlook.com (10.13.175.132) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.4823.18 via Frontend Transport; Wed, 22 Dec 2021 22:14:21 +0000 Received: from agrodzovsky-All-Series.amd.com (10.180.168.240) by SATLEXMB03.amd.com (10.181.40.144) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.17; Wed, 22 Dec 2021 16:14:20 -0600 From: Andrey Grodzovsky To: , Subject: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV Date: Wed, 22 Dec 2021 17:14:00 -0500 Message-ID: <20211222221400.790842-4-andrey.grodzovsky@amd.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20211222221400.790842-1-andrey.grodzovsky@amd.com> References: <20211222220506.789133-1-andrey.grodzovsky@amd.com> <20211222221400.790842-1-andrey.grodzovsky@amd.com> MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB03.amd.com (10.181.40.144) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 8a0e10fc-8d9d-4818-2b15-08d9c59867bd X-MS-TrafficTypeDiagnostic: BL0PR12MB4946:EE_ X-Microsoft-Antispam-PRVS: 
X-MS-Oob-TLC-OOBClassifiers: OLM:5236; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 7QEzW3UFuagRhsk5awsjnq2p0ZS/VsCgfU0ZY4EljywqfvmQmYGfcGyEOVZZ6bEAB15BLITOepWVI/9MDUM3MAjvUzBZ0Z40MMK+nlUtxTP5pFoA4zFu1ox9wbUvC663S5C48za3PPpBZxqyT5BKmmaTr+rv+387dBRsREeBWijRORLg6RJpx0f3sWFsfiQd6eIWXDFo2O184H0qnIc30I6w9nEDDMkv3PbnTweRjN/PDNVSVl4v+HW5PU8FLMhvYV0u/5NljBQXinkh5TfvutQo8Rc36nO0kKCuRnca8cTmWVclL6wylcpyM3Bt4Hy/8t+KUAiB/nBX0aDrfQMMfqBExXf0Ps578B7k9ePBTw1q0Aax6N93hbmbonjv4G6bteyRVpWG4Dl6nA+ye7xzrS1deWh5k2jhKgixKlfbrB2z6TFSKKD7NAJAjO0vOOK9RCanQ2cJfZjdQL0OF+vlV9342Spp/4fYL+VwcuZBbGaizQNmZngd6UT5fewj68gYeGBDBeVNJv+cAokf8pivcnF8+EVHIdhu29Gna7a1ywjOkhUDCKYRkHbgpnYdYJaVIgzGZ8Bjx2kL0FfqYD09nHBJ3K9Wg4qjiwqgkxK0r7uyRf8aIs/Vcnn5K/bpu8LANfqKIsNMMIUGtyC8bVPKntOixS0J0vQnsrBai6A4zYC31YHP4zF1GQV/ZaDYl/41kQvi2C5QA67WE2I8hRoI4P4BeQqy6PABhAkU79Z8c5yB/mvQNry1Zyxr8xnK6Kbcz3RMkMkz5nprGG0KWSsQWn0c3mnxy2C2vUHR4u1FbXg= X-Forefront-Antispam-Report: CIP:165.204.84.17; CTRY:US; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:SATLEXMB03.amd.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(4636009)(36840700001)(40470700002)(46966006)(70206006)(47076005)(336012)(356005)(4326008)(8936002)(5660300002)(16526019)(81166007)(36860700001)(82310400004)(70586007)(7696005)(6666004)(426003)(508600001)(2616005)(2906002)(86362001)(316002)(26005)(36756003)(186003)(83380400001)(1076003)(54906003)(8676002)(110136005)(40460700001)(44832011)(36900700001); DIR:OUT; SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 22 Dec 2021 22:14:21.4224 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 8a0e10fc-8d9d-4818-2b15-08d9c59867bd X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d; Ip=[165.204.84.17]; Helo=[SATLEXMB03.amd.com] X-MS-Exchange-CrossTenant-AuthSource: 
CO1NAM11FT060.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BL0PR12MB4946 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Monk.Liu@amd.com, horace.chen@amd.com, christian.koenig@amd.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel"

Since FLR work is now serialized against GPU resets, there is no need
for this protection.

Signed-off-by: Andrey Grodzovsky
Acked-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 11 -----------
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 11 -----------
 2 files changed, 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
index 487cd654b69e..7d59a66e3988 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -248,15 +248,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
 	struct amdgpu_device *adev = container_of(virt, struct amdgpu_device, virt);
 	int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT;
 
-	/* block amdgpu_gpu_recover till msg FLR COMPLETE received,
-	 * otherwise the mailbox msg will be ruined/reseted by
-	 * the VF FLR.
-	 */
-	if (!down_write_trylock(&adev->reset_sem))
-		return;
-
 	amdgpu_virt_fini_data_exchange(adev);
-	atomic_set(&adev->in_gpu_reset, 1);
 
 	xgpu_ai_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);
 
@@ -269,9 +261,6 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
 	} while (timeout > 1);
 
 flr_done:
-	atomic_set(&adev->in_gpu_reset, 0);
-	up_write(&adev->reset_sem);
-
 	/* Trigger recovery for world switch failure if no TDR */
 	if (amdgpu_device_should_recover_gpu(adev)
 		&& (!amdgpu_device_has_job_running(adev) ||
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
index e3869067a31d..f82c066c8e8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -277,15 +277,7 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
 	struct amdgpu_device *adev = container_of(virt, struct amdgpu_device, virt);
 	int timeout = NV_MAILBOX_POLL_FLR_TIMEDOUT;
 
-	/* block amdgpu_gpu_recover till msg FLR COMPLETE received,
-	 * otherwise the mailbox msg will be ruined/reseted by
-	 * the VF FLR.
-	 */
-	if (!down_write_trylock(&adev->reset_sem))
-		return;
-
 	amdgpu_virt_fini_data_exchange(adev);
-	atomic_set(&adev->in_gpu_reset, 1);
 
 	xgpu_nv_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);
 
@@ -298,9 +290,6 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
 	} while (timeout > 1);
 
 flr_done:
-	atomic_set(&adev->in_gpu_reset, 0);
-	up_write(&adev->reset_sem);
-
 	/* Trigger recovery for world switch failure if no TDR */
 	if (amdgpu_device_should_recover_gpu(adev)
 		&& (!amdgpu_device_has_job_running(adev) ||
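
Illustration (not part of the patches): the two commit messages above rest on one idea, namely that an atomic try-lock guard on reset entry is only needed while callers can race. The userspace C11 sketch below tries to show that idea in isolation. Everything in it is hypothetical: struct demo_dev, the demo_* helpers and the loop standing in for a serialized reset queue are stand-ins chosen for the example, not amdgpu or kernel APIs, and how the earlier patches of the series actually serialize resets is not shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device; not the real struct amdgpu_device. */
struct demo_dev {
	atomic_int in_reset;	/* 1 while a reset is in progress */
	int resets_done;	/* number of completed resets */
};

/*
 * Guarded entry: callers may race, so only the caller that flips
 * in_reset from 0 to 1 wins and everyone else has to back off.
 */
static bool demo_try_enter_reset(struct demo_dev *dev)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&dev->in_reset, &expected, 1);
}

/*
 * Unguarded entry: callers are assumed to be serialized already,
 * so entry cannot fail and a plain store is enough.
 */
static void demo_enter_reset(struct demo_dev *dev)
{
	atomic_store(&dev->in_reset, 1);
}

static void demo_exit_reset(struct demo_dev *dev)
{
	dev->resets_done++;
	atomic_store(&dev->in_reset, 0);
}

/*
 * Hypothetical serialized queue: reset requests run strictly one
 * after another, so no request ever observes in_reset == 1.
 */
static int demo_run_serialized(struct demo_dev *dev, int nr_requests)
{
	for (int i = 0; i < nr_requests; i++) {
		demo_enter_reset(dev);
		demo_exit_reset(dev);
	}
	return dev->resets_done;
}
```

With the guarded variant, a second caller arriving during a reset gets false and needs a bail-out path, which is the kind of code the diffs above delete. With the unguarded variant there is no failure case left to handle, provided the serialization assumption really holds for every caller.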