From patchwork Fri Dec 17 22:27:40 2021
From: Andrey Grodzovsky
Subject: [RFC 1/6] drm/amdgpu: Init GPU reset single threaded wq
Date: Fri, 17 Dec 2021 17:27:40 -0500
Message-ID: <20211217222745.881637-2-andrey.grodzovsky@amd.com>
In-Reply-To: <20211217222745.881637-1-andrey.grodzovsky@amd.com>
Cc: Monk.Liu@amd.com, horace.chen@amd.com, christian.koenig@amd.com

Do it for both the single-device and the XGMI hive cases: a standalone device allocates its own ordered reset wq, while every device in an XGMI hive uses the wq owned by the hive.

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  7 +++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +++++++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  9 +++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  2 ++
 4 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 9f017663ac50..b5ff76aae7e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -812,6 +812,11 @@ struct amd_powerplay {

 #define AMDGPU_RESET_MAGIC_NUM 64
 #define AMDGPU_MAX_DF_PERFMONS 4
+
+struct amdgpu_reset_domain {
+	struct workqueue_struct *wq;
+};
+
 struct amdgpu_device {
 	struct device *dev;
 	struct pci_dev *pdev;
@@ -1096,6 +1101,8 @@ struct amdgpu_device {

 	struct amdgpu_reset_control *reset_cntl;
 	uint32_t ip_versions[HW_ID_MAX][HWIP_MAX_INSTANCE];
+
+	struct amdgpu_reset_domain reset_domain;
 };

 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5625f7736e37..5f13195d23d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2391,9 +2391,27 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (r)
 		goto init_failed;

-	if (adev->gmc.xgmi.num_physical_nodes > 1)
+	if (adev->gmc.xgmi.num_physical_nodes > 1) {
+		struct amdgpu_hive_info *hive;
+
 		amdgpu_xgmi_add_device(adev);

+		hive = amdgpu_get_xgmi_hive(adev);
+		if (!hive || !hive->reset_domain.wq) {
+			DRM_ERROR("Failed to obtain reset domain info for XGMI hive:%llx\n", adev->gmc.xgmi.hive_id);
+			r = -EINVAL;
+			goto init_failed;
+		}
+
+		adev->reset_domain.wq = hive->reset_domain.wq;
+	} else {
+		adev->reset_domain.wq = alloc_ordered_workqueue("amdgpu-reset-dev", 0);
+		if (!adev->reset_domain.wq) {
+			r = -ENOMEM;
+			goto init_failed;
+		}
+	}
+
 	/* Don't init kfd if whole hive need to be reset during init */
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 0fad2bf854ae..8b116f398101 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -391,6 +391,14 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
 		goto pro_end;
 	}

+	hive->reset_domain.wq = alloc_ordered_workqueue("amdgpu-reset-hive", 0);
+	if (!hive->reset_domain.wq) {
+		dev_err(adev->dev, "XGMI: failed allocating wq for reset domain!\n");
+		kfree(hive);
+		hive = NULL;
+		goto pro_end;
+	}
+
 	hive->hive_id = adev->gmc.xgmi.hive_id;
 	INIT_LIST_HEAD(&hive->device_list);
 	INIT_LIST_HEAD(&hive->node);
@@ -400,6 +408,7 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
 	task_barrier_init(&hive->tb);
 	hive->pstate = AMDGPU_XGMI_PSTATE_UNKNOWN;
 	hive->hi_req_gpu = NULL;
+
 	/*
 	 * hive pstate on boot is high in vega20 so we have to go to low
 	 * pstate on after boot.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index d2189bf7d428..6121aaa292cb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -42,6 +42,8 @@ struct amdgpu_hive_info {
 		AMDGPU_XGMI_PSTATE_MAX_VEGA20,
 		AMDGPU_XGMI_PSTATE_UNKNOWN
 	} pstate;
+
+	struct amdgpu_reset_domain reset_domain;
 };

 struct amdgpu_pcs_ras_field {
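
The per-domain serialization this patch sets up can be illustrated with a small userspace model. It is a sketch under stated assumptions, not driver code: ordered_wq, wq_queue() and wq_drain() are hypothetical stand-ins for the property alloc_ordered_workqueue() provides (at most one work item executing at a time, in queueing order), and gpu_hive/gpu_dev are hypothetical reductions of struct amdgpu_hive_info and struct amdgpu_device. dev_init_reset_domain() mirrors the branch added to amdgpu_device_ip_init().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; every name below is hypothetical. */
#define MAX_WORK 16

struct work_item {
	void (*fn)(void *arg);
	void *arg;
};

/* Runs queued items strictly one at a time, in FIFO order. */
struct ordered_wq {
	const char *name;
	struct work_item items[MAX_WORK];
	size_t head, tail;
};

struct reset_domain {
	struct ordered_wq *wq;
};

/* Stand-in for struct amdgpu_hive_info: owns the hive-wide queue. */
struct gpu_hive {
	struct ordered_wq wq;
	struct reset_domain reset_domain;
};

/* Stand-in for struct amdgpu_device. */
struct gpu_dev {
	int num_physical_nodes;
	struct ordered_wq own_wq;
	struct reset_domain reset_domain;
};

static int wq_queue(struct ordered_wq *wq, void (*fn)(void *), void *arg)
{
	if (wq->tail - wq->head == MAX_WORK)
		return -1; /* queue full */
	wq->items[wq->tail % MAX_WORK] = (struct work_item){ fn, arg };
	wq->tail++;
	return 0;
}

/* Execute every queued item, one after another, in submission order. */
static void wq_drain(struct ordered_wq *wq)
{
	while (wq->head != wq->tail) {
		struct work_item w = wq->items[wq->head % MAX_WORK];

		wq->head++;
		w.fn(w.arg);
	}
}

static void hive_init(struct gpu_hive *hive)
{
	hive->wq = (struct ordered_wq){ .name = "amdgpu-reset-hive" };
	hive->reset_domain.wq = &hive->wq;
}

/* Mirrors the branch added to amdgpu_device_ip_init(). */
static int dev_init_reset_domain(struct gpu_dev *dev, struct gpu_hive *hive)
{
	if (dev->num_physical_nodes > 1) {
		/* XGMI: every device in the hive shares the hive's queue */
		if (!hive || !hive->reset_domain.wq)
			return -1; /* the patch fails with -EINVAL here */
		dev->reset_domain.wq = hive->reset_domain.wq;
	} else {
		/* Single device: a private queue per device */
		dev->own_wq = (struct ordered_wq){ .name = "amdgpu-reset-dev" };
		dev->reset_domain.wq = &dev->own_wq;
	}
	return 0;
}

/* Demo work function: appends a one-character device tag to a log. */
static char reset_log[MAX_WORK + 1];
static size_t reset_log_len;

static void log_reset(void *arg)
{
	if (reset_log_len < MAX_WORK)
		reset_log[reset_log_len++] = *(const char *)arg;
}
```

With this setup, reset work queued from two devices of the same hive lands on one queue and runs strictly one after another, whereas a standalone device only serializes against its own resets.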

From patchwork Fri Dec 17 22:27:41 2021
From: Andrey Grodzovsky
Subject: [RFC 2/6] drm/amdgpu: Move scheduler init to after XGMI is ready
Date: Fri, 17 Dec 2021 17:27:41 -0500
Message-ID: <20211217222745.881637-3-andrey.grodzovsky@amd.com>
In-Reply-To: <20211217222745.881637-1-andrey.grodzovsky@amd.com>
Cc: Monk.Liu@amd.com, horace.chen@amd.com, christian.koenig@amd.com

Before we initialize the schedulers we must know which reset domain we are in. For a single device there is a single domain per device, and so a single reset wq per device. For XGMI the reset domain spans the entire XGMI hive, and so the reset wq is per hive.

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 45 ++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 34 ++--------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  2 +
 3 files changed, 51 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5f13195d23d1..b595e6d699b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2284,6 +2284,47 @@ static int amdgpu_device_fw_loading(struct amdgpu_device *adev)
 	return r;
 }

+static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
+{
+	long timeout;
+	int r, i;
+
+	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+		struct amdgpu_ring *ring = adev->rings[i];
+
+		/* No need to setup the GPU scheduler for rings that don't need it */
+		if (!ring || ring->no_scheduler)
+			continue;
+
+		switch (ring->funcs->type) {
+		case AMDGPU_RING_TYPE_GFX:
+			timeout = adev->gfx_timeout;
+			break;
+		case AMDGPU_RING_TYPE_COMPUTE:
+			timeout = adev->compute_timeout;
+			break;
+		case AMDGPU_RING_TYPE_SDMA:
+			timeout = adev->sdma_timeout;
+			break;
+		default:
+			timeout = adev->video_timeout;
+			break;
+		}
+
+		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
+				   ring->num_hw_submission, amdgpu_job_hang_limit,
+				   timeout, adev->reset_domain.wq, ring->sched_score, ring->name);
+		if (r) {
+			DRM_ERROR("Failed to create scheduler on ring %s.\n",
+				  ring->name);
+			return r;
+		}
+	}
+
+	return 0;
+}
+
+
 /**
  * amdgpu_device_ip_init - run init for hardware IPs
  *
@@ -2412,6 +2453,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 		}
 	}

+	r = amdgpu_device_init_schedulers(adev);
+	if (r)
+		goto init_failed;
+
 	/* Don't init kfd if whole hive need to be reset during init */
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 3b7e86ea7167..5527c68c51de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -456,8 +456,6 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
 				  atomic_t *sched_score)
 {
 	struct amdgpu_device *adev = ring->adev;
-	long timeout;
-	int r;

 	if (!adev)
 		return -EINVAL;
@@ -477,36 +475,12 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
 	spin_lock_init(&ring->fence_drv.lock);
 	ring->fence_drv.fences = kcalloc(num_hw_submission * 2, sizeof(void *),
 					 GFP_KERNEL);
-	if (!ring->fence_drv.fences)
-		return -ENOMEM;

-	/* No need to setup the GPU scheduler for rings that don't need it */
-	if (ring->no_scheduler)
-		return 0;
+	ring->num_hw_submission = num_hw_submission;
+	ring->sched_score = sched_score;

-	switch (ring->funcs->type) {
-	case AMDGPU_RING_TYPE_GFX:
-		timeout = adev->gfx_timeout;
-		break;
-	case AMDGPU_RING_TYPE_COMPUTE:
-		timeout = adev->compute_timeout;
-		break;
-	case AMDGPU_RING_TYPE_SDMA:
-		timeout = adev->sdma_timeout;
-		break;
-	default:
-		timeout = adev->video_timeout;
-		break;
-	}
-
-	r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
-			   num_hw_submission, amdgpu_job_hang_limit,
-			   timeout, NULL, sched_score, ring->name);
-	if (r) {
-		DRM_ERROR("Failed to create scheduler on ring %s.\n",
-			  ring->name);
-		return r;
-	}
+	if (!ring->fence_drv.fences)
+		return -ENOMEM;

 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 4d380e79752c..a4b8279e3011 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -253,6 +253,8 @@ struct amdgpu_ring {
 	bool has_compute_vm_bug;
 	bool no_scheduler;
 	int hw_prio;
+	unsigned num_hw_submission;
+	atomic_t *sched_score;
 };

 #define amdgpu_ring_parse_cs(r, p, ib) ((r)->funcs->parse_cs((p), (ib)))
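
To make the ordering constraint from the commit message concrete, here is a hedged userspace sketch of what amdgpu_device_init_schedulers() does per ring: skip rings without a scheduler, pick the timeout by ring type, and hand every scheduler the reset domain's queue. All types and names (ring_kind, sched_ring, dev_timeouts, pick_timeout(), init_schedulers()) are hypothetical, and reset_wq only plays the role of adev->reset_domain.wq. The real drm_sched_init() is understood to fall back to system_wq when given a NULL timeout workqueue (the pre-patch call passed NULL); the model instead rejects a missing queue so that running it before the reset domain is set up fails visibly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of amdgpu_device_init_schedulers(); names are hypothetical. */
enum ring_kind { KIND_GFX, KIND_COMPUTE, KIND_SDMA, KIND_VIDEO };

struct model_sched {
	long timeout;           /* job timeout chosen for this ring */
	const void *timeout_wq; /* queue the timeout (TDR) handler would run on */
	bool ready;
};

struct sched_ring {
	enum ring_kind kind;
	bool no_scheduler;
	struct model_sched sched;
};

struct dev_timeouts {
	long gfx, compute, sdma, video;
	const void *reset_wq; /* plays the role of adev->reset_domain.wq */
};

static long pick_timeout(const struct dev_timeouts *dev, enum ring_kind kind)
{
	switch (kind) {
	case KIND_GFX:
		return dev->gfx;
	case KIND_COMPUTE:
		return dev->compute;
	case KIND_SDMA:
		return dev->sdma;
	default:
		return dev->video; /* every other ring type uses the video timeout */
	}
}

static int init_schedulers(const struct dev_timeouts *dev,
			   struct sched_ring **rings, size_t n)
{
	/* Model-only check: the reset domain must be known before this runs. */
	if (!dev->reset_wq)
		return -1;

	for (size_t i = 0; i < n; i++) {
		struct sched_ring *ring = rings[i];

		/* Empty slots and scheduler-less rings are skipped */
		if (!ring || ring->no_scheduler)
			continue;
		ring->sched.timeout = pick_timeout(dev, ring->kind);
		ring->sched.timeout_wq = dev->reset_wq;
		ring->sched.ready = true;
	}
	return 0;
}
```

The point of the model is the last assignment: every scheduler of a device ends up pointing at the same reset-domain queue, so the domain has to exist before the loop runs.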

From patchwork Fri Dec 17 22:27:42 2021
From: Andrey Grodzovsky
Subject: [RFC 3/6] drm/amdgpu: Fix crash on modprobe
Date: Fri, 17 Dec 2021 17:27:42 -0500
Message-ID: <20211217222745.881637-4-andrey.grodzovsky@amd.com>
In-Reply-To: <20211217222745.881637-1-andrey.grodzovsky@amd.com>
Cc: Monk.Liu@amd.com, horace.chen@amd.com, christian.koenig@amd.com

Restrict job resubmission to the suspend case only, since the schedulers are not yet initialised at probe time.

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 5527c68c51de..8ebd954e06c6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev)
 		if (!ring || !ring->fence_drv.initialized)
 			continue;

-		if (!ring->no_scheduler) {
+		if (adev->in_suspend && !ring->no_scheduler) {
 			drm_sched_resubmit_jobs(&ring->sched);
 			drm_sched_start(&ring->sched, true);
 		}
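
The effect of the one-line guard can be sketched with a userspace model of the loop in amdgpu_fence_driver_hw_init(). struct model_ring and fence_hw_init() are hypothetical; setting sched_running stands in for drm_sched_resubmit_jobs() plus drm_sched_start(), and bad_access counts attempts to touch a scheduler that was never initialised, which is the probe-time situation once scheduler init runs later in amdgpu_device_ip_init().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of the amdgpu_fence_driver_hw_init() loop; names hypothetical. */
struct model_ring {
	bool fence_initialized; /* ring->fence_drv.initialized */
	bool no_scheduler;      /* ring->no_scheduler */
	bool sched_initialized; /* has the scheduler been set up yet? */
	bool sched_running;
};

/*
 * Restart schedulers only when coming back from suspend. Returns the number
 * of schedulers restarted; *bad_access counts attempts to touch a scheduler
 * that was never initialised.
 */
static int fence_hw_init(bool in_suspend, struct model_ring *rings, size_t n,
			 int *bad_access)
{
	int restarted = 0;

	for (size_t i = 0; i < n; i++) {
		struct model_ring *ring = &rings[i];

		if (!ring->fence_initialized)
			continue;

		/* The guard added by the patch: skip this on probe */
		if (in_suspend && !ring->no_scheduler) {
			if (!ring->sched_initialized) {
				(*bad_access)++;
				continue;
			}
			/* Stands in for resubmitting jobs and starting the scheduler */
			ring->sched_running = true;
			restarted++;
		}
	}
	return restarted;
}
```

On probe (in_suspend false) the loop never reaches a scheduler, so it does not matter that none exists yet; on resume the schedulers created during the earlier probe are restarted as before.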
(2603:10b6:408:70::40) by BYAPR12MB3080.namprd12.prod.outlook.com (2603:10b6:a03:de::15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4801.14; Fri, 17 Dec 2021 22:28:14 +0000 Received: from BN8NAM11FT051.eop-nam11.prod.protection.outlook.com (2603:10b6:408:70:cafe::8a) by BN8PR04CA0027.outlook.office365.com (2603:10b6:408:70::40) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4801.14 via Frontend Transport; Fri, 17 Dec 2021 22:28:14 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB03.amd.com; Received: from SATLEXMB03.amd.com (165.204.84.17) by BN8NAM11FT051.mail.protection.outlook.com (10.13.177.66) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.4801.14 via Frontend Transport; Fri, 17 Dec 2021 22:28:13 +0000 Received: from agrodzovsky-All-Series.amd.com (10.180.168.240) by SATLEXMB03.amd.com (10.181.40.144) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.17; Fri, 17 Dec 2021 16:28:12 -0600 From: Andrey Grodzovsky To: , Subject: [RFC 4/6] drm/amdgpu: Serialize non TDR gpu recovery with TDRs Date: Fri, 17 Dec 2021 17:27:43 -0500 Message-ID: <20211217222745.881637-5-andrey.grodzovsky@amd.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20211217222745.881637-1-andrey.grodzovsky@amd.com> References: <20211217222745.881637-1-andrey.grodzovsky@amd.com> MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB03.amd.com (10.181.40.144) X-EOPAttributedMessage: 0 
X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: b5c82c37-f06b-4a3e-dfb8-08d9c1ac83da X-MS-TrafficTypeDiagnostic: BYAPR12MB3080:EE_ X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:6108; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 02CEsSi0v2xyM1gCk/6v6na7jdus/vl0xrnVRxad58F5VvpViZ5QvuXTNwqvA+iEQ3yFXl2GZxAYL1i+j2azMZr3+gj4p1Q+DDu8z4mlbfx016Et7K56vohSvCkN44zrPxgFLuNlDHP2iUBEjF6Xjk71JQmx0SFG1q4/iNwiNT67Cdh9MnmPi1GNWJCs2BEANBa9gnCAIhWJ7d+BW5QErOdH0EYdDP9yRvRGtjWIMVx1CVPr1Rq9eGiQ1TswB2/DYtY3lCNzOi2Yo3DmRvmVKguk88/qXievC8VYC8SZkbt0d4jccPgEORThF6H6hHCaR1QaUJSszrAhJimCduYghMY5imZ4pvR3h9DOtLrqlH4ULECPueYn3mSH76fIXqzkur7pJUj5toArdTkWhgIGaSUkg2ecStyjMr/Hd4bSWkIe4BLekpJdDLLmgZYokqgNOa9Qzw68SRpm74bl028PdsYfAl56h25VOSKfEMqy0B2WeBJZG9VE8TdkxQP/Pjc9s/E38o7kukugaxhcz1C5ZtRUQ8pIv48SbM7MvYFb/ohE75kJ0Aj6o9klrBlCMfBPRqL/AnPiNRumtBRmYx9BJbiT2lR/juQWiEOpVEAkTR2Jlo9QH7InoAF3TPGG/SVYaoS13wG6SLnulk+M/gR1OpKu7To1/26ocxQ6Rfd27gy96+SbeyU0ICpUcBWWFuIMn0vC09MmRTmorFpffrFSLC64gpgAj/dqdYQCdCBI7RZ1bsxyX3rO56BX1i9TMDYn7M8zPg34A2wU3ZbWOL9o4SuUrMmkMgu+EcvXg+zNwtM= X-Forefront-Antispam-Report: CIP:165.204.84.17; CTRY:US; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:SATLEXMB03.amd.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(4636009)(36840700001)(46966006)(40470700001)(36756003)(4326008)(47076005)(8936002)(5660300002)(86362001)(336012)(83380400001)(82310400004)(44832011)(40460700001)(70586007)(26005)(6666004)(7696005)(81166007)(356005)(110136005)(54906003)(316002)(16526019)(426003)(36860700001)(186003)(508600001)(8676002)(2906002)(70206006)(2616005)(1076003)(36900700001); DIR:OUT; SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 17 Dec 2021 22:28:13.9003 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: b5c82c37-f06b-4a3e-dfb8-08d9c1ac83da X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d 
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d; Ip=[165.204.84.17]; Helo=[SATLEXMB03.amd.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT051.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BYAPR12MB3080 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Monk.Liu@amd.com, horace.chen@amd.com, christian.koenig@amd.com Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Use reset domain wq also for non TDR gpu recovery trigers such as sysfs and RAS. We must serialize all possible GPU recoveries to gurantee no concurrency there. For TDR call the original recovery function directly since it's already executed from within the wq. For others just use a wrapper to qeueue work and wait on it to finish. 
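The queue-and-wait wrapper described above can be modelled in userspace. The following is a minimal sketch using C and POSIX threads, not the kernel code. All names are hypothetical stand-ins:

- `struct recover_req` plays the role of `recover_work_struct`.
- `reset_wq` is a FIFO drained by exactly one thread, in place of the reset domain's single-threaded wq.
- `gpu_recover()` plays the role of the `queue_work()` + `flush_work()` wrapper.
- `recover_imp()` is a placeholder for `amdgpu_device_gpu_recover_imp()`.

Several triggers (think sysfs and RAS) request recovery at the same time, yet only the one worker thread ever runs the recovery body.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for recover_work_struct: one queued recovery request. */
struct recover_req {
	int id, ret;
	bool done;
	struct recover_req *next;
};

/* Hypothetical stand-in for reset_domain.wq: a FIFO drained by a single thread. */
static struct {
	pthread_mutex_t lock;
	pthread_cond_t cond;	/* new requests and completions are both broadcast here */
	struct recover_req *head, *tail;
} reset_wq = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, NULL, NULL };
static int active, max_active, total;	/* written only by the worker thread */

/* Hypothetical placeholder for amdgpu_device_gpu_recover_imp(). */
static int recover_imp(int id)
{
	if (++active > max_active)
		max_active = active;
	for (volatile int i = 0; i < 100000; i++) {}	/* pretend to reset the GPU */
	active--;
	total++;
	return id;
}

/* The only worker thread: runs queued requests one at a time, in FIFO order. */
static void *reset_worker(void *arg)
{
	pthread_mutex_lock(&reset_wq.lock);
	for (;;) {
		while (!reset_wq.head)
			pthread_cond_wait(&reset_wq.cond, &reset_wq.lock);
		struct recover_req *req = reset_wq.head;
		reset_wq.head = req->next;
		if (!reset_wq.head)
			reset_wq.tail = NULL;
		pthread_mutex_unlock(&reset_wq.lock);
		int ret = recover_imp(req->id);	/* queue lock is not held here */
		pthread_mutex_lock(&reset_wq.lock);
		req->ret = ret;
		req->done = true;	/* the waiter may discard req after this */
		pthread_cond_broadcast(&reset_wq.cond);
	}
	return arg;	/* not reached */
}

static void start_reset_worker(void)
{
	pthread_t worker;
	pthread_create(&worker, NULL, reset_worker, NULL);
	pthread_detach(worker);
}

/* Like the patch's wrapper: queue_work() followed by flush_work(). */
static int gpu_recover(int id)
{
	struct recover_req req = { .id = id };	/* lives on the caller's stack */
	pthread_mutex_lock(&reset_wq.lock);
	if (reset_wq.tail)
		reset_wq.tail->next = &req;
	else
		reset_wq.head = &req;
	reset_wq.tail = &req;
	pthread_cond_broadcast(&reset_wq.cond);
	while (!req.done)
		pthread_cond_wait(&reset_wq.cond, &reset_wq.lock);
	pthread_mutex_unlock(&reset_wq.lock);
	return req.ret;
}

/* A sysfs- or RAS-style trigger on its own thread; non-NULL if it got its own result. */
static void *trigger(void *arg)
{
	int id = (int)(intptr_t)arg;
	return (void *)(intptr_t)(gpu_recover(id) == id);
}

/* Fires n triggers at once and returns how many got their own result back. */
static int run_concurrent_recoveries(int n)
{
	pthread_t t[n];
	void *res;
	int ok = 0;

	for (int i = 0; i < n; i++)
		pthread_create(&t[i], NULL, trigger, (void *)(intptr_t)(i + 1));
	for (int i = 0; i < n; i++)
		ok += pthread_join(t[i], &res) == 0 && res != NULL;
	return ok;
}
```

Usage: call `start_reset_worker()`, then `run_concurrent_recoveries(8)`. Afterwards `max_active` stays at 1, because every request ran on the worker thread in turn.

The model also shows why the TDR path must call the `_imp` variant directly. Suppose code already running on the worker called `gpu_recover()`. It would queue a request that only the worker itself can run, then wait for it forever. In this sketch the worker is simply detached; a real workqueue would be torn down explicitly.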
Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 33 +++++++++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  2 +-
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index b5ff76aae7e0..8e96b9a14452 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1296,6 +1296,8 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev);
 bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev);
 int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 			      struct amdgpu_job* job);
+int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
+			      struct amdgpu_job *job);
 void amdgpu_device_pci_config_reset(struct amdgpu_device *adev);
 int amdgpu_device_pci_reset(struct amdgpu_device *adev);
 bool amdgpu_device_need_post(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b595e6d699b5..55cd67b9ede2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4979,7 +4979,7 @@ static void amdgpu_device_recheck_guilty_jobs(
  * Returns 0 for success or an error on failure.
  */
 
-int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
+int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 			      struct amdgpu_job *job)
 {
 	struct list_head device_list, *device_list_handle = NULL;
@@ -5236,6 +5236,37 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	return r;
 }
 
+struct recover_work_struct {
+	struct work_struct base;
+	struct amdgpu_device *adev;
+	struct amdgpu_job *job;
+	int ret;
+};
+
+static void amdgpu_device_queue_gpu_recover_work(struct work_struct *work)
+{
+	struct recover_work_struct *recover_work = container_of(work, struct recover_work_struct, base);
+
+	recover_work->ret = amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job);
+}
+/*
+ * Serialize gpu recover into reset domain single threaded wq
+ */
+int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
+				    struct amdgpu_job *job)
+{
+	struct recover_work_struct work = {.adev = adev, .job = job};
+
+	INIT_WORK(&work.base, amdgpu_device_queue_gpu_recover_work);
+
+	if (!queue_work(adev->reset_domain.wq, &work.base))
+		return -EAGAIN;
+
+	flush_work(&work.base);
+
+	return work.ret;
+}
+
 /**
  * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index bfc47bea23db..38c9fd7b7ad4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -63,7 +63,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 		  ti.process_name, ti.tgid, ti.task_name, ti.pid);
 
 	if (amdgpu_device_should_recover_gpu(ring->adev)) {
-		amdgpu_device_gpu_recover(ring->adev, job);
+		amdgpu_device_gpu_recover_imp(ring->adev, job);
 	} else {
 		drm_sched_suspend_timeout(&ring->sched);
 		if (amdgpu_sriov_vf(adev))

From patchwork Fri Dec 17 22:27:44 2021
X-Patchwork-Submitter: Andrey Grodzovsky
X-Patchwork-Id: 12685763
From: Andrey Grodzovsky
Subject: [RFC 5/6] drm/amdgpu: Drop hive->in_reset
Date: Fri, 17 Dec 2021 17:27:44 -0500
Message-ID: <20211217222745.881637-6-andrey.grodzovsky@amd.com>
In-Reply-To: <20211217222745.881637-1-andrey.grodzovsky@amd.com>
References: <20211217222745.881637-1-andrey.grodzovsky@amd.com>
Cc: Monk.Liu@amd.com, horace.chen@amd.com, christian.koenig@amd.com

Since all resets are now serialized, there is no need to protect against
concurrent resets.
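The guard removed here is a compare-and-swap trylock. Below is a minimal userspace model with C11 atomics. The names `try_begin_reset`, `guarded_reset` and `dropped` are hypothetical, and `atomic_compare_exchange_strong()` stands in for the kernel's `atomic_cmpxchg()`.

The model shows that the guard only ever turns a request away when two resets overlap; back-to-back calls always get through. Once every recovery is funneled through the reset domain wq, as in [RFC 4/6], overlap should no longer occur. The bail-out path therefore has nothing left to catch; in the removed code that path also called `drm_sched_increase_karma()` on the job.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Models the removed hive->in_reset flag (names are hypothetical). */
static atomic_int in_reset;	/* 0 = idle, 1 = a reset is running */
static int dropped;		/* requests the guard turned away */

/* Like atomic_cmpxchg(&hive->in_reset, 0, 1) == 0: claim the flag if it is free. */
static bool try_begin_reset(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&in_reset, &expected, 1);
}

static void end_reset(void)
{
	atomic_store(&in_reset, 0);
}

/* Old behavior: a request that overlaps a running reset is dropped, not queued. */
static bool guarded_reset(void)
{
	if (!try_begin_reset()) {
		dropped++;
		return false;
	}
	/* ... the actual reset would run here ... */
	end_reset();
	return true;
}
```

The sketch also makes one behavioral difference visible. Under the old guard, an overlapping request is dropped. Under the [RFC 4/6] wrapper, it is queued behind the running reset and executes once that reset completes.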
Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +------------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 -
 3 files changed, 1 insertion(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 55cd67b9ede2..d2701e4d0622 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5013,25 +5013,9 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 	dev_info(adev->dev, "GPU %s begin!\n",
 		need_emergency_restart ? "jobs stop":"reset");
 
-	/*
-	 * Here we trylock to avoid chain of resets executing from
-	 * either trigger by jobs on different adevs in XGMI hive or jobs on
-	 * different schedulers for same device while this TO handler is running.
-	 * We always reset all schedulers for device and all devices for XGMI
-	 * hive so that should take care of them too.
-	 */
 	hive = amdgpu_get_xgmi_hive(adev);
-	if (hive) {
-		if (atomic_cmpxchg(&hive->in_reset, 0, 1) != 0) {
-			DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress",
-				job ? job->base.id : -1, hive->hive_id);
-			amdgpu_put_xgmi_hive(hive);
-			if (job && job->vm)
-				drm_sched_increase_karma(&job->base);
-			return 0;
-		}
+	if (hive)
 		mutex_lock(&hive->hive_lock);
-	}
 
 	reset_context.method = AMD_RESET_METHOD_NONE;
 	reset_context.reset_req_dev = adev;
@@ -5226,7 +5210,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 
 skip_recovery:
 	if (hive) {
-		atomic_set(&hive->in_reset, 0);
 		mutex_unlock(&hive->hive_lock);
 		amdgpu_put_xgmi_hive(hive);
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 8b116f398101..0d54bef5c494 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -403,7 +403,6 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
 	INIT_LIST_HEAD(&hive->device_list);
 	INIT_LIST_HEAD(&hive->node);
 	mutex_init(&hive->hive_lock);
-	atomic_set(&hive->in_reset, 0);
 	atomic_set(&hive->number_devices, 0);
 	task_barrier_init(&hive->tb);
 	hive->pstate = AMDGPU_XGMI_PSTATE_UNKNOWN;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index 6121aaa292cb..2f2ce53645a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -33,7 +33,6 @@ struct amdgpu_hive_info {
 	struct list_head node;
 	atomic_t number_devices;
 	struct mutex hive_lock;
-	atomic_t in_reset;
 	int hi_req_count;
 	struct amdgpu_device *hi_req_gpu;
 	struct task_barrier tb;

From patchwork Fri Dec 17 22:27:45 2021
X-Patchwork-Submitter: Andrey Grodzovsky
X-Patchwork-Id: 12685759
From: Andrey Grodzovsky
Subject: [RFC 6/6] drm/amdgpu: Drop concurrent GPU reset protection for device
Date: Fri, 17 Dec 2021 17:27:45 -0500
Message-ID: <20211217222745.881637-7-andrey.grodzovsky@amd.com>
In-Reply-To: <20211217222745.881637-1-andrey.grodzovsky@amd.com>
References: <20211217222745.881637-1-andrey.grodzovsky@amd.com>
Cc: Monk.Liu@amd.com, horace.chen@amd.com, christian.koenig@amd.com

Since all GPU resets are now serialized, this protection is no longer needed.
This patch also reverts 'drm/amdgpu: race issue when jobs on 2 ring timeout'.

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 89 ++--------------------
 1 file changed, 7 insertions(+), 82 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index d2701e4d0622..311e0b9e1e4f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4763,11 +4763,10 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
 	return r;
 }
 
-static bool amdgpu_device_lock_adev(struct amdgpu_device *adev,
+static void amdgpu_device_lock_adev(struct amdgpu_device *adev,
 				struct amdgpu_hive_info *hive)
 {
-	if (atomic_cmpxchg(&adev->in_gpu_reset, 0, 1) != 0)
-		return false;
+	atomic_set(&adev->in_gpu_reset, 1);
 
 	if (hive) {
 		down_write_nest_lock(&adev->reset_sem, &hive->hive_lock);
@@ -4786,8 +4785,6 @@ static bool amdgpu_device_lock_adev(struct amdgpu_device *adev,
 		adev->mp1_state = PP_MP1_STATE_NONE;
 		break;
 	}
-
-	return true;
 }
 
 static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
@@ -4798,46 +4795,6 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
 	up_write(&adev->reset_sem);
 }
 
-/*
- * to lockup a list of amdgpu devices in a hive safely, if not a hive
- * with multiple nodes, it will be similar as amdgpu_device_lock_adev.
- *
- * unlock won't require roll back.
- */
-static int amdgpu_device_lock_hive_adev(struct amdgpu_device *adev, struct amdgpu_hive_info *hive)
-{
-	struct amdgpu_device *tmp_adev = NULL;
-
-	if (adev->gmc.xgmi.num_physical_nodes > 1) {
-		if (!hive) {
-			dev_err(adev->dev, "Hive is NULL while device has multiple xgmi nodes");
-			return -ENODEV;
-		}
-		list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
-			if (!amdgpu_device_lock_adev(tmp_adev, hive))
-				goto roll_back;
-		}
-	} else if (!amdgpu_device_lock_adev(adev, hive))
-		return -EAGAIN;
-
-	return 0;
-roll_back:
-	if (!list_is_first(&tmp_adev->gmc.xgmi.head, &hive->device_list)) {
-		/*
-		 * if the lockup iteration break in the middle of a hive,
-		 * it may means there may has a race issue,
-		 * or a hive device locked up independently.
-		 * we may be in trouble and may not, so will try to roll back
-		 * the lock and give out a warnning.
-		 */
-		dev_warn(tmp_adev->dev, "Hive lock iteration broke in the middle. Rolling back to unlock");
-		list_for_each_entry_continue_reverse(tmp_adev, &hive->device_list, gmc.xgmi.head) {
-			amdgpu_device_unlock_adev(tmp_adev);
-		}
-	}
-	return -EAGAIN;
-}
-
 static void amdgpu_device_resume_display_audio(struct amdgpu_device *adev)
 {
 	struct pci_dev *p = NULL;
@@ -5023,22 +4980,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 	reset_context.hive = hive;
 	clear_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags);
 
-	/*
-	 * lock the device before we try to operate the linked list
-	 * if didn't get the device lock, don't touch the linked list since
-	 * others may iterating it.
-	 */
-	r = amdgpu_device_lock_hive_adev(adev, hive);
-	if (r) {
-		dev_info(adev->dev, "Bailing on TDR for s_job:%llx, as another already in progress",
-					job ? job->base.id : -1);
-
-		/* even we skipped this reset, still need to set the job to guilty */
-		if (job && job->vm)
-			drm_sched_increase_karma(&job->base);
-		goto skip_recovery;
-	}
-
 	/*
 	 * Build list of devices to reset.
 	 * In case we are in XGMI hive mode, resort the device list
@@ -5058,6 +4999,9 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 
 	/* block all schedulers and reset given job's ring */
 	list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
+
+		amdgpu_device_lock_adev(tmp_adev, hive);
+
 		/*
 		 * Try to put the audio codec into suspend state
 		 * before gpu reset started.
@@ -5208,13 +5152,12 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 		amdgpu_device_unlock_adev(tmp_adev);
 	}
 
-skip_recovery:
 	if (hive) {
 		mutex_unlock(&hive->hive_lock);
 		amdgpu_put_xgmi_hive(hive);
 	}
 
-	if (r && r != -EAGAIN)
+	if (r)
 		dev_info(adev->dev, "GPU reset end with ret = %d\n", r);
 	return r;
 }
@@ -5437,20 +5380,6 @@ int amdgpu_device_baco_exit(struct drm_device *dev)
 	return 0;
 }
 
-static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
-{
-	int i;
-
-	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
-		struct amdgpu_ring *ring = adev->rings[i];
-
-		if (!ring || !ring->sched.thread)
-			continue;
-
-		cancel_delayed_work_sync(&ring->sched.work_tdr);
-	}
-}
-
 /**
  * amdgpu_pci_error_detected - Called when a PCI error is detected.
  * @pdev: PCI device struct
@@ -5481,14 +5410,10 @@ pci_ers_result_t amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
 	/* Fatal error, prepare for slot reset */
 	case pci_channel_io_frozen:
 		/*
-		 * Cancel and wait for all TDRs in progress if failing to
-		 * set adev->in_gpu_reset in amdgpu_device_lock_adev
-		 *
 		 * Locking adev->reset_sem will prevent any external access
 		 * to GPU during PCI error recovery
 		 */
-		while (!amdgpu_device_lock_adev(adev, NULL))
-			amdgpu_cancel_all_tdr(adev);
+		amdgpu_device_lock_adev(adev, NULL);
 
 		/*
 		 * Block any work scheduling as we do for regular GPU reset