From patchwork Thu Jun 13 17:00:08 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Ben Skeggs
X-Patchwork-Id: 13697158
From: Ben Skeggs
CC: Ben Skeggs
Subject: [PATCH 16/21] drm/nouveau/nvkm: move pci probe() runpm quirk from drm
Date: Fri, 14 Jun 2024 03:00:08 +1000
Message-ID: <20240613170046.88687-17-bskeggs@nvidia.com>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240613170046.88687-1-bskeggs@nvidia.com>
References: <20240613170046.88687-1-bskeggs@nvidia.com>
X-BeenThere: dri-devel@lists.freedesktop.org
List-Id: Direct Rendering Infrastructure - Development

Signed-off-by: Ben Skeggs
---
 .../gpu/drm/nouveau/include/nvkm/core/pci.h   |  2 +
 drivers/gpu/drm/nouveau/nouveau_drm.c         | 61 ------------------
 drivers/gpu/drm/nouveau/nouveau_drv.h         |  2 -
 drivers/gpu/drm/nouveau/nvkm/device/pci.c     | 62 +++++++++++++++++++
 4 files changed, 64 insertions(+), 63 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/include/nvkm/core/pci.h b/drivers/gpu/drm/nouveau/include/nvkm/core/pci.h
index 0797225ab038..95deea8c65ff 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/core/pci.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/core/pci.h
@@ -7,6 +7,8 @@ struct nvkm_device_pci {
 	struct nvkm_device device;
 	struct pci_dev *pdev;
 
+	u8 old_pm_cap;
+
 	struct dev_pm_domain vga_pm_domain;
 };
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 4bcfc2291c4d..76eddf172bb5 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -731,63 +731,6 @@ nouveau_drm_device_new(const struct drm_driver *drm_driver, struct device *paren
 	return ret ? ERR_PTR(ret) : drm;
 }
 
-/*
- * On some Intel PCIe bridge controllers doing a
- * D0 -> D3hot -> D3cold -> D0 sequence causes Nvidia GPUs to not reappear.
- * Skipping the intermediate D3hot step seems to make it work again. This is
- * probably caused by not meeting the expectation the involved AML code has
- * when the GPU is put into D3hot state before invoking it.
- *
- * This leads to various manifestations of this issue:
- *  - AML code execution to power on the GPU hits an infinite loop (as the
- *    code waits on device memory to change).
- *  - kernel crashes, as all PCI reads return -1, which most code isn't able
- *    to handle well enough.
- *
- * In all cases dmesg will contain at least one line like this:
- * 'nouveau 0000:01:00.0: Refused to change power state, currently in D3'
- * followed by a lot of nouveau timeouts.
- *
- * In the \_SB.PCI0.PEG0.PG00._OFF code deeper down writes bit 0x80 to the not
- * documented PCI config space register 0x248 of the Intel PCIe bridge
- * controller (0x1901) in order to change the state of the PCIe link between
- * the PCIe port and the GPU. There are alternative code paths using other
- * registers, which seem to work fine (executed pre Windows 8):
- *  - 0xbc bit 0x20 (publicly available documentation claims 'reserved')
- *  - 0xb0 bit 0x10 (link disable)
- * Changing the conditions inside the firmware by poking into the relevant
- * addresses does resolve the issue, but it seemed to be ACPI private memory
- * and not any device accessible memory at all, so there is no portable way of
- * changing the conditions.
- * On a XPS 9560 that means bits [0,3] on \CPEX need to be cleared.
- *
- * The only systems where this behavior can be seen are hybrid graphics laptops
- * with a secondary Nvidia Maxwell, Pascal or Turing GPU. It's unclear whether
- * this issue only occurs in combination with listed Intel PCIe bridge
- * controllers and the mentioned GPUs or other devices as well.
- *
- * documentation on the PCIe bridge controller can be found in the
- * "7th Generation Intel® Processor Families for H Platforms Datasheet Volume 2"
- * Section "12 PCI Express* Controller (x16) Registers"
- */
-
-static void quirk_broken_nv_runpm(struct pci_dev *pdev)
-{
-	struct nouveau_drm *drm = pci_get_drvdata(pdev);
-	struct pci_dev *bridge = pci_upstream_bridge(pdev);
-
-	if (!bridge || bridge->vendor != PCI_VENDOR_ID_INTEL)
-		return;
-
-	switch (bridge->device) {
-	case 0x1901:
-		drm->old_pm_cap = pdev->pm_cap;
-		pdev->pm_cap = 0;
-		NV_INFO(drm, "Disabling PCI power management to avoid bug\n");
-		break;
-	}
-}
-
 static int nouveau_drm_probe(struct pci_dev *pdev,
 			     const struct pci_device_id *pent)
 {
@@ -822,7 +765,6 @@ static int nouveau_drm_probe(struct pci_dev *pdev,
 	else
 		drm_fbdev_ttm_setup(drm->dev, 32);
 
-	quirk_broken_nv_runpm(pdev);
 	return 0;
 
 fail_drm:
@@ -846,9 +788,6 @@ nouveau_drm_remove(struct pci_dev *pdev)
 {
 	struct nouveau_drm *drm = pci_get_drvdata(pdev);
 
-	/* revert our workaround */
-	if (drm->old_pm_cap)
-		pdev->pm_cap = drm->old_pm_cap;
 	nouveau_drm_device_remove(drm);
 
 	nvkm_device_pci_driver.remove(pdev);
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index b44f0d408ccc..9ca0f6ab4359 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -218,8 +218,6 @@ struct nouveau_drm {
 	 */
 	struct mutex clients_lock;
 
-	u8 old_pm_cap;
-
 	struct {
 		struct agp_bridge_data *bridge;
 		u32 base;
diff --git a/drivers/gpu/drm/nouveau/nvkm/device/pci.c b/drivers/gpu/drm/nouveau/nvkm/device/pci.c
index d9b8e3bc4169..d454d56a7909 100644
--- a/drivers/gpu/drm/nouveau/nvkm/device/pci.c
+++ b/drivers/gpu/drm/nouveau/nvkm/device/pci.c
@@ -1598,6 +1598,11 @@ static void *
 nvkm_device_pci_dtor(struct nvkm_device *device)
 {
 	struct nvkm_device_pci *pdev = nvkm_device_pci(device);
+
+	/* revert our workaround */
+	if (pdev->old_pm_cap)
+		pdev->pdev->pm_cap = pdev->old_pm_cap;
+
 	pci_disable_device(pdev->pdev);
 	return pdev;
 }
@@ -1624,6 +1629,62 @@ nvkm_device_pci_remove(struct pci_dev *dev)
 	nvkm_device_del(&device);
 }
 
+/*
+ * On some Intel PCIe bridge controllers doing a
+ * D0 -> D3hot -> D3cold -> D0 sequence causes Nvidia GPUs to not reappear.
+ * Skipping the intermediate D3hot step seems to make it work again. This is
+ * probably caused by not meeting the expectation the involved AML code has
+ * when the GPU is put into D3hot state before invoking it.
+ *
+ * This leads to various manifestations of this issue:
+ *  - AML code execution to power on the GPU hits an infinite loop (as the
+ *    code waits on device memory to change).
+ *  - kernel crashes, as all PCI reads return -1, which most code isn't able
+ *    to handle well enough.
+ *
+ * In all cases dmesg will contain at least one line like this:
+ * 'nouveau 0000:01:00.0: Refused to change power state, currently in D3'
+ * followed by a lot of nouveau timeouts.
+ *
+ * In the \_SB.PCI0.PEG0.PG00._OFF code deeper down writes bit 0x80 to the not
+ * documented PCI config space register 0x248 of the Intel PCIe bridge
+ * controller (0x1901) in order to change the state of the PCIe link between
+ * the PCIe port and the GPU. There are alternative code paths using other
+ * registers, which seem to work fine (executed pre Windows 8):
+ *  - 0xbc bit 0x20 (publicly available documentation claims 'reserved')
+ *  - 0xb0 bit 0x10 (link disable)
+ * Changing the conditions inside the firmware by poking into the relevant
+ * addresses does resolve the issue, but it seemed to be ACPI private memory
+ * and not any device accessible memory at all, so there is no portable way of
+ * changing the conditions.
+ * On a XPS 9560 that means bits [0,3] on \CPEX need to be cleared.
+ *
+ * The only systems where this behavior can be seen are hybrid graphics laptops
+ * with a secondary Nvidia Maxwell, Pascal or Turing GPU. It's unclear whether
+ * this issue only occurs in combination with listed Intel PCIe bridge
+ * controllers and the mentioned GPUs or other devices as well.
+ *
+ * documentation on the PCIe bridge controller can be found in the
+ * "7th Generation Intel® Processor Families for H Platforms Datasheet Volume 2"
+ * Section "12 PCI Express* Controller (x16) Registers"
+ */
+
+static void quirk_broken_nv_runpm(struct nvkm_device_pci *pdev)
+{
+	struct pci_dev *bridge = pci_upstream_bridge(pdev->pdev);
+
+	if (!bridge || bridge->vendor != PCI_VENDOR_ID_INTEL)
+		return;
+
+	switch (bridge->device) {
+	case 0x1901:
+		pdev->old_pm_cap = pdev->pdev->pm_cap;
+		pdev->pdev->pm_cap = 0;
+		nvdev_info(&pdev->device, "Disabling PCI power management to avoid bug\n");
+		break;
+	}
+}
+
 static int
 nvkm_device_pci_probe(struct pci_dev *pci_dev, const struct pci_device_id *id)
 {
@@ -1701,6 +1762,7 @@ nvkm_device_pci_probe(struct pci_dev *pci_dev, const struct pci_device_id *id)
 		pdev->device.mmu->dma_bits = 32;
 	}
 
+	quirk_broken_nv_runpm(pdev);
 done:
 	if (ret) {
 		nvkm_device_del(&device);
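
Reviewer note: the quirk moved by this patch follows a save / clear / restore pattern on pm_cap. The userspace sketch below is only an illustration of that pattern, not code from the patch. The mock_* types and functions and MOCK_VENDOR_INTEL are hypothetical stand-ins for struct pci_dev, struct nvkm_device_pci, pci_upstream_bridge() and PCI_VENDOR_ID_INTEL, so that it compiles outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few struct pci_dev fields the quirk reads. */
struct mock_pci_dev {
	uint16_t vendor;
	uint16_t device;
	uint8_t pm_cap;              /* PM capability offset, 0 means "none" */
	struct mock_pci_dev *bridge; /* upstream bridge, NULL if there is none */
};

/* Hypothetical stand-in for struct nvkm_device_pci after this patch. */
struct mock_nvkm_pci {
	struct mock_pci_dev *pdev;
	uint8_t old_pm_cap;          /* saved value, 0 means "quirk not applied" */
};

#define MOCK_VENDOR_INTEL 0x8086

/* Mirrors the shape of quirk_broken_nv_runpm(): save pm_cap, then clear it. */
static void mock_quirk_apply(struct mock_nvkm_pci *p)
{
	struct mock_pci_dev *bridge;

	assert(p != NULL && p->pdev != NULL);
	bridge = p->pdev->bridge;

	if (!bridge || bridge->vendor != MOCK_VENDOR_INTEL)
		return;

	switch (bridge->device) {
	case 0x1901:
		p->old_pm_cap = p->pdev->pm_cap;
		p->pdev->pm_cap = 0;
		break;
	}
}

/* Mirrors the revert added to the destructor: restore only if saved. */
static void mock_quirk_revert(struct mock_nvkm_pci *p)
{
	if (p->old_pm_cap)
		p->pdev->pm_cap = p->old_pm_cap;
}
```

In this sketch, as in the patch, a saved value of 0 doubles as the "not applied" marker, so a device without a PM capability is left untouched on revert.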