From patchwork Wed May 18 11:16:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Sahu X-Patchwork-Id: 12853499 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 731BBC433EF for ; Wed, 18 May 2022 11:16:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235388AbiERLQi (ORCPT ); Wed, 18 May 2022 07:16:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47030 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235372AbiERLQh (ORCPT ); Wed, 18 May 2022 07:16:37 -0400 Received: from NAM11-BN8-obe.outbound.protection.outlook.com (mail-bn8nam11on2070.outbound.protection.outlook.com [40.107.236.70]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0256466CA4; Wed, 18 May 2022 04:16:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=DlUKKe4ppBGEXRdFYs+dgfALdhkpZDEdQOzB4pKShX6wHTsbUMgqZ2i7bTGr91Pm+jAqbOAZv1BRmR975K4YrAIcIGjOWWJb3nl8tYnVQpMV6Ey089RUGs4SbkKkg5cma2xBj8jHB7sb+aV67DGyX+n64DfQvoZEsV17ADyWwbZWVVJDTDeRabsEIFJEiXqjMXCNuOU8RE/KO2zZB7pah357jgt1XJBW9a8TCD+1qYmKhsPwnCuQsHtaMwixB37I6Fl2FBvbU2c9dtFvrbIlqCiCayRpDolPWkUuNYn1Wj4XU5ki1aFY4tU7NmTC1r+anoSxuMkNMSBhuzvmfCZ2Iw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=zu2B41LaYkojggHM8c3io4sogUWsky60hLpgONN/rWQ=; 
b=jSTrNvDft68OOT04ev1u/xCLwolscS70xC8y/st7JRUoMsXfA0LtRp6WnRk86aSZN9V4/fKX7FWCNjX+pPRO8COUB+xrFc9WJUDgpJO/Gut8RzCbZ4B+7C20OqhRNfLC9BCMzqalfvv+w6X3evSOXpKBTQy1c+pMyluUtkwd3QcvsGitdYoOQvH9gTfnuTT9aa5KMUAOUAoBgRacqcUg94dlywgx1WLKFTM8h53sPFiS6XhpszMjUs3XAPIrfeHeAwUNIvp9uk1N6pWGs8hSUNN0qCrTxmhLUuSbkhwXkthUbJer1MRvD47zHyiwZJeqfN8s7+Oqin0JrN0WP54esQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 12.22.5.234) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=zu2B41LaYkojggHM8c3io4sogUWsky60hLpgONN/rWQ=; b=A7GQm9RyDI5rdWwGNNXmBZZCxr8ptGsIEasgceWeaVJW1T2dBf/YJygVQFw0Z/mmhBLM5a2W5G1zuBa/QOTRNc8WhjxfZShUxxKcawh4O/w2PWKpy8PrqOSehwhRCjdRVttkctIxfkhgNtz/fSVOJknN882S7CSFdhi/Uv3y8cPOI370LGgWzblnhZSfI0tH8fTIeaip+Q+LkLa9dRkwfJRV9CPEGLi7d5wNr+aWe0qnsdXcq1u8ZRRXJGqRtQ1kgmyYZ63+HeufPA2Z1wdSmhqSRd8KZ9LBR7PenVoXJwqjzNrIaTTLBFHvK3JyE1YGCBeu6LHn9EDvxQYsWEKNfg== Received: from MW4PR03CA0316.namprd03.prod.outlook.com (2603:10b6:303:dd::21) by MN0PR12MB6199.namprd12.prod.outlook.com (2603:10b6:208:3c4::20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5273.13; Wed, 18 May 2022 11:16:27 +0000 Received: from CO1NAM11FT029.eop-nam11.prod.protection.outlook.com (2603:10b6:303:dd:cafe::e) by MW4PR03CA0316.outlook.office365.com (2603:10b6:303:dd::21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5273.13 via Frontend Transport; Wed, 18 May 2022 11:16:26 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 12.22.5.234) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass 
(protection.outlook.com: domain of nvidia.com designates 12.22.5.234 as permitted sender) receiver=protection.outlook.com; client-ip=12.22.5.234; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (12.22.5.234) by CO1NAM11FT029.mail.protection.outlook.com (10.13.174.214) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.5273.14 via Frontend Transport; Wed, 18 May 2022 11:16:26 +0000 Received: from rnnvmail202.nvidia.com (10.129.68.7) by DRHQMAIL101.nvidia.com (10.27.9.10) with Microsoft SMTP Server (TLS) id 15.0.1497.32; Wed, 18 May 2022 11:16:26 +0000 Received: from rnnvmail201.nvidia.com (10.129.68.8) by rnnvmail202.nvidia.com (10.129.68.7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.22; Wed, 18 May 2022 04:16:25 -0700 Received: from nvidia-abhsahu-1.nvidia.com (10.127.8.9) by mail.nvidia.com (10.129.68.8) with Microsoft SMTP Server id 15.2.986.22 via Frontend Transport; Wed, 18 May 2022 04:16:20 -0700 From: Abhishek Sahu To: Alex Williamson , Cornelia Huck , Yishai Hadas , Jason Gunthorpe , Shameer Kolothum , Kevin Tian , "Rafael J . 
Wysocki" CC: Max Gurtovoy , Bjorn Helgaas , , , , , Abhishek Sahu Subject: [PATCH v5 1/4] vfio/pci: Invalidate mmaps and block the access in D3hot power state Date: Wed, 18 May 2022 16:46:09 +0530 Message-ID: <20220518111612.16985-2-abhsahu@nvidia.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220518111612.16985-1-abhsahu@nvidia.com> References: <20220518111612.16985-1-abhsahu@nvidia.com> X-NVConfidentiality: public MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 928219ec-31b7-4b41-c687-08da38bfd983 X-MS-TrafficTypeDiagnostic: MN0PR12MB6199:EE_ X-Microsoft-Antispam-PRVS: X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: vwKwusKU8bDPgvQxqibU2eg38+fTTlcC04DDQjYfyUIsiQJU1FPIaMfMiP1IpLaskcrBpoTcgAgkyUynU9bcEni3B6NelFJnt1bkEg+/hXpwyqhldKZnCPIpoHrGMNIJo/AyL9gthxFovB9dYy1AYFkWTLqXOSZPsUa+Ao2cF9t+13YK6j5oczpUCzxRHK3jk7m1l3Gw+ALxpyhe/IRJv30c7bCNJyMRGWBfRWgLDQwlP2tQWGFa7fVA7+Yo9BNkqsX2hXqSKPsudfDdgj0pht9MEPRu7cE8QJ7A8tvBwtDtKXeP3n8yaEOlzKdd4CKSPPrv94Uiepdb7wdSEHZgKpnJL4sgzOyCWJVrijVovXGcazQhvRUlPERrXQYEqBlKZsuX5SXcYPjg8L2NfcFYepHuzqj6i85RBr7dVhb0AJkBjb7hb4mDP94HXYM5lXR96WgI+Q3635jHb1oE77ZKGgdbrFUmllx72CSnxMrYBoYPLN8kroN7s3gZHUuPH/liuukN+oAVuYZu+jtPWfk6vCbiQAaHiKkC6oxsRRXqhYNT2QJrrHouqWsL29pIUiNEDk3e7uQn9+OmNkd0pkOxs7AKjrzhWc59cYuUR4gud16e+huEW7AuhD40jES7LqZKA6gu37kOISJ+7UPLd01j7tZFFgrniGobx56LY0uULDaUbMNJDF18cSeToHyj1Hh0D48kOQz9bGW3JNcTGSqTIJEL0oGxoGEb4mb9SwzZCTI= X-Forefront-Antispam-Report: 
CIP:12.22.5.234;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:InfoNoRecords;CAT:NONE;SFS:(13230001)(4636009)(36840700001)(46966006)(40470700004)(1076003)(107886003)(2616005)(82310400005)(83380400001)(70586007)(508600001)(8936002)(40460700003)(7416002)(356005)(2906002)(4326008)(36860700001)(8676002)(26005)(336012)(426003)(186003)(70206006)(81166007)(86362001)(54906003)(316002)(110136005)(36756003)(47076005)(6666004)(7696005)(5660300002)(32563001)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 18 May 2022 11:16:26.4453 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 928219ec-31b7-4b41-c687-08da38bfd983 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[12.22.5.234];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: CO1NAM11FT029.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: MN0PR12MB6199 Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org According to [PCIe v5 5.3.1.4.1] for D3hot state "Configuration and Message requests are the only TLPs accepted by a Function in the D3Hot state. All other received Requests must be handled as Unsupported Requests, and all received Completions may optionally be handled as Unexpected Completions." Currently, if the vfio PCI device has been put into D3hot state and if user makes non-config related read/write request in D3hot state, these requests will be forwarded to the host and this access may cause issues on a few systems. This patch leverages the memory-disable support added in commit 'abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")' to generate page fault on mmap access and return error for the direct read/write. 
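The combined check described above can be modeled in user space. The following is a minimal sketch, not the driver code: `struct model_dev`, `model_memory_enabled()` and `model_mmio_read()` are hypothetical names, the numeric values are assumed to mirror the kernel's power-state ordering and the Memory Space Enable bit of the command register, and returning `-EIO` is an assumption about how a blocked direct access is reported.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Power states ordered as in the kernel: a larger value is a deeper state. */
enum model_power_state {
	MODEL_D0 = 0,
	MODEL_D1,
	MODEL_D2,
	MODEL_D3HOT,
	MODEL_D3COLD,
};

/* Memory Space Enable bit of the PCI command register. */
#define MODEL_COMMAND_MEMORY 0x2

/* Hypothetical stand-in for the device state the driver consults. */
struct model_dev {
	enum model_power_state current_state;
	bool no_command_memory;	/* e.g. a VF, where the PF controls MSE */
	uint16_t vcmd;		/* virtual copy of the command register */
	uint32_t bar0[4];	/* pretend MMIO registers */
};

/* Memory is reachable only in a state shallower than D3hot and with
 * memory decoding enabled (or not controlled by the command register). */
static bool model_memory_enabled(const struct model_dev *dev)
{
	return dev->current_state < MODEL_D3HOT &&
	       (dev->no_command_memory || (dev->vcmd & MODEL_COMMAND_MEMORY));
}

/* Direct read path: fail instead of touching the device in low power. */
static int model_mmio_read(const struct model_dev *dev, unsigned int idx,
			   uint32_t *val)
{
	if (idx >= 4)
		return -EINVAL;
	if (!model_memory_enabled(dev))
		return -EIO;	/* assumed error code for a blocked access */
	*val = dev->bar0[idx];
	return 0;
}
```

In this model, a device with `no_command_memory` set is still reported as disabled once `current_state` reaches D3hot, which is the behavioral change the power-state test adds on top of the earlier memory-enable check.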
If the device is in D3hot state, an error is returned for MMIO access. IO access generally does not make the system unresponsive, so IO access can still happen in D3hot state; the default value should be returned in this case without bringing down the whole system.

Also, the power-related structure fields need to be protected, so the same 'memory_lock' is used to protect these fields as well. This protection is mainly needed when the user changes the PCI power state by writing to the PCI_PM_CTRL register. The vfio_lock_and_set_power_state() wrapper function takes the required locks and then invokes vfio_pci_set_power_state().

Signed-off-by: Abhishek Sahu
---
 drivers/vfio/pci/vfio_pci_config.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 6e58b4bf7a60..ea7d2306ba9d 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -402,11 +402,14 @@ bool __vfio_pci_memory_enabled(struct vfio_pci_core_device *vdev)
 	u16 cmd = le16_to_cpu(*(__le16 *)&vdev->vconfig[PCI_COMMAND]);
 
 	/*
+	 * Memory region cannot be accessed if device power state is D3.
+	 *
 	 * SR-IOV VF memory enable is handled by the MSE bit in the
 	 * PF SR-IOV capability, there's therefore no need to trigger
 	 * faults based on the virtual value.
 	 */
-	return pdev->no_command_memory || (cmd & PCI_COMMAND_MEMORY);
+	return pdev->current_state < PCI_D3hot &&
+	       (pdev->no_command_memory || (cmd & PCI_COMMAND_MEMORY));
 }
 
 /*
@@ -692,6 +695,22 @@ static int __init init_pci_cap_basic_perm(struct perm_bits *perm)
 	return 0;
 }
 
+/*
+ * It takes all the required locks to protect the access of power related
+ * variables and then invokes vfio_pci_set_power_state().
+ */ +static void vfio_lock_and_set_power_state(struct vfio_pci_core_device *vdev, + pci_power_t state) +{ + if (state >= PCI_D3hot) + vfio_pci_zap_and_down_write_memory_lock(vdev); + else + down_write(&vdev->memory_lock); + + vfio_pci_set_power_state(vdev, state); + up_write(&vdev->memory_lock); +} + static int vfio_pm_config_write(struct vfio_pci_core_device *vdev, int pos, int count, struct perm_bits *perm, int offset, __le32 val) @@ -718,7 +737,7 @@ static int vfio_pm_config_write(struct vfio_pci_core_device *vdev, int pos, break; } - vfio_pci_set_power_state(vdev, state); + vfio_lock_and_set_power_state(vdev, state); } return count; From patchwork Wed May 18 11:16:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Sahu X-Patchwork-Id: 12853517 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D2B69C4332F for ; Wed, 18 May 2022 11:17:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235502AbiERLRc (ORCPT ); Wed, 18 May 2022 07:17:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50486 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235506AbiERLR2 (ORCPT ); Wed, 18 May 2022 07:17:28 -0400 Received: from NAM12-MW2-obe.outbound.protection.outlook.com (mail-mw2nam12on2050.outbound.protection.outlook.com [40.107.244.50]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F051E8CB08; Wed, 18 May 2022 04:17:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; 
b=IYYscXaCNGegs6+HVjocVG70uSNIjvgxwaSNrYJl7lR6Cp/93juNbbdEmJ5idfEZLPxyoQt+aNHUWSBMJyi03Rznhcmwx4eEjRNJ477vNcOcQaxCupXP+n9orYZsYD3rY/28fUyaOByn8IHKzi2prtdnOdQJR6pe3gVkcUaKTCnJkAZm1qKeOM5NqeenLYzFCsG+qLvBJCKpDGQ1/ZrEs76pEM8VfKYkl/RvuQWkITzMrxIHSPCkyEVgSOxj3TH4mw7AdsnUfHOscBydsml/0ImZXo3Ns1Y88qnwe/CDXFwbmk5+JSLkCmZTFwMz/KK5YGkqFMGAuJyegNCLiuiWSQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=tIrYyeuhK0wNwHuEr6Li1fl1j3IH9Aq9af1pGxkvlUQ=; b=g+pzKREo4V8MEsa52Qt7WmROJ7sOB2uuveTn3iO+EriVEOlJX/ipWcB6dSv0EG+LBbgEOSeWXHv457+OKQFQ1jrAdZHJPc7/MmMQWzpt0Sxi1rXE+zoVa0EdSqMivbdpjJDSZYo7fODcCx8nZVspivHa9Ce8AvwvekjsQVNTD1L8bi26uajzDdJ3VMAfs03qLqKh/t3HceiX7QdHF1u6RSZlJHv8wFCYoc9kQTVX3kk39nzVziL4z1UX2+RbXY6S/I5H7+TX4YDAAVq33aXLfCPGebm7YwdW88K8GRKpgjdEsizgRT5P7jo7MHVHD+UwL24LxInmOqW/JjQIhCO39w== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 12.22.5.235) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=tIrYyeuhK0wNwHuEr6Li1fl1j3IH9Aq9af1pGxkvlUQ=; b=sVukWzgGcz1v5kZdtUMed9/36RLNIrz764+Bw4L7FU9dmwx/ryMyk79eYzW7sVFqPRwbZCeZNpY1+wogBCD67+FHPxk606eu2iQ0YulyZ/qPTX/BOmanChI5twyl1Px95Af1VajmCtfAGQtaz61k9ejT5IuNloIv1B3UZHXkSQ97AhoXyNsJEBjsvhsrd3WYo50SrVhmrTuAOSLJSlYjY43SCe4OwFC1qbXm+JaxLW7O2e3qZ2ObJfcNsFCMxT1u3isoxLa6+rU6u5YA+5C8Byr1AQEdHDO0C16PC8c5ucae3hnbNB9X6JwQ1m5DOZ47D5q3i7n/gMkWT2qbAioubA== Received: from DS7PR03CA0212.namprd03.prod.outlook.com (2603:10b6:5:3ba::7) by CY4PR12MB1893.namprd12.prod.outlook.com (2603:10b6:903:127::23) with 
Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5273.13; Wed, 18 May 2022 11:17:22 +0000 Received: from DM6NAM11FT057.eop-nam11.prod.protection.outlook.com (2603:10b6:5:3ba:cafe::9f) by DS7PR03CA0212.outlook.office365.com (2603:10b6:5:3ba::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5250.16 via Frontend Transport; Wed, 18 May 2022 11:17:21 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 12.22.5.235) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 12.22.5.235 as permitted sender) receiver=protection.outlook.com; client-ip=12.22.5.235; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (12.22.5.235) by DM6NAM11FT057.mail.protection.outlook.com (10.13.172.252) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.5273.14 via Frontend Transport; Wed, 18 May 2022 11:17:21 +0000 Received: from rnnvmail202.nvidia.com (10.129.68.7) by DRHQMAIL107.nvidia.com (10.27.9.16) with Microsoft SMTP Server (TLS) id 15.0.1497.32; Wed, 18 May 2022 11:16:31 +0000 Received: from rnnvmail201.nvidia.com (10.129.68.8) by rnnvmail202.nvidia.com (10.129.68.7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.22; Wed, 18 May 2022 04:16:31 -0700 Received: from nvidia-abhsahu-1.nvidia.com (10.127.8.9) by mail.nvidia.com (10.129.68.8) with Microsoft SMTP Server id 15.2.986.22 via Frontend Transport; Wed, 18 May 2022 04:16:25 -0700 From: Abhishek Sahu To: Alex Williamson , Cornelia Huck , Yishai Hadas , Jason Gunthorpe , Shameer Kolothum , Kevin Tian , "Rafael J . 
Wysocki" CC: Max Gurtovoy , Bjorn Helgaas , , , , , Abhishek Sahu Subject: [PATCH v5 2/4] vfio/pci: Change the PF power state to D0 before enabling VFs Date: Wed, 18 May 2022 16:46:10 +0530 Message-ID: <20220518111612.16985-3-abhsahu@nvidia.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220518111612.16985-1-abhsahu@nvidia.com> References: <20220518111612.16985-1-abhsahu@nvidia.com> X-NVConfidentiality: public MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 02f61f80-d2cb-436e-ec47-08da38bffa81 X-MS-TrafficTypeDiagnostic: CY4PR12MB1893:EE_ X-Microsoft-Antispam-PRVS: X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: h6W0/6GCDYSIi8R/VLo+pC4XjZG8/nJ61+UasiWXaFzpbk7CWdwmPZJPcFmXW6iXGaLfJzwOpkPkkbmA0Px9yRymhR6gvOGa4eNinv40nKmNDLY53ekBKAR7EUjtI9Hj2DVYudY6m0Y9BqLzdeiENhd5yisdXjF3ZdUOH1k0a40SZqGb2rPxk3yuvahDt8llp/jCjN7OD8XjKAyd6sziqh4M8q6PDhE1tRaZVV6HKu0h9xsVIOMVB3LY9lSi551ZpeJJIyy/b5B9TqxKhkM8wLgZgS31BulCqTMAFMkBvNHA5szZgFI9QvFg5AG0tW82UwtO5dzoGNpMXgFyetxv7KoOzzmJ2hW+6m8gG84p+ke7E/MzwAu7EWrV8fsYtNQkLssujJoT9q/wmQIDv3RrHIF3RSAm/93lsInq8yqJTRzj2ECX9Yv0Y0TnBpGzAgRFtUoNNLjcvO5MjylW6MYyMP6gMU6aPFz99N3unYKBryIMnDPhXd/vk/BWos72Y8X1Vln/Heuto3Pm75LdSiOqAAC4r624XW9GHh+P9F3vJjaMqrhRppDgSqD2kpZSPDUSsUrRCQekDAWh9Pa0zrWMlRjHX91o1fcc6H/vUFxngSTaCYKZ+6poQAO4r0eKMCxDd4OUuiO49qnlAZLMWqXIgyjBCpNRVOYg3BdrPLW6KKBcQ90lde2aRULhmxWKadJws2Is+MTTQz+rKqXaQExvFg== X-Forefront-Antispam-Report: 
CIP:12.22.5.235;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:InfoNoRecords;CAT:NONE;SFS:(13230001)(4636009)(46966006)(40470700004)(36840700001)(2906002)(508600001)(110136005)(54906003)(7696005)(316002)(6666004)(8676002)(86362001)(356005)(70206006)(4326008)(81166007)(40460700003)(186003)(26005)(107886003)(5660300002)(36860700001)(336012)(426003)(1076003)(47076005)(82310400005)(7416002)(8936002)(2616005)(70586007)(36756003)(83380400001)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 18 May 2022 11:17:21.7824 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 02f61f80-d2cb-436e-ec47-08da38bffa81 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[12.22.5.235];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: DM6NAM11FT057.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY4PR12MB1893 Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org According to [PCIe v5 9.6.2] for PF Device Power Management States "The PF's power management state (D-state) has global impact on its associated VFs. If a VF does not implement the Power Management Capability, then it behaves as if it is in an equivalent power state of its associated PF. If a VF implements the Power Management Capability, the Device behavior is undefined if the PF is placed in a lower power state than the VF. Software should avoid this situation by placing all VFs in lower power state before lowering their associated PF's power state." From the vfio driver side, user can enable SR-IOV when the PF is in D3hot state. 
If the VF does not implement the Power Management Capability, then the VF will actually be in D3hot state and VF BAR access will fail. If the VF implements the Power Management Capability, then the VF will assume that its current power state is D0 while the PF is in D3hot, and in this case the behavior is undefined.

To support PF power management, we need to create a power management dependency between the PF and its VFs. Runtime power management support may help here, since power management dependencies are supported through device links. Until such support is in place, we can disallow the PF from going into a low power state if the PF has VFs enabled.

There can be a case where the user first enables the VFs and then disables them. If there is no user of the PF, then the PF could be put into D3hot state again. With this patch, however, the PF will still be in D0 state after the VFs are disabled, since detecting this case inside vfio_pci_core_sriov_configure() requires access to struct vfio_device::open_count along with its locks. The subsequent patches related to runtime PM will handle this case, since runtime PM maintains its own usage count.

Also, vfio_pci_core_sriov_configure() can be called at any time (with or without a vfio pci device user), so the power state change and SR-IOV enablement need to be protected with the required locks.
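The two rules in this description (refuse a low power state while VFs are enabled, and move the PF to D0 before enabling VFs) can be sketched as a small user-space model. This is an illustration under stated assumptions, not the driver code: `struct model_pf`, `model_set_power_state()` and `model_sriov_configure()` are hypothetical names, and the 'memory_lock' that serializes these paths in the driver is deliberately not modeled.

```c
#include <errno.h>

/* Only the two states needed here; values follow the kernel's ordering. */
enum model_power_state { MODEL_D0 = 0, MODEL_D3HOT = 3 };

/* Hypothetical stand-in for a PF: its power state and enabled VF count. */
struct model_pf {
	int power_state;
	int num_vfs;
};

/* Refuse any state deeper than D0 while VFs are enabled. */
static int model_set_power_state(struct model_pf *pf, int state)
{
	if (pf->num_vfs > 0 && state > MODEL_D0)
		return -EBUSY;
	pf->power_state = state;
	return 0;
}

/* Enabling VFs first brings the PF to D0; disabling leaves the state alone. */
static int model_sriov_configure(struct model_pf *pf, int nr_virtfn)
{
	if (nr_virtfn > 0) {
		int ret = model_set_power_state(pf, MODEL_D0);

		if (ret)
			return ret;
	}
	pf->num_vfs = nr_virtfn;
	return nr_virtfn;
}
```

The model also shows the limitation noted above: after the VFs are disabled, the PF stays in D0 until something explicitly requests a lower state again.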
Signed-off-by: Abhishek Sahu
---
 drivers/vfio/pci/vfio_pci_core.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 05a3aa95ba52..9489ceea8875 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -217,6 +217,10 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
 	bool needs_restore = false, needs_save = false;
 	int ret;
 
+	/* Prevent changing power state for PFs with VFs enabled */
+	if (pci_num_vf(pdev) && state > PCI_D0)
+		return -EBUSY;
+
 	if (vdev->needs_pm_restore) {
 		if (pdev->current_state < PCI_D3hot && state >= PCI_D3hot) {
 			pci_save_state(pdev);
@@ -1944,7 +1948,19 @@ int vfio_pci_core_sriov_configure(struct vfio_pci_core_device *vdev,
 		}
 		list_add_tail(&vdev->sriov_pfs_item, &vfio_pci_sriov_pfs);
 		mutex_unlock(&vfio_pci_sriov_pfs_mutex);
+
+		/*
+		 * The PF power state should always be higher than the VF power
+		 * state. If PF is in the low power state, then change the
+		 * power state to D0 first before enabling SR-IOV.
+		 * Also, this function can be called at any time, and userspace
+		 * PCI_PM_CTRL write can race against this code path,
+		 * so protect the same with 'memory_lock'.
+ */ + down_write(&vdev->memory_lock); + vfio_pci_set_power_state(vdev, PCI_D0); ret = pci_enable_sriov(pdev, nr_virtfn); + up_write(&vdev->memory_lock); if (ret) goto out_del; return nr_virtfn; From patchwork Wed May 18 11:16:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Sahu X-Patchwork-Id: 12853500 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 09ECAC433EF for ; Wed, 18 May 2022 11:16:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235430AbiERLQq (ORCPT ); Wed, 18 May 2022 07:16:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47280 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235372AbiERLQm (ORCPT ); Wed, 18 May 2022 07:16:42 -0400 Received: from NAM11-DM6-obe.outbound.protection.outlook.com (mail-dm6nam11on2046.outbound.protection.outlook.com [40.107.223.46]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 68C34674D3; Wed, 18 May 2022 04:16:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=mqFQA87EnpQ09LC2vhedVnnDRPkJLmV2hX/U8n/v68EFvGxcmaPSyUER1WFUyk8SrsVCDNSZj8EPGwrbXKNXZ+HQ0hmx9shHGCinCRJxo//5PfbnJhUdQH4cuE/wBHpcV7fL1eBnKB8khBotOKJcVtu9OQA/6rWtmvyfdv8IzpX3Zv/EHXXL1NQ6mk/dDSMWkw4TGHk63b2e1vn6xSCmY/3RWQkt7b9G7Dld4v8Y1HbMpLfZyuLsrZl15ZQrQvp+AbJsOGGJrW9ZzuUMmQLPoWEojuKpBTSRL3OCc1fGzBbW8hHLMZ6zYyEKjSgStmXOVxwLY4QgR58UQG1HYIgceg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=85CCdCZQbNOGVJdgVRczXfrtu86zJq2RH8DcWHSPlo4=; 
b=URW+Co7uVl7iG86IYg8AFM8c2GMf7oc0i1v/PnQ23f1YNX11Xo9aF5nNd8sLCAITnqcUUhsi1oGAb0ti9f35mEbK4eLdrywVjZ4k1MUCv46Y0cavdiY1xv2m26o57/AGiqDgA4PelNATWOrCu1RWX9RC+m2wqmTAC9S0dhJ6ibYlnRIh5M5hwBeMAHEw+M4dbNYAe0Y2BDwHl4Ye7B7lSyvjmS6HCVZWCE+RNm3Pm8Usk3q4OgIR2AF/eY/FUG/fIjJgLyEsIQ/qvRE21ROb9UdKPhSyJWM7TEeD3zeEbJkDJPM1k5IiMTHaDOxCTQDRW6VSmOqgAZcMncWeSG4gMg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 12.22.5.238) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=85CCdCZQbNOGVJdgVRczXfrtu86zJq2RH8DcWHSPlo4=; b=eIxxKV1NkNGGebLmqBYHoIqJK8FapqEUme5vbWOxQhYqaaXG+A6NCbGMzhmQQt+is1fQdGP0yyzJn/g01kO0bUO3iheCo/YC9u/lLk7jtr69EPseBSceeBYJD4pz54uW+DntFlrtImm62eLJOvymwIQ9S7mkmeDMAHHjCWhMl2U4W87dkcBBoWnKRicZmYBV+nSkjuCIRnOqETMCrRTK9DUmrc5jpBiD4u2OmYia+j4H+N9XEFTVNEpIcn7kAPGXb/FZSJAv/ANs+tHWor2TpcXuDwUXzoorWYFBtmrZDScWD9Ee8lFtAnoRLhy/w63nnIDeG93wQOCCpP4nzMnuSg== Received: from DS7PR03CA0033.namprd03.prod.outlook.com (2603:10b6:5:3b5::8) by DM6PR12MB4042.namprd12.prod.outlook.com (2603:10b6:5:215::11) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5273.14; Wed, 18 May 2022 11:16:38 +0000 Received: from DM6NAM11FT006.eop-nam11.prod.protection.outlook.com (2603:10b6:5:3b5:cafe::99) by DS7PR03CA0033.outlook.office365.com (2603:10b6:5:3b5::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5273.14 via Frontend Transport; Wed, 18 May 2022 11:16:38 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 12.22.5.238) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass 
(protection.outlook.com: domain of nvidia.com designates 12.22.5.238 as permitted sender) receiver=protection.outlook.com; client-ip=12.22.5.238; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (12.22.5.238) by DM6NAM11FT006.mail.protection.outlook.com (10.13.173.104) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.5273.14 via Frontend Transport; Wed, 18 May 2022 11:16:37 +0000 Received: from rnnvmail203.nvidia.com (10.129.68.9) by DRHQMAIL105.nvidia.com (10.27.9.14) with Microsoft SMTP Server (TLS) id 15.0.1497.32; Wed, 18 May 2022 11:16:37 +0000 Received: from rnnvmail201.nvidia.com (10.129.68.8) by rnnvmail203.nvidia.com (10.129.68.9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.22; Wed, 18 May 2022 04:16:36 -0700 Received: from nvidia-abhsahu-1.nvidia.com (10.127.8.9) by mail.nvidia.com (10.129.68.8) with Microsoft SMTP Server id 15.2.986.22 via Frontend Transport; Wed, 18 May 2022 04:16:31 -0700 From: Abhishek Sahu To: Alex Williamson , Cornelia Huck , Yishai Hadas , Jason Gunthorpe , Shameer Kolothum , Kevin Tian , "Rafael J . 
Wysocki" CC: Max Gurtovoy , Bjorn Helgaas , , , , , Abhishek Sahu Subject: [PATCH v5 3/4] vfio/pci: Virtualize PME related registers bits and initialize to zero Date: Wed, 18 May 2022 16:46:11 +0530 Message-ID: <20220518111612.16985-4-abhsahu@nvidia.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220518111612.16985-1-abhsahu@nvidia.com> References: <20220518111612.16985-1-abhsahu@nvidia.com> X-NVConfidentiality: public MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 9e893369-4358-4441-460c-08da38bfe051 X-MS-TrafficTypeDiagnostic: DM6PR12MB4042:EE_ X-Microsoft-Antispam-PRVS: X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 798b5BkDvjRrA+GXKiFmNeH+qD2R47wqJcx8X3yomIbQG0Fji3D4xAftqyilhRlWly528Rp+4/+ZuvhqNzFEafYPwUy+b32YavAohuZ5QUaXDO74CPzL2bX+Eov24LIxuqAenGDZ1P9Z5WGlO7J8JBeBdOkUW4+IqSGP2BsDqn6MBnO6OE4MNNqyVUoUY/uFlYyQpNUZm99wpE/dYyIy7MpmYwXhX0Cd5MOYOMjzS0/NATImHHAiT04PotIXgsZsrFBN7C5EYTOI/gLfwHvzLX9Desdh1kWHmkweUqEUMCts7q3yQawHwbRAgYNODQ5xq7O0zdgCBaiaVFBV9B4DfX+jzceoaVx7CTet4itLgPkxRjxJs61Egsss8oymO6dKXOScv8sw9stafnKBrGeLoYV5OOWToCJgT/8ew+aulHkmmG1gYf5qAISw3M/2iND6gWnBouqhc7JMlXqwsfjm8Ytnm7K6iOulxpHeYpxvI1l1D2AriGGP/j+cNo3pD0dxqTkhxQeRAmka+NfeMqhZH2Uq7gX2yPdhw54mF59232RnZgjmCF2+XdqWxMGevhcbL08ydwKUcLYyGV0rmcQ+cRc2X7WFKh7o9cod1GdEEUaTfK7Y27aesXzXAyHZxIGPKiOHuFH2ZMAJo6KgEsrAiLNiS+AO6IiGbYzepFJ2NjlYcaFLtIv5cmiQgjha5YxgFnK0P4ZW7reh9eSD4zxvPw== X-Forefront-Antispam-Report: 
CIP:12.22.5.238;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:InfoNoRecords;CAT:NONE;SFS:(13230001)(4636009)(46966006)(36840700001)(40470700004)(83380400001)(86362001)(54906003)(70206006)(5660300002)(110136005)(8676002)(7416002)(4326008)(107886003)(40460700003)(508600001)(70586007)(186003)(47076005)(336012)(36860700001)(426003)(2616005)(316002)(1076003)(82310400005)(36756003)(81166007)(356005)(7696005)(2906002)(6666004)(8936002)(26005)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 18 May 2022 11:16:37.8428 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 9e893369-4358-4441-460c-08da38bfe051 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[12.22.5.238];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: DM6NAM11FT006.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM6PR12MB4042 Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org If any PME event will be generated by PCI, then it will be mostly handled in the host by the root port PME code. For example, in the case of PCIe, the PME event will be sent to the root port and then the PME interrupt will be generated. This will be handled in drivers/pci/pcie/pme.c at the host side. Inside this, the pci_check_pme_status() will be called where PME_Status and PME_En bits will be cleared. So, the guest OS which is using vfio-pci device will not come to know about this PME event. To handle these PME events inside guests, we need some framework so that if any PME events will happen, then it needs to be forwarded to virtual machine monitor. 
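To make the register fields involved concrete, the following user-space sketch clears the PME_Support field of the Power Management Capabilities register and the PME_En and PME_Status bits of the control/status register in a byte-array copy of config space. It is a model under stated assumptions, not the driver code: the `model_*` helpers and `MODEL_*` names are hypothetical, and the offsets and masks are assumed to match the standard PCI Power Management capability layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Register offsets within the PCI Power Management capability. */
#define MODEL_PM_PMC			2
#define MODEL_PM_CTRL			4

/* PME_Support field of PMC; PME_En and PME_Status bits of PMCSR. */
#define MODEL_PM_CAP_PME_MASK		0xF800u
#define MODEL_PM_CTRL_PME_ENABLE	0x0100u
#define MODEL_PM_CTRL_PME_STATUS	0x8000u

/* Config space is little-endian; read a 16-bit register byte by byte. */
static uint16_t model_get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Write a 16-bit register back in little-endian byte order. */
static void model_put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

/* Clear the PME-related bits in the virtual copy at capability offset cap. */
static void model_clear_pme_bits(uint8_t *vconfig, size_t cap)
{
	uint16_t pmc = model_get_le16(&vconfig[cap + MODEL_PM_PMC]);
	uint16_t ctrl = model_get_le16(&vconfig[cap + MODEL_PM_CTRL]);

	pmc &= (uint16_t)~MODEL_PM_CAP_PME_MASK;
	ctrl &= (uint16_t)~(MODEL_PM_CTRL_PME_ENABLE | MODEL_PM_CTRL_PME_STATUS);

	model_put_le16(&vconfig[cap + MODEL_PM_PMC], pmc);
	model_put_le16(&vconfig[cap + MODEL_PM_CTRL], ctrl);
}
```

Only the virtual copy is touched in this model; the remaining bits of both registers, such as the power state field, are left as they were.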
We can virtualize the PME-related register bits and initialize them to zero, so that the vfio-pci device user assumes the device is not capable of asserting the PME# signal from any power state.

Signed-off-by: Abhishek Sahu
---
 drivers/vfio/pci/vfio_pci_config.c | 33 +++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index ea7d2306ba9d..9343f597182d 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -757,12 +757,29 @@ static int __init init_pci_cap_pm_perm(struct perm_bits *perm)
 	 */
 	p_setb(perm, PCI_CAP_LIST_NEXT, (u8)ALL_VIRT, NO_WRITE);
 
+	/*
+	 * The guests can't process PME events. If any PME event will be
+	 * generated, then it will be mostly handled in the host and the
+	 * host will clear the PME_STATUS. So virtualize PME_Support bits.
+	 * The vconfig bits will be cleared during device capability
+	 * initialization.
+	 */
+	p_setw(perm, PCI_PM_PMC, PCI_PM_CAP_PME_MASK, NO_WRITE);
+
 	/*
 	 * Power management is defined *per function*, so we can let
 	 * the user change power state, but we trap and initiate the
 	 * change ourselves, so the state bits are read-only.
+	 *
+	 * The guest can't process PME from D3cold so virtualize PME_Status
+	 * and PME_En bits. The vconfig bits will be cleared during device
+	 * capability initialization.
*/ - p_setd(perm, PCI_PM_CTRL, NO_VIRT, ~PCI_PM_CTRL_STATE_MASK); + p_setd(perm, PCI_PM_CTRL, + PCI_PM_CTRL_PME_ENABLE | PCI_PM_CTRL_PME_STATUS, + ~(PCI_PM_CTRL_PME_ENABLE | PCI_PM_CTRL_PME_STATUS | + PCI_PM_CTRL_STATE_MASK)); + return 0; } @@ -1431,6 +1448,17 @@ static int vfio_ext_cap_len(struct vfio_pci_core_device *vdev, u16 ecap, u16 epo return 0; } +static void vfio_update_pm_vconfig_bytes(struct vfio_pci_core_device *vdev, + int offset) +{ + __le16 *pmc = (__le16 *)&vdev->vconfig[offset + PCI_PM_PMC]; + __le16 *ctrl = (__le16 *)&vdev->vconfig[offset + PCI_PM_CTRL]; + + /* Clear vconfig PME_Support, PME_Status, and PME_En bits */ + *pmc &= ~cpu_to_le16(PCI_PM_CAP_PME_MASK); + *ctrl &= ~cpu_to_le16(PCI_PM_CTRL_PME_ENABLE | PCI_PM_CTRL_PME_STATUS); +} + static int vfio_fill_vconfig_bytes(struct vfio_pci_core_device *vdev, int offset, int size) { @@ -1554,6 +1582,9 @@ static int vfio_cap_init(struct vfio_pci_core_device *vdev) if (ret) return ret; + if (cap == PCI_CAP_ID_PM) + vfio_update_pm_vconfig_bytes(vdev, pos); + prev = &vdev->vconfig[pos + PCI_CAP_LIST_NEXT]; pos = next; caps++; From patchwork Wed May 18 11:16:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Abhishek Sahu X-Patchwork-Id: 12853518 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B4A51C43217 for ; Wed, 18 May 2022 11:17:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235443AbiERLRe (ORCPT ); Wed, 18 May 2022 07:17:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50488 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235505AbiERLR2 (ORCPT ); Wed, 18 May 2022 07:17:28 -0400 Received: from NAM12-MW2-obe.outbound.protection.outlook.com 
(mail-mw2nam12on2081.outbound.protection.outlook.com [40.107.244.81]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ACD257CDF2; Wed, 18 May 2022 04:17:25 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=nKP4Jp92cfBojmnZCijzt+TzdhV1o1PL58iuDth0rSPYkkdIDWnWo5oxDdJWqrpq7j4nPc8xPJCgHEtiIqjo0ofXIqIEY3rVfvy8ncXH38fQElwXva083W8pRq9QeYvpPIYhjnuMsZBxF8ee/KvnPdu2Od0WEMxaTX5ggmYaQGNqn+43rzjnoPa9tqvZsobQOmKhfcI1908EQIF1xkTOxiiwuWaXJDjXXcuJlf97GqP5r+rofIiYYEn41qVJLaF3q41V01yRZuocGTWnRS8tYliRjUMHgQoJqxOlbw1l3QxDfSaeI7VWks+LMzpj3P04QG+T4Yx23iD2goWsDDUYkw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=eo5LG217VYHf1oboG5EJwg3mUyfEraUoPi3m6i1N7jE=; b=L3WgH+HAQS+rF+cb/Uq+zfepWYHy9J7LcPYJZcywplvv80EE2fTr+E6W77qG/NSMA1m1U2htVM2WTrcsi9B7rsXYVxhsRJKo3Xs/tdmz7Qo6yB0Q0/Oj644LVK3zgdxjP/R4CJwynJWcHw3Mofy+UIRHBXAeyyxExCTYckgOWKRhC4s81C2bo34/ZlhrVFcJKJeMdfw+aEkoedNmxrYgwwypFBig0+uFlT7nuzHKwF+IH8PYfZAC8M/XCNgsfqqhw4UZQn0tVfTBCNjBAPvHhYImQLr7Ufwuz53igFkDPWVmbKLSmUB1/iK8IZMcsiQjRva2Ene6yOjNudZIGmdr3A== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 12.22.5.235) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=eo5LG217VYHf1oboG5EJwg3mUyfEraUoPi3m6i1N7jE=; 
b=L+eXKiFLbz3Sn/nhaAxHA87HPOJ4v2BXqh9oiHXd33OqdegTkRDm3tY0lQgck2VEYIEKCGineyc4hUr6ZMYihM1LhwRe6Vu6Eks8Jl3I2kzqpG/j0ygIMkYvlJeNB4FYjK3ibmPxAQAWwOnxrZV5c4tydfkN+QUNxXcGrG4T40thQ27LrFAHXMLiIBATWEbDYKANqSSjMTiLUv2G1C8NEazNYMmhKRLdQOoE3avm+/HItSluKPRBmi7CBAkLJVTnsUDtLOYqKQzPnP89tGju2w+12hwkbX/GRLBJ+XwuP9KUfsW4wOp5wSvrKSFXYwX+eu/4WBmZvWJrpT31EljCiQ== Received: from DS7PR03CA0232.namprd03.prod.outlook.com (2603:10b6:5:3ba::27) by LV2PR12MB5943.namprd12.prod.outlook.com (2603:10b6:408:170::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5250.18; Wed, 18 May 2022 11:17:23 +0000 Received: from DM6NAM11FT057.eop-nam11.prod.protection.outlook.com (2603:10b6:5:3ba:cafe::b9) by DS7PR03CA0232.outlook.office365.com (2603:10b6:5:3ba::27) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5273.15 via Frontend Transport; Wed, 18 May 2022 11:17:23 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 12.22.5.235) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 12.22.5.235 as permitted sender) receiver=protection.outlook.com; client-ip=12.22.5.235; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (12.22.5.235) by DM6NAM11FT057.mail.protection.outlook.com (10.13.172.252) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.5273.14 via Frontend Transport; Wed, 18 May 2022 11:17:22 +0000 Received: from rnnvmail205.nvidia.com (10.129.68.10) by DRHQMAIL107.nvidia.com (10.27.9.16) with Microsoft SMTP Server (TLS) id 15.0.1497.32; Wed, 18 May 2022 11:16:43 +0000 Received: from rnnvmail201.nvidia.com (10.129.68.8) by rnnvmail205.nvidia.com (10.129.68.10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.22; Wed, 18 May 2022 04:16:42 
-0700 Received: from nvidia-abhsahu-1.nvidia.com (10.127.8.9) by mail.nvidia.com (10.129.68.8) with Microsoft SMTP Server id 15.2.986.22 via Frontend Transport; Wed, 18 May 2022 04:16:37 -0700 From: Abhishek Sahu To: Alex Williamson , Cornelia Huck , Yishai Hadas , Jason Gunthorpe , Shameer Kolothum , Kevin Tian , "Rafael J . Wysocki" CC: Max Gurtovoy , Bjorn Helgaas , , , , , Abhishek Sahu Subject: [PATCH v5 4/4] vfio/pci: Move the unused device into low power state with runtime PM Date: Wed, 18 May 2022 16:46:12 +0530 Message-ID: <20220518111612.16985-5-abhsahu@nvidia.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220518111612.16985-1-abhsahu@nvidia.com> References: <20220518111612.16985-1-abhsahu@nvidia.com> X-NVConfidentiality: public MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: f3499fbe-e67b-4769-b8ac-08da38bffb34 X-MS-TrafficTypeDiagnostic: LV2PR12MB5943:EE_ X-Microsoft-Antispam-PRVS: X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: MUY5R1P3//G9LJcVf1dlovvoz67fQQ8mYWNgv2p12eoELnprWti29KydwABL3IQkoX9yRHqKH5cAsPbQyxu/A9D4UB+XpHyV962cgYR3K30kfLO3lIUUbHEPqZGvoUKFl2m6iffyZlWE9DMLinWoEbrEqvvVgLpQzppTOhYOtcwWpC2bErMLWWcsybQqoJzcvXf2sbE+gDP46Hm2YmIkUEIXiFAerc9xbf2bwY1CjuwvCoWfkPwB0+Y2XOc78zcv72Gs8irLcLC+mw6rE+jCk0jTRcqcQ8R8R1wM4xuxAxCM7bKKt1KVv9Z1OhEYciYD3ijZFAqK6+bBjgCK4fCS+iRs0T1fVlVPYpa3VqXWGI9831ezuwYvfXVu6blyBm0x+ZtQ45Uyb4K3Jq76BFbaJ4lPYJgWYlLHEiWMgaW1uDG5+dmWsbh+4cg/HpKdLPp8PESODiOerOC7pTGnxVGP/xpdWLX52G0CDmfTy31CU07IiBq+8pT52BXipmAqWf3JDhfiuPzJelef2elJCWIFJnzS/xKNijFAr7eimJXyzr8fdjmmv+aVbXWCu62Y3AVpxx2897/oqqW0Ieh+hsLntifjMDpVpOOt8Fx6wdsT/x/Tis+dJ0fOBxh0jVirDwL481BgobStQaxk4VTym2zZNDFrIZcZELbbYtpSMFf9vRUJFPA1xQvEZ9LyZHotV6rABN/SBBkn6wu8Lut8B10rsg== X-Forefront-Antispam-Report: 
CIP:12.22.5.235;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:InfoNoRecords;CAT:NONE;SFS:(13230001)(4636009)(40470700004)(36840700001)(46966006)(186003)(83380400001)(110136005)(107886003)(2616005)(26005)(40460700003)(82310400005)(36860700001)(336012)(6666004)(426003)(47076005)(81166007)(508600001)(4326008)(36756003)(30864003)(86362001)(8936002)(2906002)(54906003)(316002)(7416002)(7696005)(356005)(5660300002)(70586007)(1076003)(70206006)(8676002)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 18 May 2022 11:17:22.9541 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: f3499fbe-e67b-4769-b8ac-08da38bffb34 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[12.22.5.235];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: DM6NAM11FT057.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: LV2PR12MB5943 Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org

Currently, there is very limited power management support available in the upstream vfio_pci_core based drivers. If there are no users of the device, then the PCI device is moved into the D3hot state by writing directly into the PCI PM registers. The D3hot state helps save power, but we can achieve zero power consumption if we go into the D3cold state. The D3cold state is not reachable with native PCI PM alone. It requires interaction with platform firmware, which is system-specific. To go into low power states (including D3cold), the runtime PM framework can be used, which internally interacts with PCI and platform firmware and puts the device into the lowest possible D-state.

This patch registers vfio_pci_core based drivers with the runtime PM framework.

1.
The PCI core framework takes care of most of the runtime PM related work. To enable runtime PM, the PCI driver needs to decrement the usage count and provide at least a 'struct dev_pm_ops'. The runtime suspend/resume callbacks are optional and needed only for extra handling. There are now multiple vfio_pci_core based drivers. Instead of assigning the 'struct dev_pm_ops' in each individual parent driver, vfio_pci_core itself assigns it. Other drivers also assign 'struct dev_pm_ops' inside a core layer (for example, wlcore_probe() and some sound drivers).

2. This patch provides a stub implementation of 'struct dev_pm_ops'. The subsequent patch will provide the runtime suspend/resume callbacks. All the config state saving and PCI power management related work is done by the PCI core framework itself inside its runtime suspend/resume callbacks (pci_pm_runtime_suspend() and pci_pm_runtime_resume()).

3. Before calling pci_reset_bus(), all the devices in the dev_set need to be runtime resumed. vfio_pci_dev_set_pm_runtime_get() takes care of the runtime resume and its error handling.

4. Inside vfio_pci_core_disable(), the device usage count that was incremented in vfio_pci_core_enable() always needs to be decremented.

5. Since the runtime PM framework provides the same functionality, directly writing into the PCI PM config register can be replaced with the runtime PM routines. Using runtime PM can also save more power.
In the systems which do not support D3cold,

With the existing implementation:

    // PCI device
    # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
    D3hot
    // upstream bridge
    # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
    D0

With runtime PM:

    // PCI device
    # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
    D3hot
    // upstream bridge
    # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
    D3hot

So, with runtime PM, the upstream bridge or root port also goes into a lower power state, which is not possible with the existing implementation.

In the systems which support D3cold,

With the existing implementation:

    // PCI device
    # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
    D3hot
    // upstream bridge
    # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
    D0

With runtime PM:

    // PCI device
    # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
    D3cold
    // upstream bridge
    # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
    D3cold

So, with runtime PM, both the PCI device and the upstream bridge go into the D3cold state.

6. If the 'disable_idle_d3' module parameter is set, runtime PM is still enabled, but in this case the usage count is not decremented.

7. The vfio_pci_dev_set_try_reset() return value is now unused, so the function return type is changed to void.

8. Use the runtime PM APIs in vfio_pci_core_sriov_configure(). The device can be in a low power state either through runtime power management (when there is no user) or through a PCI_PM_CTRL register write by the user. In both cases, the PF should be moved to the D0 state. To prevent a runtime PM usage count mismatch, pci_num_vf() is checked explicitly during disable.
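For illustration only (not part of this patch): the all-or-nothing usage count handling described in point 3 can be modelled in userspace roughly as below. The mock_* names and the struct are hypothetical stand-ins, not kernel APIs; the sketch only assumes that a failed resume-and-get leaves the usage count unchanged, which is how pm_runtime_resume_and_get() behaves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a device with a runtime PM usage count. */
struct mock_dev {
	int usage_count;	/* models the runtime PM usage counter */
	int resume_err;		/* 0, or a negative errno the mock resume returns */
};

/* Models pm_runtime_resume_and_get(): on failure the count is left unchanged. */
static int mock_resume_and_get(struct mock_dev *dev)
{
	if (dev->resume_err)
		return dev->resume_err;
	dev->usage_count++;
	return 0;
}

/* Models pm_runtime_put(): drop one usage reference. */
static void mock_put(struct mock_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* Take a reference on every device in the set, or on none of them. */
static int mock_dev_set_get(struct mock_dev *devs, size_t n)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = mock_resume_and_get(&devs[i]);
		if (ret)
			goto unwind;
	}
	return 0;

unwind:
	/* Release only the references taken before the failing device. */
	while (i-- > 0)
		mock_put(&devs[i]);
	return ret;
}
```

A caller would invoke mock_dev_set_get() before the reset and call mock_put() on each device afterwards, whether or not the reset itself succeeded, so that the counts stay balanced.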
Signed-off-by: Abhishek Sahu --- drivers/vfio/pci/vfio_pci_core.c | 170 ++++++++++++++++++++----------- 1 file changed, 113 insertions(+), 57 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c index 9489ceea8875..a0d69ddaf90d 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -156,7 +156,7 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev) } struct vfio_pci_group_info; -static bool vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set); +static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set); static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set, struct vfio_pci_group_info *groups); @@ -259,6 +259,17 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat return ret; } +/* + * A dev_pm_ops must be provided for pci-driver runtime PM to work, + * so use a structure without any callbacks. + * + * The pci-driver core runtime PM routines always save the device state + * before going into suspended state. If the device goes into a low power + * state only through the runtime PM ops, then no explicit handling is needed + * for the devices which have NoSoftRst-.
+ */ +static const struct dev_pm_ops vfio_pci_core_pm_ops = { }; + int vfio_pci_core_enable(struct vfio_pci_core_device *vdev) { struct pci_dev *pdev = vdev->pdev; @@ -266,21 +277,23 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev) u16 cmd; u8 msix_pos; - vfio_pci_set_power_state(vdev, PCI_D0); + if (!disable_idle_d3) { + ret = pm_runtime_resume_and_get(&pdev->dev); + if (ret < 0) + return ret; + } /* Don't allow our initial saved state to include busmaster */ pci_clear_master(pdev); ret = pci_enable_device(pdev); if (ret) - return ret; + goto out_power; /* If reset fails because of the device lock, fail this path entirely */ ret = pci_try_reset_function(pdev); - if (ret == -EAGAIN) { - pci_disable_device(pdev); - return ret; - } + if (ret == -EAGAIN) + goto out_disable_device; vdev->reset_works = !ret; pci_save_state(pdev); @@ -304,12 +317,8 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev) } ret = vfio_config_init(vdev); - if (ret) { - kfree(vdev->pci_saved_state); - vdev->pci_saved_state = NULL; - pci_disable_device(pdev); - return ret; - } + if (ret) + goto out_free_state; msix_pos = pdev->msix_cap; if (msix_pos) { @@ -330,6 +339,16 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev) return 0; + +out_free_state: + kfree(vdev->pci_saved_state); + vdev->pci_saved_state = NULL; +out_disable_device: + pci_disable_device(pdev); +out_power: + if (!disable_idle_d3) + pm_runtime_put(&pdev->dev); + return ret; } EXPORT_SYMBOL_GPL(vfio_pci_core_enable); @@ -437,8 +456,11 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev) out: pci_disable_device(pdev); - if (!vfio_pci_dev_set_try_reset(vdev->vdev.dev_set) && !disable_idle_d3) - vfio_pci_set_power_state(vdev, PCI_D3hot); + vfio_pci_dev_set_try_reset(vdev->vdev.dev_set); + + /* Put the pm-runtime usage counter acquired during enable */ + if (!disable_idle_d3) + pm_runtime_put(&pdev->dev); } EXPORT_SYMBOL_GPL(vfio_pci_core_disable); @@ -1823,10 +1845,11 @@ 
EXPORT_SYMBOL_GPL(vfio_pci_core_uninit_device); int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev) { struct pci_dev *pdev = vdev->pdev; + struct device *dev = &pdev->dev; int ret; /* Drivers must set the vfio_pci_core_device to their drvdata */ - if (WARN_ON(vdev != dev_get_drvdata(&vdev->pdev->dev))) + if (WARN_ON(vdev != dev_get_drvdata(dev))) return -EINVAL; if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL) @@ -1868,19 +1891,21 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev) vfio_pci_probe_power_state(vdev); - if (!disable_idle_d3) { - /* - * pci-core sets the device power state to an unknown value at - * bootup and after being removed from a driver. The only - * transition it allows from this unknown state is to D0, which - * typically happens when a driver calls pci_enable_device(). - * We're not ready to enable the device yet, but we do want to - * be able to get to D3. Therefore first do a D0 transition - * before going to D3. - */ - vfio_pci_set_power_state(vdev, PCI_D0); - vfio_pci_set_power_state(vdev, PCI_D3hot); - } + /* + * pci-core sets the device power state to an unknown value at + * bootup and after being removed from a driver. The only + * transition it allows from this unknown state is to D0, which + * typically happens when a driver calls pci_enable_device(). + * We're not ready to enable the device yet, but we do want to + * be able to get to D3. Therefore first do a D0 transition + * before enabling runtime PM. 
+ */ + vfio_pci_set_power_state(vdev, PCI_D0); + + dev->driver->pm = &vfio_pci_core_pm_ops; + pm_runtime_allow(dev); + if (!disable_idle_d3) + pm_runtime_put(dev); ret = vfio_register_group_dev(&vdev->vdev); if (ret) @@ -1889,7 +1914,9 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev) out_power: if (!disable_idle_d3) - vfio_pci_set_power_state(vdev, PCI_D0); + pm_runtime_get_noresume(dev); + + pm_runtime_forbid(dev); out_vf: vfio_pci_vf_uninit(vdev); return ret; @@ -1906,7 +1933,9 @@ void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev) vfio_pci_vga_uninit(vdev); if (!disable_idle_d3) - vfio_pci_set_power_state(vdev, PCI_D0); + pm_runtime_get_noresume(&vdev->pdev->dev); + + pm_runtime_forbid(&vdev->pdev->dev); } EXPORT_SYMBOL_GPL(vfio_pci_core_unregister_device); @@ -1951,22 +1980,33 @@ int vfio_pci_core_sriov_configure(struct vfio_pci_core_device *vdev, /* * The PF power state should always be higher than the VF power - * state. If PF is in the low power state, then change the - * power state to D0 first before enabling SR-IOV. - * Also, this function can be called at any time, and userspace - * PCI_PM_CTRL write can race against this code path, + * state. The PF can be in low power state either with runtime + * power management (when there is no user) or PCI_PM_CTRL + * register write by the user. If PF is in the low power state, + * then change the power state to D0 first before enabling + * SR-IOV. Also, this function can be called at any time, and + * userspace PCI_PM_CTRL write can race against this code path, * so protect the same with 'memory_lock'. 
*/ + ret = pm_runtime_resume_and_get(&pdev->dev); + if (ret) + goto out_del; + down_write(&vdev->memory_lock); vfio_pci_set_power_state(vdev, PCI_D0); ret = pci_enable_sriov(pdev, nr_virtfn); up_write(&vdev->memory_lock); - if (ret) + if (ret) { + pm_runtime_put(&pdev->dev); goto out_del; + } return nr_virtfn; } - pci_disable_sriov(pdev); + if (pci_num_vf(pdev)) { + pci_disable_sriov(pdev); + pm_runtime_put(&pdev->dev); + } out_del: mutex_lock(&vfio_pci_sriov_pfs_mutex); @@ -2041,6 +2081,27 @@ vfio_pci_dev_set_resettable(struct vfio_device_set *dev_set) return pdev; } +static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set) +{ + struct vfio_pci_core_device *cur; + int ret; + + list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) { + ret = pm_runtime_resume_and_get(&cur->pdev->dev); + if (ret) + goto unwind; + } + + return 0; + +unwind: + list_for_each_entry_continue_reverse(cur, &dev_set->device_list, + vdev.dev_set_list) + pm_runtime_put(&cur->pdev->dev); + + return ret; +} + /* * We need to get memory_lock for each device, but devices can share mmap_lock, * therefore we need to zap and hold the vma_lock for each device, and only then @@ -2147,43 +2208,38 @@ static bool vfio_pci_dev_set_needs_reset(struct vfio_device_set *dev_set) * - At least one of the affected devices is marked dirty via * needs_reset (such as by lack of FLR support) * Then attempt to perform that bus or slot reset. - * Returns true if the dev_set was reset. */ -static bool vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set) +static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set) { struct vfio_pci_core_device *cur; struct pci_dev *pdev; - int ret; + bool reset_done = false; if (!vfio_pci_dev_set_needs_reset(dev_set)) - return false; + return; pdev = vfio_pci_dev_set_resettable(dev_set); if (!pdev) - return false; + return; /* - * The pci_reset_bus() will reset all the devices in the bus. 
- * The power state can be non-D0 for some of the devices in the bus. - * For these devices, the pci_reset_bus() will internally set - * the power state to D0 without vfio driver involvement. - * For the devices which have NoSoftRst-, the reset function can - * cause the PCI config space reset without restoring the original - * state (saved locally in 'vdev->pm_save'). + * Some of the devices in the bus can be in the runtime suspended + * state. Increment the usage count for all the devices in the dev_set + * before reset and decrement the same after reset. */ - list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) - vfio_pci_set_power_state(cur, PCI_D0); + if (!disable_idle_d3 && vfio_pci_dev_set_pm_runtime_get(dev_set)) + return; - ret = pci_reset_bus(pdev); - if (ret) - return false; + if (!pci_reset_bus(pdev)) + reset_done = true; list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) { - cur->needs_reset = false; + if (reset_done) + cur->needs_reset = false; + if (!disable_idle_d3) - vfio_pci_set_power_state(cur, PCI_D3hot); + pm_runtime_put(&cur->pdev->dev); } - return true; } void vfio_pci_core_set_params(bool is_nointxmask, bool is_disable_vga,