From patchwork Mon Apr 25 09:26:08 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Abhishek Sahu
X-Patchwork-Id: 12825484
From: Abhishek Sahu
To: Alex Williamson, Cornelia Huck, Yishai Hadas, Jason Gunthorpe,
 Shameer Kolothum, Kevin Tian, "Rafael J . Wysocki"
CC: Max Gurtovoy, Bjorn Helgaas, Abhishek Sahu
Subject: [PATCH v3 1/8] vfio/pci: Invalidate mmaps and block the access in D3hot power state
Date: Mon, 25 Apr 2022 14:56:08 +0530
Message-ID: <20220425092615.10133-2-abhsahu@nvidia.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220425092615.10133-1-abhsahu@nvidia.com>
References: <20220425092615.10133-1-abhsahu@nvidia.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

According to [PCIe v5 5.3.1.4.1] for D3hot state

 "Configuration and Message requests are the only TLPs accepted by a
 Function in the D3Hot state. All other received Requests must be
 handled as Unsupported Requests, and all received Completions may
 optionally be handled as Unexpected Completions."

Currently, if the vfio PCI device has been put into D3hot state and the
user makes a non-config read/write request in D3hot state, the request
is forwarded to the host, and this access may cause issues on a few
systems.

This patch leverages the memory-disable support added in commit
'abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on
disabled memory")' to generate a page fault on mmap access and return
an error for direct read/write.
If the device is in D3hot state, then an error will be returned for
MMIO access. IO access generally does not make the system unresponsive,
so IO access can still happen in D3hot state. The default value should
be returned in this case without bringing down the complete system.

Also, the power related structure fields need to be protected, so we
can use the same 'memory_lock' to protect these fields as well. This
protection is mainly needed when the user changes the PCI power state
by writing into the PCI_PM_CTRL register. The
vfio_lock_and_set_power_state() wrapper function takes the required
locks and then invokes vfio_pci_set_power_state().

Signed-off-by: Abhishek Sahu
Reported-by: kernel test robot
---
 drivers/vfio/pci/vfio_pci_config.c | 19 ++++++++++++++++++-
 drivers/vfio/pci/vfio_pci_core.c   |  4 +++-
 drivers/vfio/pci/vfio_pci_rdwr.c   |  6 ++++--
 include/linux/vfio_pci_core.h      |  1 +
 4 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 6e58b4bf7a60..dd557edae6e1 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -692,6 +692,23 @@ static int __init init_pci_cap_basic_perm(struct perm_bits *perm)
 	return 0;
 }
 
+/*
+ * It takes all the required locks to protect the access of power related
+ * variables and then invokes vfio_pci_set_power_state().
+ */
+static void
+vfio_lock_and_set_power_state(struct vfio_pci_core_device *vdev,
+			      pci_power_t state)
+{
+	if (state >= PCI_D3hot)
+		vfio_pci_zap_and_down_write_memory_lock(vdev);
+	else
+		down_write(&vdev->memory_lock);
+
+	vfio_pci_set_power_state(vdev, state);
+	up_write(&vdev->memory_lock);
+}
+
 static int vfio_pm_config_write(struct vfio_pci_core_device *vdev, int pos,
 				int count, struct perm_bits *perm,
 				int offset, __le32 val)
@@ -718,7 +735,7 @@ static int vfio_pm_config_write(struct vfio_pci_core_device *vdev, int pos,
 			break;
 		}
 
-		vfio_pci_set_power_state(vdev, state);
+		vfio_lock_and_set_power_state(vdev, state);
 	}
 
 	return count;
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 06b6f3594a13..f3dfb033e1c4 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -230,6 +230,8 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
 
 	ret = pci_set_power_state(pdev, state);
 	if (!ret) {
+		vdev->power_state_d3 = (pdev->current_state >= PCI_D3hot);
+
 		/* D3 might be unsupported via quirk, skip unless in D3 */
 		if (needs_save && pdev->current_state >= PCI_D3hot) {
 			/*
@@ -1398,7 +1400,7 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
 	mutex_lock(&vdev->vma_lock);
 	down_read(&vdev->memory_lock);
 
-	if (!__vfio_pci_memory_enabled(vdev)) {
+	if (!__vfio_pci_memory_enabled(vdev) || vdev->power_state_d3) {
 		ret = VM_FAULT_SIGBUS;
 		goto up_out;
 	}
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 82ac1569deb0..fac6bb40a4ce 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -43,7 +43,8 @@ static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,		\
 {									\
 	if (test_mem) {							\
 		down_read(&vdev->memory_lock);				\
-		if (!__vfio_pci_memory_enabled(vdev)) {			\
+		if (!__vfio_pci_memory_enabled(vdev) ||			\
+		    vdev->power_state_d3) {				\
 			up_read(&vdev->memory_lock);			\
 			return -EIO;					\
 		}							\
@@ -70,7 +71,8 @@ static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,		\
 {									\
 	if (test_mem) {							\
 		down_read(&vdev->memory_lock);				\
-		if (!__vfio_pci_memory_enabled(vdev)) {			\
+		if (!__vfio_pci_memory_enabled(vdev) ||			\
+		    vdev->power_state_d3) {				\
 			up_read(&vdev->memory_lock);			\
 			return -EIO;					\
 		}							\
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 48f2dd3c568c..505b2a74a479 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -124,6 +124,7 @@ struct vfio_pci_core_device {
 	bool			needs_reset;
 	bool			nointx;
 	bool			needs_pm_restore;
+	bool			power_state_d3;
 	struct pci_saved_state	*pci_saved_state;
 	struct pci_saved_state	*pm_save;
 	int			ioeventfds_nr;

From patchwork Mon Apr 25 09:26:09 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Abhishek Sahu
X-Patchwork-Id: 12825483
From: Abhishek Sahu
To: Alex Williamson, Cornelia Huck, Yishai Hadas, Jason Gunthorpe,
 Shameer Kolothum, Kevin Tian, "Rafael J . Wysocki"
CC: Max Gurtovoy, Bjorn Helgaas, Abhishek Sahu
Subject: [PATCH v3 2/8] vfio/pci: Change the PF power state to D0 before enabling VFs
Date: Mon, 25 Apr 2022 14:56:09 +0530
Message-ID: <20220425092615.10133-3-abhsahu@nvidia.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220425092615.10133-1-abhsahu@nvidia.com>
References: <20220425092615.10133-1-abhsahu@nvidia.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

According to [PCIe v5 9.6.2] for PF Device Power Management States

 "The PF's power management state (D-state) has global impact on its
 associated VFs. If a VF does not implement the Power Management
 Capability, then it behaves as if it is in an equivalent power state
 of its associated PF.

 If a VF implements the Power Management Capability, the Device
 behavior is undefined if the PF is placed in a lower power state than
 the VF. Software should avoid this situation by placing all VFs in
 lower power state before lowering their associated PF's power state."

From the vfio driver side, the user can enable SR-IOV when the PF is in
D3hot state. If a VF does not implement the Power Management
Capability, then the VF will actually be in D3hot state and VF BAR
access will fail.
If a VF implements the Power Management Capability, then the VF will
assume that its current power state is D0 when the PF is in D3hot, and
in this case the behavior is undefined.

To support PF power management, we need to create a power management
dependency between the PF and its VFs. The runtime power management
support may help with this, where power management dependencies are
supported through device links. But until we have such support in
place, we can disallow the PF from going into a low power state if the
PF has VFs enabled. There can be a case where the user first enables
the VFs and then disables the VFs. If there is no user of the PF, then
the PF can be put into D3hot state again. But with this patch, the PF
will still be in D0 state after disabling the VFs, since detecting this
case inside vfio_pci_core_sriov_configure() requires access to
struct vfio_device::open_count along with its locks. The subsequent
patches related to runtime PM will handle this case, since runtime PM
maintains its own usage count.

Signed-off-by: Abhishek Sahu
---
 drivers/vfio/pci/vfio_pci_core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index f3dfb033e1c4..1271728a09db 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -217,6 +217,10 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
 	bool needs_restore = false, needs_save = false;
 	int ret;
 
+	/* Prevent changing power state for PFs with VFs enabled */
+	if (pci_num_vf(pdev) && state > PCI_D0)
+		return -EBUSY;
+
 	if (vdev->needs_pm_restore) {
 		if (pdev->current_state < PCI_D3hot && state >= PCI_D3hot) {
 			pci_save_state(pdev);
@@ -1959,6 +1963,13 @@ int vfio_pci_core_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
 		}
 		list_add_tail(&vdev->sriov_pfs_item, &vfio_pci_sriov_pfs);
 		mutex_unlock(&vfio_pci_sriov_pfs_mutex);
+
+		/*
+		 * The PF power state should always be higher than the VF power
+		 * state. If PF is in the low power state, then change the
+		 * power state to D0 first before enabling SR-IOV.
+		 */
+		vfio_pci_set_power_state(vdev, PCI_D0);
 		ret = pci_enable_sriov(pdev, nr_virtfn);
 		if (ret)
 			goto out_del;

From patchwork Mon Apr 25 09:26:10 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Abhishek Sahu
X-Patchwork-Id: 12825485
From: Abhishek Sahu
To: Alex Williamson, Cornelia Huck, Yishai Hadas, Jason Gunthorpe,
 Shameer Kolothum, Kevin Tian, "Rafael J . Wysocki"
CC: Max Gurtovoy, Bjorn Helgaas, Abhishek Sahu
Subject: [PATCH v3 3/8] vfio/pci: Virtualize PME related registers bits and initialize to zero
Date: Mon, 25 Apr 2022 14:56:10 +0530
Message-ID: <20220425092615.10133-4-abhsahu@nvidia.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220425092615.10133-1-abhsahu@nvidia.com>
References: <20220425092615.10133-1-abhsahu@nvidia.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

If a PME event is generated by a PCI device, it will mostly be handled
in the host by the root port PME code. For example, in the case of
PCIe, the PME event will be sent to the root port and then the PME
interrupt will be generated. This will be handled in
drivers/pci/pcie/pme.c on the host side. There, pci_check_pme_status()
will be called, which clears the PME_Status and PME_En bits. So the
guest OS that is using the vfio-pci device will not come to know about
this PME event.

To handle these PME events inside guests, we need some framework so
that any PME event that happens is forwarded to the virtual machine
monitor.
We can virtualize PME related registers bits and initialize these bits to zero so vfio-pci device user will assume that it is not capable of asserting the PME# signal from any power state. Signed-off-by: Abhishek Sahu --- drivers/vfio/pci/vfio_pci_config.c | 33 +++++++++++++++++++++++++++++- 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c index dd557edae6e1..af0ae80ef324 100644 --- a/drivers/vfio/pci/vfio_pci_config.c +++ b/drivers/vfio/pci/vfio_pci_config.c @@ -755,12 +755,29 @@ static int __init init_pci_cap_pm_perm(struct perm_bits *perm) */ p_setb(perm, PCI_CAP_LIST_NEXT, (u8)ALL_VIRT, NO_WRITE); + /* + * The guests can't process PME events. If any PME event will be + * generated, then it will be mostly handled in the host and the + * host will clear the PME_STATUS. So virtualize PME_Support bits. + * The vconfig bits will be cleared during device capability + * initialization. + */ + p_setw(perm, PCI_PM_PMC, PCI_PM_CAP_PME_MASK, NO_WRITE); + /* * Power management is defined *per function*, so we can let * the user change power state, but we trap and initiate the * change ourselves, so the state bits are read-only. + * + * The guest can't process PME from D3cold so virtualize PME_Status + * and PME_En bits. The vconfig bits will be cleared during device + * capability initialization. 
 	 */
-	p_setd(perm, PCI_PM_CTRL, NO_VIRT, ~PCI_PM_CTRL_STATE_MASK);
+	p_setd(perm, PCI_PM_CTRL,
+	       PCI_PM_CTRL_PME_ENABLE | PCI_PM_CTRL_PME_STATUS,
+	       ~(PCI_PM_CTRL_PME_ENABLE | PCI_PM_CTRL_PME_STATUS |
+		 PCI_PM_CTRL_STATE_MASK));
+
 	return 0;
 }

@@ -1429,6 +1446,17 @@ static int vfio_ext_cap_len(struct vfio_pci_core_device *vdev, u16 ecap, u16 epo
 	return 0;
 }

+static void vfio_update_pm_vconfig_bytes(struct vfio_pci_core_device *vdev,
+					 int offset)
+{
+	__le16 *pmc = (__le16 *)&vdev->vconfig[offset + PCI_PM_PMC];
+	__le16 *ctrl = (__le16 *)&vdev->vconfig[offset + PCI_PM_CTRL];
+
+	/* Clear vconfig PME_Support, PME_Status, and PME_En bits */
+	*pmc &= ~cpu_to_le16(PCI_PM_CAP_PME_MASK);
+	*ctrl &= ~cpu_to_le16(PCI_PM_CTRL_PME_ENABLE | PCI_PM_CTRL_PME_STATUS);
+}
+
 static int vfio_fill_vconfig_bytes(struct vfio_pci_core_device *vdev,
 				   int offset, int size)
 {
@@ -1552,6 +1580,9 @@ static int vfio_cap_init(struct vfio_pci_core_device *vdev)
 		if (ret)
 			return ret;

+		if (cap == PCI_CAP_ID_PM)
+			vfio_update_pm_vconfig_bytes(vdev, pos);
+
 		prev = &vdev->vconfig[pos + PCI_CAP_LIST_NEXT];
 		pos = next;
 		caps++;

From patchwork Mon Apr 25 09:26:11 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Abhishek Sahu
X-Patchwork-Id: 12825486
From: Abhishek Sahu
To: Alex Williamson, Cornelia Huck, Yishai Hadas, Jason Gunthorpe,
 Shameer Kolothum, Kevin Tian, "Rafael J . Wysocki"
CC: Max Gurtovoy, Bjorn Helgaas, Abhishek Sahu
Subject: [PATCH v3 4/8] vfio/pci: Add support for setting driver data inside core layer
Date: Mon, 25 Apr 2022 14:56:11 +0530
Message-ID: <20220425092615.10133-5-abhsahu@nvidia.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220425092615.10133-1-abhsahu@nvidia.com>
References: <20220425092615.10133-1-abhsahu@nvidia.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

The vfio driver is divided into two layers: the core layer (implemented
in vfio_pci_core.c) and the parent driver (for example, vfio_pci,
mlx5_vfio_pci, hisi_acc_vfio_pci, etc.). Each parent driver calls
dev_set_drvdata() and assigns its own structure as driver data. Some of
the callback functions are implemented in the core layer, and these
callbacks receive a reference to 'struct pci_dev' or 'struct device'.
Currently, we use vfio_device_get_from_dev(), which returns the
vfio_device for a device, but this function follows a long path to look
it up. In a few cases we can avoid this long path by getting the
structure through drvdata instead. This patch moves the setting of
drvdata inside the core layer.
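As a rough userspace illustration (not kernel code) of the layout
assumption that lets drvdata set in the core layer serve both layers:
when a wrapper structure embeds the core structure as its first member,
a pointer to the wrapper and a pointer to the embedded member have the
same value. The names core_device, parent_device, bad_parent_device and
register_device() below are hypothetical stand-ins, not the vfio API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the core-layer structure. */
struct core_device {
	int id;
};

/* Hypothetical parent structure that satisfies the prerequisite. */
struct parent_device {
	struct core_device core;	/* must stay the first member */
	int parent_private;
};

/* Hypothetical parent structure that violates the prerequisite. */
struct bad_parent_device {
	int parent_private;
	struct core_device core;	/* not first: the pointers differ */
};

/* Compile-time form of the prerequisite: core sits at offset 0. */
_Static_assert(offsetof(struct parent_device, core) == 0,
	       "core_device must be the first member");

/*
 * Run-time form of the same check: accept driver_data only if it has
 * the same address as the core structure, so that the core layer can
 * cast it back safely. Returns 0 on success, -1 otherwise.
 */
static int register_device(struct core_device *core, void *driver_data)
{
	if ((struct core_device *)driver_data != core)
		return -1;
	return 0;
}
```

With a parent_device p, register_device(&p.core, &p) returns 0. The same
call on a bad_parent_device returns -1, because its core member does not
sit at offset 0.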
In the current parent driver structures, 'struct vfio_pci_core_device'
is the first member, so a pointer to the parent structure and a pointer
to its 'struct vfio_pci_core_device' have the same value:

    struct hisi_acc_vf_core_device {
        struct vfio_pci_core_device core_device;
        ...
    };

    struct mlx5vf_pci_core_device {
        struct vfio_pci_core_device core_device;
        ...
    };

vfio_pci.c uses 'struct vfio_pci_core_device' itself.

To support getting the drvdata in both layers, we require that
'struct vfio_pci_core_device' be the first member.
vfio_pci_core_register_device() validates this, which makes sure that
the prerequisite is always satisfied.

Signed-off-by: Abhishek Sahu
---
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |  4 ++--
 drivers/vfio/pci/mlx5/main.c                  |  3 +--
 drivers/vfio/pci/vfio_pci.c                   |  4 ++--
 drivers/vfio/pci/vfio_pci_core.c              | 24 ++++++++++++++++---
 include/linux/vfio_pci_core.h                 |  7 +++++-
 5 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 767b5d47631a..c76c09302a8f 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -1274,11 +1274,11 @@ static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device
 					  &hisi_acc_vfio_pci_ops);
 	}

-	ret = vfio_pci_core_register_device(&hisi_acc_vdev->core_device);
+	ret = vfio_pci_core_register_device(&hisi_acc_vdev->core_device,
+					    hisi_acc_vdev);
 	if (ret)
 		goto out_free;

-	dev_set_drvdata(&pdev->dev, hisi_acc_vdev);
 	return 0;

 out_free:
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index bbec5d288fee..8689248f66f3 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -614,11 +614,10 @@ static int mlx5vf_pci_probe(struct pci_dev *pdev,
 		}
 	}

-	ret = vfio_pci_core_register_device(&mvdev->core_device);
+	ret = vfio_pci_core_register_device(&mvdev->core_device, mvdev);
 	if (ret)
 		goto out_free;

-	dev_set_drvdata(&pdev->dev, mvdev);
 	return 0;

 out_free:
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 2b047469e02f..e0f8027c5cd8 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -151,10 +151,10 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		return -ENOMEM;
 	vfio_pci_core_init_device(vdev, pdev, &vfio_pci_ops);

-	ret = vfio_pci_core_register_device(vdev);
+	ret = vfio_pci_core_register_device(vdev, vdev);
 	if (ret)
 		goto out_free;
-	dev_set_drvdata(&pdev->dev, vdev);
+
 	return 0;

 out_free:
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1271728a09db..953ac33b2f5f 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1822,9 +1822,11 @@ void vfio_pci_core_uninit_device(struct vfio_pci_core_device *vdev)
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_uninit_device);

-int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
+int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev,
+				  void *driver_data)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct device *dev = &pdev->dev;
 	int ret;

 	if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
@@ -1843,6 +1845,17 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
 		return -EBUSY;
 	}

+	/*
+	 * The 'struct vfio_pci_core_device' should be the first member of
+	 * the structure referenced by 'driver_data' so that it can be
+	 * retrieved with dev_get_drvdata() inside vfio-pci core layer.
+	 */
+	if ((struct vfio_pci_core_device *)driver_data != vdev) {
+		pci_warn(pdev, "Invalid driver data\n");
+		return -EINVAL;
+	}
+	dev_set_drvdata(dev, driver_data);
+
 	if (pci_is_root_bus(pdev->bus)) {
 		ret = vfio_assign_device_set(&vdev->vdev, vdev);
 	} else if (!pci_probe_reset_slot(pdev->slot)) {
@@ -1856,10 +1869,10 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
 	}

 	if (ret)
-		return ret;
+		goto out_drvdata;
 	ret = vfio_pci_vf_init(vdev);
 	if (ret)
-		return ret;
+		goto out_drvdata;
 	ret = vfio_pci_vga_init(vdev);
 	if (ret)
 		goto out_vf;
@@ -1890,6 +1903,8 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
 		vfio_pci_set_power_state(vdev, PCI_D0);
 out_vf:
 	vfio_pci_vf_uninit(vdev);
+out_drvdata:
+	dev_set_drvdata(dev, NULL);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_register_device);
@@ -1897,6 +1912,7 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_register_device);
 void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
+	struct device *dev = &pdev->dev;

 	vfio_pci_core_sriov_configure(pdev, 0);

@@ -1907,6 +1923,8 @@ void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev)

 	if (!disable_idle_d3)
 		vfio_pci_set_power_state(vdev, PCI_D0);
+
+	dev_set_drvdata(dev, NULL);
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_unregister_device);

diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 505b2a74a479..3c7d65e68340 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -225,7 +225,12 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev);
 void vfio_pci_core_init_device(struct vfio_pci_core_device *vdev,
 			       struct pci_dev *pdev,
 			       const struct vfio_device_ops *vfio_pci_ops);
-int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev);
+/*
+ * The 'struct vfio_pci_core_device' should be the first member
+ * of the structure referenced by 'driver_data'.
+ */
+int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev,
+				  void *driver_data);
 void vfio_pci_core_uninit_device(struct vfio_pci_core_device *vdev);
 void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev);
 int vfio_pci_core_sriov_configure(struct pci_dev *pdev, int nr_virtfn);

From patchwork Mon Apr 25 09:26:12 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Abhishek Sahu
X-Patchwork-Id: 12825490
From: Abhishek Sahu
To: Alex Williamson, Cornelia Huck, Yishai Hadas, Jason Gunthorpe,
 Shameer Kolothum, Kevin Tian, "Rafael J . Wysocki"
CC: Max Gurtovoy, Bjorn Helgaas, Abhishek Sahu
Subject: [PATCH v3 5/8] vfio/pci: Enable runtime PM for vfio_pci_core based drivers
Date: Mon, 25 Apr 2022 14:56:12 +0530
Message-ID: <20220425092615.10133-6-abhsahu@nvidia.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220425092615.10133-1-abhsahu@nvidia.com>
References: <20220425092615.10133-1-abhsahu@nvidia.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

Currently, the upstream vfio_pci_core based drivers offer very limited
power management support. If the device has no users, the PCI device is
moved into the D3hot state by writing directly to the PCI PM registers.
The D3hot state helps save power, but we can achieve zero power
consumption by going into the D3cold state. D3cold is not possible with
native PCI PM; it requires interaction with platform firmware, which is
system-specific. To reach low power states (including D3cold), we can
use the runtime PM framework, which internally interacts with PCI and
platform firmware and puts the device into the lowest possible D-state.

This patch registers vfio_pci_core based drivers with the runtime PM
framework.

1. The PCI core framework takes care of most runtime PM related work.
   To enable runtime PM, the PCI driver needs to decrement the usage
   count and provide at least a 'struct dev_pm_ops'. The runtime
   suspend/resume callbacks are optional and needed only for extra
   handling. There are now multiple vfio_pci_core based drivers, so
   instead of assigning the 'struct dev_pm_ops' in each parent driver,
   vfio_pci_core itself assigns it. Other drivers also assign
   'struct dev_pm_ops' inside a core layer (for example, wlcore_probe()
   and some sound drivers).

2. This patch provides a stub implementation of 'struct dev_pm_ops'.
   A subsequent patch will provide the runtime suspend/resume
   callbacks. All config state saving and PCI power management is done
   by the PCI core framework itself inside its runtime suspend/resume
   callbacks (pci_pm_runtime_suspend() and pci_pm_runtime_resume()).

3. For pci_reset_bus(), all the devices in the dev_set need to be
   runtime resumed. vfio_pci_dev_set_pm_runtime_get() takes care of the
   runtime resume and its error handling.

4. Inside vfio_pci_core_disable(), the device usage count that was
   incremented in vfio_pci_core_enable() always needs to be decremented.

5. Since the runtime PM framework provides the same functionality,
   direct writes to the PCI PM config register can be replaced with
   runtime PM routines. Runtime PM can also save more power.

   In systems which do not support D3cold:

   With the existing implementation:

   // PCI device
   # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
   D3hot
   // upstream bridge
   # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
   D0

   With runtime PM:

   // PCI device
   # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
   D3hot
   // upstream bridge
   # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
   D3hot

   So, with runtime PM, the upstream bridge or root port also goes into
   a lower power state, which is not possible with the existing
   implementation.

   In systems which support D3cold:

   With the existing implementation:

   // PCI device
   # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
   D3hot
   // upstream bridge
   # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
   D0

   With runtime PM:

   // PCI device
   # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
   D3cold
   // upstream bridge
   # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
   D3cold

   So, with runtime PM, both the PCI device and the upstream bridge go
   into the D3cold state.

6. If the 'disable_idle_d3' module parameter is set, runtime PM is
   still enabled, but the usage count is not decremented.

7. The return value of vfio_pci_dev_set_try_reset() is now unused, so
   its return type is changed to void.

8. Use the runtime PM APIs in vfio_pci_core_sriov_configure(). To
   prevent any runtime usage count mismatch, pci_num_vf() is checked
   explicitly during disable.
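
The get-all-or-roll-back behaviour described in item 3 can be sketched
in plain userspace C. This is only a model under stated assumptions:
struct model_dev and the model_*() helpers are hypothetical, and
model_resume_and_get() merely imitates the property of
pm_runtime_resume_and_get() that a failed call leaves the usage count
unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a device with a runtime PM usage counter. */
struct model_dev {
	int usage_count;
	int resume_fails;	/* non-zero: simulate a failing resume */
};

/*
 * Imitates pm_runtime_resume_and_get(): on success the usage count
 * stays incremented, on failure it is left unchanged.
 */
static int model_resume_and_get(struct model_dev *dev)
{
	if (dev->resume_fails)
		return -1;
	dev->usage_count++;
	return 0;
}

/* Imitates pm_runtime_put(): drop one usage count reference. */
static void model_put(struct model_dev *dev)
{
	dev->usage_count--;
}

/*
 * Take a reference on every device in the set. If one resume fails,
 * drop the references taken on the devices before it and return the
 * error, so that the caller never sees a partially held set.
 */
static int model_set_get(struct model_dev *devs, size_t n)
{
	size_t i, j;
	int ret = 0;

	for (i = 0; i < n; i++) {
		ret = model_resume_and_get(&devs[i]);
		if (ret < 0)
			break;
	}

	if (!ret)
		return 0;

	/* Roll back only the devices that were resumed successfully. */
	for (j = 0; j < i; j++)
		model_put(&devs[j]);

	return ret;
}
```

The failing device itself is skipped during rollback because its count
was never raised, which mirrors the 'cur == cur_pm' check in the patch.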

Signed-off-by: Abhishek Sahu
---
 drivers/vfio/pci/vfio_pci_core.c | 169 +++++++++++++++++++++----------
 1 file changed, 114 insertions(+), 55 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 953ac33b2f5f..aee5e0cd6137 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -156,7 +156,7 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
 }

 struct vfio_pci_group_info;
-static bool vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
+static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
 static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 				      struct vfio_pci_group_info *groups);

@@ -261,6 +261,19 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
 	return ret;
 }

+#ifdef CONFIG_PM
+/*
+ * The dev_pm_ops needs to be provided to make pci-driver runtime PM working,
+ * so use structure without any callbacks.
+ *
+ * The pci-driver core runtime PM routines always save the device state
+ * before going into suspended state. If the device is going into low power
+ * state only with runtime PM ops, then no explicit handling is needed
+ * for the devices which have NoSoftRst-.
+ */
+static const struct dev_pm_ops vfio_pci_core_pm_ops = { };
+#endif
+
 int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
@@ -268,21 +281,23 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
 	u16 cmd;
 	u8 msix_pos;

-	vfio_pci_set_power_state(vdev, PCI_D0);
+	if (!disable_idle_d3) {
+		ret = pm_runtime_resume_and_get(&pdev->dev);
+		if (ret < 0)
+			return ret;
+	}

 	/* Don't allow our initial saved state to include busmaster */
 	pci_clear_master(pdev);

 	ret = pci_enable_device(pdev);
 	if (ret)
-		return ret;
+		goto out_power;

 	/* If reset fails because of the device lock, fail this path entirely */
 	ret = pci_try_reset_function(pdev);
-	if (ret == -EAGAIN) {
-		pci_disable_device(pdev);
-		return ret;
-	}
+	if (ret == -EAGAIN)
+		goto out_disable_device;

 	vdev->reset_works = !ret;
 	pci_save_state(pdev);
@@ -306,12 +321,8 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
 	}

 	ret = vfio_config_init(vdev);
-	if (ret) {
-		kfree(vdev->pci_saved_state);
-		vdev->pci_saved_state = NULL;
-		pci_disable_device(pdev);
-		return ret;
-	}
+	if (ret)
+		goto out_free_state;

 	msix_pos = pdev->msix_cap;
 	if (msix_pos) {
@@ -332,6 +343,16 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)


 	return 0;
+
+out_free_state:
+	kfree(vdev->pci_saved_state);
+	vdev->pci_saved_state = NULL;
+out_disable_device:
+	pci_disable_device(pdev);
+out_power:
+	if (!disable_idle_d3)
+		pm_runtime_put(&pdev->dev);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_enable);

@@ -439,8 +460,11 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 out:
 	pci_disable_device(pdev);

-	if (!vfio_pci_dev_set_try_reset(vdev->vdev.dev_set) && !disable_idle_d3)
-		vfio_pci_set_power_state(vdev, PCI_D3hot);
+	vfio_pci_dev_set_try_reset(vdev->vdev.dev_set);
+
+	/* Put the pm-runtime usage counter acquired during enable */
+	if (!disable_idle_d3)
+		pm_runtime_put(&pdev->dev);
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_disable);

@@ -1879,19 +1903,24 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev,

 	vfio_pci_probe_power_state(vdev);

-	if (!disable_idle_d3) {
-		/*
-		 * pci-core sets the device power state to an unknown value at
-		 * bootup and after being removed from a driver.  The only
-		 * transition it allows from this unknown state is to D0, which
-		 * typically happens when a driver calls pci_enable_device().
-		 * We're not ready to enable the device yet, but we do want to
-		 * be able to get to D3.  Therefore first do a D0 transition
-		 * before going to D3.
-		 */
-		vfio_pci_set_power_state(vdev, PCI_D0);
-		vfio_pci_set_power_state(vdev, PCI_D3hot);
-	}
+	/*
+	 * pci-core sets the device power state to an unknown value at
+	 * bootup and after being removed from a driver.  The only
+	 * transition it allows from this unknown state is to D0, which
+	 * typically happens when a driver calls pci_enable_device().
+	 * We're not ready to enable the device yet, but we do want to
+	 * be able to get to D3.  Therefore first do a D0 transition
+	 * before enabling runtime PM.
+	 */
+	vfio_pci_set_power_state(vdev, PCI_D0);
+
+#if defined(CONFIG_PM)
+	dev->driver->pm = &vfio_pci_core_pm_ops;
+#endif
+
+	pm_runtime_allow(dev);
+	if (!disable_idle_d3)
+		pm_runtime_put(dev);

 	ret = vfio_register_group_dev(&vdev->vdev);
 	if (ret)
@@ -1900,7 +1929,9 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev,

 out_power:
 	if (!disable_idle_d3)
-		vfio_pci_set_power_state(vdev, PCI_D0);
+		pm_runtime_get_noresume(dev);
+
+	pm_runtime_forbid(dev);
 out_vf:
 	vfio_pci_vf_uninit(vdev);
 out_drvdata:
@@ -1922,8 +1953,9 @@ void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev)
 	vfio_pci_vga_uninit(vdev);

 	if (!disable_idle_d3)
-		vfio_pci_set_power_state(vdev, PCI_D0);
+		pm_runtime_get_noresume(dev);

+	pm_runtime_forbid(dev);
 	dev_set_drvdata(dev, NULL);
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_unregister_device);
@@ -1984,18 +2016,26 @@ int vfio_pci_core_sriov_configure(struct pci_dev *pdev, int nr_virtfn)

 		/*
 		 * The PF power state should always be higher than the VF power
-		 * state. If PF is in the low power state, then change the
-		 * power state to D0 first before enabling SR-IOV.
+		 * state. If PF is in the runtime suspended state, then resume
+		 * it first before enabling SR-IOV.
*/ - vfio_pci_set_power_state(vdev, PCI_D0); - ret = pci_enable_sriov(pdev, nr_virtfn); + ret = pm_runtime_resume_and_get(&pdev->dev); if (ret) goto out_del; + + ret = pci_enable_sriov(pdev, nr_virtfn); + if (ret) { + pm_runtime_put(&pdev->dev); + goto out_del; + } ret = nr_virtfn; goto out_put; } - pci_disable_sriov(pdev); + if (pci_num_vf(pdev)) { + pci_disable_sriov(pdev); + pm_runtime_put(&pdev->dev); + } out_del: mutex_lock(&vfio_pci_sriov_pfs_mutex); @@ -2072,6 +2112,30 @@ vfio_pci_dev_set_resettable(struct vfio_device_set *dev_set) return pdev; } +static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set) +{ + struct vfio_pci_core_device *cur_pm; + struct vfio_pci_core_device *cur; + int ret = 0; + + list_for_each_entry(cur_pm, &dev_set->device_list, vdev.dev_set_list) { + ret = pm_runtime_resume_and_get(&cur_pm->pdev->dev); + if (ret < 0) + break; + } + + if (!ret) + return 0; + + list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) { + if (cur == cur_pm) + break; + pm_runtime_put(&cur->pdev->dev); + } + + return ret; +} + /* * We need to get memory_lock for each device, but devices can share mmap_lock, * therefore we need to zap and hold the vma_lock for each device, and only then @@ -2178,43 +2242,38 @@ static bool vfio_pci_dev_set_needs_reset(struct vfio_device_set *dev_set) * - At least one of the affected devices is marked dirty via * needs_reset (such as by lack of FLR support) * Then attempt to perform that bus or slot reset. - * Returns true if the dev_set was reset. 
*/ -static bool vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set) +static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set) { struct vfio_pci_core_device *cur; struct pci_dev *pdev; - int ret; + bool reset_done = false; if (!vfio_pci_dev_set_needs_reset(dev_set)) - return false; + return; pdev = vfio_pci_dev_set_resettable(dev_set); if (!pdev) - return false; + return; /* - * The pci_reset_bus() will reset all the devices in the bus. - * The power state can be non-D0 for some of the devices in the bus. - * For these devices, the pci_reset_bus() will internally set - * the power state to D0 without vfio driver involvement. - * For the devices which have NoSoftRst-, the reset function can - * cause the PCI config space reset without restoring the original - * state (saved locally in 'vdev->pm_save'). + * Some of the devices in the bus can be in the runtime suspended + * state. Increment the usage count for all the devices in the dev_set + * before reset and decrement the same after reset. 
 	 */
-	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
-		vfio_pci_set_power_state(cur, PCI_D0);
+	if (!disable_idle_d3 && vfio_pci_dev_set_pm_runtime_get(dev_set))
+		return;
 
-	ret = pci_reset_bus(pdev);
-	if (ret)
-		return false;
+	if (!pci_reset_bus(pdev))
+		reset_done = true;
 
 	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
-		cur->needs_reset = false;
+		if (reset_done)
+			cur->needs_reset = false;
+
 		if (!disable_idle_d3)
-			vfio_pci_set_power_state(cur, PCI_D3hot);
+			pm_runtime_put(&cur->pdev->dev);
 	}
-	return true;
 }
 
 void vfio_pci_core_set_params(bool is_nointxmask, bool is_disable_vga,

From patchwork Mon Apr 25 09:26:13 2022
X-Patchwork-Submitter: Abhishek Sahu
X-Patchwork-Id: 12825487
From: Abhishek Sahu
To: Alex Williamson, Cornelia Huck, Yishai Hadas, Jason Gunthorpe,
 Shameer Kolothum, Kevin Tian, "Rafael J . Wysocki"
CC: Max Gurtovoy, Bjorn Helgaas, Abhishek Sahu
Subject: [PATCH v3 6/8] vfio: Invoke runtime PM API for IOCTL request
Date: Mon, 25 Apr 2022 14:56:13 +0530
Message-ID: <20220425092615.10133-7-abhsahu@nvidia.com>
In-Reply-To: <20220425092615.10133-1-abhsahu@nvidia.com>
References: <20220425092615.10133-1-abhsahu@nvidia.com>
X-Mailing-List: kvm@vger.kernel.org

The vfio/pci driver will have runtime power management support, where
the user can put the device into a low power state and the PCI device
can then go into the D3cold state. If the device is in a low power
state and the user issues any IOCTL, the device should be moved out of
the low power state first. Once the IOCTL is serviced, it can go into
the low power state again. The runtime PM framework manages this with
the help of a usage count.

One option was to add the runtime PM related APIs inside the vfio/pci
driver, but some IOCTLs (like VFIO_DEVICE_FEATURE) can follow a
different path and more IOCTLs can be added in the future. Also,
runtime PM will currently be added only for the vfio/pci based driver
variants, but other vfio based drivers can use the same in the future.
So, this patch adds the runtime PM related API calls in the top level
IOCTL function itself.

For the vfio drivers which do not currently have runtime power
management support, the runtime PM APIs won't be invoked. Currently,
the runtime PM APIs will be invoked only for vfio/pci based drivers, to
increment and decrement the usage count. Keeping the usage count
incremented while servicing an IOCTL makes sure that the user won't put
the device into a low power state while any other IOCTL is being
serviced in parallel.

Signed-off-by: Abhishek Sahu
---
 drivers/vfio/vfio.c | 44 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 41 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index a4555014bd1e..4e65a127744e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -32,6 +32,7 @@
 #include
 #include
 #include
+#include
 #include "vfio.h"
 
 #define DRIVER_VERSION "0.3"
@@ -1536,6 +1537,30 @@ static const struct file_operations vfio_group_fops = {
 	.release = vfio_group_fops_release,
 };
 
+/*
+ * Wrapper around pm_runtime_resume_and_get().
+ * Return 0, if driver power management callbacks are not present i.e. the driver is not
+ * using runtime power management.
+ * Return 1 upon success, otherwise -errno
+ */
+static inline int vfio_device_pm_runtime_get(struct device *dev)
+{
+#ifdef CONFIG_PM
+	int ret;
+
+	if (!dev->driver || !dev->driver->pm)
+		return 0;
+
+	ret = pm_runtime_resume_and_get(dev);
+	if (ret < 0)
+		return ret;
+
+	return 1;
+#else
+	return 0;
+#endif
+}
+
 /*
  * VFIO Device fd
  */
@@ -1845,15 +1870,28 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 				       unsigned int cmd, unsigned long arg)
 {
 	struct vfio_device *device = filep->private_data;
+	int pm_ret, ret = 0;
+
+	pm_ret = vfio_device_pm_runtime_get(device->dev);
+	if (pm_ret < 0)
+		return pm_ret;
 
 	switch (cmd) {
 	case VFIO_DEVICE_FEATURE:
-		return vfio_ioctl_device_feature(device, (void __user *)arg);
+		ret = vfio_ioctl_device_feature(device, (void __user *)arg);
+		break;
 	default:
 		if (unlikely(!device->ops->ioctl))
-			return -EINVAL;
-		return device->ops->ioctl(device, cmd, arg);
+			ret = -EINVAL;
+		else
+			ret = device->ops->ioctl(device, cmd, arg);
+		break;
 	}
+
+	if (pm_ret)
+		pm_runtime_put(device->dev);
+
+	return ret;
 }
 
 static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,

From patchwork Mon Apr 25 09:26:14 2022
X-Patchwork-Submitter: Abhishek Sahu
X-Patchwork-Id: 12825488
From: Abhishek Sahu
To: Alex Williamson, Cornelia Huck, Yishai Hadas, Jason Gunthorpe,
 Shameer Kolothum, Kevin Tian, "Rafael J . Wysocki"
CC: Max Gurtovoy, Bjorn Helgaas, Abhishek Sahu
Subject: [PATCH v3 7/8] vfio/pci: Mask INTx during runtime suspend
Date: Mon, 25 Apr 2022 14:56:14 +0530
Message-ID: <20220425092615.10133-8-abhsahu@nvidia.com>
In-Reply-To: <20220425092615.10133-1-abhsahu@nvidia.com>
References: <20220425092615.10133-1-abhsahu@nvidia.com>
X-Mailing-List: kvm@vger.kernel.org

This patch only adds INTx handling during runtime suspend/resume. All
the suspend/resume related code for the user to put the device into a
low power state will be added in subsequent patches.

INTx interrupts are shared among devices. Whenever any INTx interrupt
comes for the vfio devices, vfio_intx_handler() will be called for each
device. Inside vfio_intx_handler(), it calls pci_check_and_mask_intx()
and checks if the interrupt has been generated for the current device.
Now, if the device is already in the D3cold state, then the config
space cannot be read. An attempt to read the config space in the D3cold
state can cause system unresponsiveness on a few systems. To prevent
this, mask INTx in the runtime suspend callback and unmask it in the
runtime resume callback.
If INTx has already been masked, then no handling is needed in the
runtime suspend/resume callbacks. 'pm_intx_masked' tracks this, and
vfio_pci_intx_mask() has been updated to return true if INTx has been
masked inside this function.

For the runtime suspend which is triggered when there is no user of the
vfio device, is_intx() will return false and these callbacks won't do
anything.

MSI/MSI-X interrupts are not shared, so no handling should be needed
for these. vfio_msihandler() triggers eventfd_signal() without doing
any device specific config access. When the user receives this signal
and then tries to perform any config access or IOCTL, the device will
be moved to the D0 state first before servicing the request.

Signed-off-by: Abhishek Sahu
---
 drivers/vfio/pci/vfio_pci_core.c  | 35 +++++++++++++++++++++++++++----
 drivers/vfio/pci/vfio_pci_intrs.c |  6 +++++-
 include/linux/vfio_pci_core.h     |  3 ++-
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index aee5e0cd6137..05a68ca9d9e7 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -262,16 +262,43 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
 }
 
 #ifdef CONFIG_PM
+static int vfio_pci_core_runtime_suspend(struct device *dev)
+{
+	struct vfio_pci_core_device *vdev = dev_get_drvdata(dev);
+
+	/*
+	 * If INTx is enabled, then mask INTx before going into runtime
+	 * suspended state and unmask the same in the runtime resume.
+	 * If INTx has already been masked by the user, then
+	 * vfio_pci_intx_mask() will return false and in that case, INTx
+	 * should not be unmasked in the runtime resume.
+	 */
+	vdev->pm_intx_masked = (is_intx(vdev) && vfio_pci_intx_mask(vdev));
+
+	return 0;
+}
+
+static int vfio_pci_core_runtime_resume(struct device *dev)
+{
+	struct vfio_pci_core_device *vdev = dev_get_drvdata(dev);
+
+	if (vdev->pm_intx_masked)
+		vfio_pci_intx_unmask(vdev);
+
+	return 0;
+}
+
 /*
- * The dev_pm_ops needs to be provided to make pci-driver runtime PM working,
- * so use structure without any callbacks.
- *
  * The pci-driver core runtime PM routines always save the device state
  * before going into suspended state. If the device is going into low power
  * state with only with runtime PM ops, then no explicit handling is needed
  * for the devices which have NoSoftRst-.
  */
-static const struct dev_pm_ops vfio_pci_core_pm_ops = { };
+static const struct dev_pm_ops vfio_pci_core_pm_ops = {
+	SET_RUNTIME_PM_OPS(vfio_pci_core_runtime_suspend,
+			   vfio_pci_core_runtime_resume,
+			   NULL)
+};
 #endif
 
 int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 6069a11fb51a..1a37db99df48 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -33,10 +33,12 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
 		eventfd_signal(vdev->ctx[0].trigger, 1);
 }
 
-void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
+/* Returns true if INTx has been masked by this function. */
+bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	unsigned long flags;
+	bool intx_masked = false;
 
 	spin_lock_irqsave(&vdev->irqlock, flags);
 
@@ -60,9 +62,11 @@ void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
 			disable_irq_nosync(pdev->irq);
 
 		vdev->ctx[0].masked = true;
+		intx_masked = true;
 	}
 
 	spin_unlock_irqrestore(&vdev->irqlock, flags);
+	return intx_masked;
 }
 
 /*
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 3c7d65e68340..e84f31e44238 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -125,6 +125,7 @@ struct vfio_pci_core_device {
 	bool nointx;
 	bool needs_pm_restore;
 	bool power_state_d3;
+	bool pm_intx_masked;
 	struct pci_saved_state *pci_saved_state;
 	struct pci_saved_state *pm_save;
 	int ioeventfds_nr;
@@ -148,7 +149,7 @@ struct vfio_pci_core_device {
 #define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev)))
 #define irq_is(vdev, type) (vdev->irq_type == type)
 
-extern void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev);
+extern bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev);
 extern void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev);
 
 extern int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev,

From patchwork Mon Apr 25 09:26:15 2022
X-Patchwork-Submitter: Abhishek Sahu
X-Patchwork-Id: 12825489
From: Abhishek Sahu
To: Alex Williamson, Cornelia Huck, Yishai Hadas, Jason Gunthorpe,
 Shameer Kolothum, Kevin Tian, "Rafael J . Wysocki"
CC: Max Gurtovoy, Bjorn Helgaas, Abhishek Sahu
Subject: [PATCH v3 8/8] vfio/pci: Add the support for PCI D3cold state
Date: Mon, 25 Apr 2022 14:56:15 +0530
Message-ID: <20220425092615.10133-9-abhsahu@nvidia.com>
In-Reply-To: <20220425092615.10133-1-abhsahu@nvidia.com>
References: <20220425092615.10133-1-abhsahu@nvidia.com>
X-Mailing-List: kvm@vger.kernel.org

Currently, if runtime power management is enabled for a vfio pci based
device in the guest OS, then the guest OS will write to the PCI_PM_CTRL
register. This write request is handled in vfio_pm_config_write(),
which does the actual write of the PCI_PM_CTRL register. With this, at
most the D3hot state can be achieved for low power. If we can use the
runtime PM framework, then we can achieve the D3cold state, which will
help in saving maximum power.

1. Since the D3cold state can't be achieved by writing the PCI standard
   PM config registers, this patch adds a new feature in the existing
   VFIO_DEVICE_FEATURE IOCTL. This IOCTL can be used to change the PCI
   device from D3hot to D3cold state and then from D3cold to D0 state.
   The device feature uses the term "low power" instead of D3cold so
   that other vfio drivers that want to implement low power support
   can use the same IOCTL.

2. Hypervisors can implement virtual ACPI methods. For example, if the
   ACPI node of a PCI device in a Linux guest has _PR3 and _PR0 power
   resources with _ON/_OFF methods, the guest calls _OFF during the
   D3cold transition and _ON during the D0 transition. The hypervisor
   can intercept these virtual ACPI calls and then issue the D3cold
   related IOCTL to the vfio driver.

3. The vfio driver uses the runtime PM framework to reach the D3cold
   state. For the D3cold transition, it decrements the usage count,
   and for the D0 transition, it increments the usage count.

4. For D3cold, the current power state of the device should be D3hot.
   Then, during runtime suspend, pci_platform_power_transition() is
   required for the D3cold state. If D3cold is not supported, the
   device stays in D3hot. But with runtime PM, the root port can now
   also go into the suspended state.

5. On most systems, D3cold is supported at the root port level. So,
   when the root port transitions to D3cold, the vfio PCI device goes
   from D3hot to D3cold during its runtime suspend. If the root port
   does not support D3cold, the root port goes into D3hot.

6. The runtime suspend callback can now be invoked in two cases: when
   there are no users of the vfio device, and when the user has
   initiated D3cold. The 'platform_pm_engaged' flag distinguishes
   between these two cases.

7. In D3cold, all BAR related access needs to be disabled, as in
   D3hot. Additionally, the config space is not accessible in D3cold.
   To prevent config space access in D3cold, increment the runtime PM
   usage count before any config space access.

8. If the user has entered low power through the IOCTL, the user
   should exit low power first. The user can still issue a config
   access or an IOCTL after low power entry.
   We could add an explicit error check, but since we are already
   waking up the device, the IOCTL and config access can be fulfilled.
   However, 'power_state_d3' won't be cleared without a low power
   exit, so all BAR related access will still return an error until
   the user does a low power exit.

9. Since multiple layers are involved, the following is the high level
   code flow for D3cold entry and exit.

   D3cold entry:

   a. The user puts the PCI device into D3hot by writing to the
      standard config register (vfio_pm_config_write() ->
      vfio_lock_and_set_power_state() -> vfio_pci_set_power_state()).
      The device power state will be D3hot and power_state_d3 will be
      true.
   b. Set vfio_device_feature_power_management::low_power_state =
      VFIO_DEVICE_LOW_POWER_STATE_ENTER and call the
      VFIO_DEVICE_FEATURE IOCTL.
   c. Inside vfio_device_fops_unl_ioctl(), pm_runtime_resume_and_get()
      is called first, which raises the usage count to 2, and then
      vfio_pci_core_ioctl_feature() is invoked.
   d. vfio_pci_core_feature_pm() is called and takes the
      VFIO_DEVICE_LOW_POWER_STATE_ENTER switch case.
      platform_pm_engaged is set to true and pm_runtime_put_noidle()
      decrements the usage count to 1.
   e. While returning from vfio_device_fops_unl_ioctl(),
      pm_runtime_put() drops the usage count to 0 and the runtime PM
      framework starts the runtime suspend.
   f. pci_pm_runtime_suspend() is called and invokes the driver's
      runtime suspend callback.
   g. vfio_pci_core_runtime_suspend() changes the power state to D0
      and does the INTx mask related handling.
   h. pci_pm_runtime_suspend() takes care of saving the PCI state and
      all power management handling for D3cold.

   D3cold exit:

   a. Set vfio_device_feature_power_management::low_power_state =
      VFIO_DEVICE_LOW_POWER_STATE_EXIT and call the
      VFIO_DEVICE_FEATURE IOCTL.
   b. Inside vfio_device_fops_unl_ioctl(), pm_runtime_resume_and_get()
      is called first, which raises the usage count to 1.
   c. pci_pm_runtime_resume() takes care of moving the device into D0
      again and then vfio_pci_core_runtime_resume() is called.
   d. vfio_pci_core_runtime_resume() does the INTx unmask related
      handling.
   e. vfio_pci_core_ioctl_feature() is invoked.
   f. vfio_pci_core_feature_pm() is called and takes the
      VFIO_DEVICE_LOW_POWER_STATE_EXIT switch case.
      platform_pm_engaged and power_state_d3 are cleared and
      pm_runtime_get_noresume() raises the usage count to 2.
   g. While returning from vfio_device_fops_unl_ioctl(),
      pm_runtime_put() drops the usage count to 1 and the device
      stays in D0.

Signed-off-by: Abhishek Sahu
---
 drivers/vfio/pci/vfio_pci_config.c |  11 ++-
 drivers/vfio/pci/vfio_pci_core.c   | 131 ++++++++++++++++++++++++++++-
 include/linux/vfio_pci_core.h      |   1 +
 include/uapi/linux/vfio.h          |  18 ++++
 4 files changed, 159 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index af0ae80ef324..65b1bc9586ab 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -1936,16 +1937,23 @@ static ssize_t vfio_config_do_rw(struct vfio_pci_core_device *vdev, char __user
 ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 			   size_t count, loff_t *ppos, bool iswrite)
 {
+	struct device *dev = &vdev->pdev->dev;
 	size_t done = 0;
 	int ret = 0;
 	loff_t pos = *ppos;
 
 	pos &= VFIO_PCI_OFFSET_MASK;
 
+	ret = pm_runtime_resume_and_get(dev);
+	if (ret < 0)
+		return ret;
+
 	while (count) {
 		ret = vfio_config_do_rw(vdev, buf, count, &pos, iswrite);
-		if (ret < 0)
+		if (ret < 0) {
+			pm_runtime_put(dev);
 			return ret;
+		}
 
 		count -= ret;
 		done += ret;
@@ -1953,6 +1961,7 @@ ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 		pos += ret;
 	}
 
+	pm_runtime_put(dev);
 	*ppos += done;
 
 	return done;
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 05a68ca9d9e7..beac6e05f97f 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -234,7 +234,14 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
 	ret = pci_set_power_state(pdev, state);
 
 	if (!ret) {
-		vdev->power_state_d3 = (pdev->current_state >= PCI_D3hot);
+		/*
+		 * If 'platform_pm_engaged' is true then 'power_state_d3' can
+		 * be cleared only when user makes the explicit request to
+		 * move out of low power state by using power management ioctl.
+		 */
+		if (!vdev->platform_pm_engaged)
+			vdev->power_state_d3 =
+				(pdev->current_state >= PCI_D3hot);
 
 		/* D3 might be unsupported via quirk, skip unless in D3 */
 		if (needs_save && pdev->current_state >= PCI_D3hot) {
@@ -266,6 +273,25 @@ static int vfio_pci_core_runtime_suspend(struct device *dev)
 {
 	struct vfio_pci_core_device *vdev = dev_get_drvdata(dev);
 
+	down_read(&vdev->memory_lock);
+
+	/* 'platform_pm_engaged' will be false if there are no users. */
+	if (!vdev->platform_pm_engaged) {
+		up_read(&vdev->memory_lock);
+		return 0;
+	}
+
+	/*
+	 * The user will move the device into D3hot state first before invoking
+	 * power management ioctl. Move the device into D0 state here and then
+	 * the pci-driver core runtime PM suspend will move the device into
+	 * low power state. Also, for the devices which have NoSoftRst-,
+	 * it will help in restoring the original state (saved locally in
+	 * 'vdev->pm_save').
+	 */
+	vfio_pci_set_power_state(vdev, PCI_D0);
+	up_read(&vdev->memory_lock);
+
 	/*
 	 * If INTx is enabled, then mask INTx before going into runtime
 	 * suspended state and unmask the same in the runtime resume.
@@ -395,6 +421,19 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 
 	/*
 	 * This function can be invoked while the power state is non-D0.
+	 * This non-D0 power state can be with or without runtime PM.
+	 * Increment the usage count corresponding to pm_runtime_put()
+	 * called during setting of 'platform_pm_engaged'. The device will
+	 * wake up if it has already gone into suspended state. Otherwise,
+	 * the next vfio_pci_set_power_state() will change the
+	 * device power state to D0.
+	 */
+	if (vdev->platform_pm_engaged) {
+		pm_runtime_resume_and_get(&pdev->dev);
+		vdev->platform_pm_engaged = false;
+	}
+
+	/*
 	 * This function calls __pci_reset_function_locked() which internally
 	 * can use pci_pm_reset() for the function reset. pci_pm_reset() will
 	 * fail if the power state is non-D0. Also, for the devices which
@@ -1192,6 +1231,80 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_ioctl);
 
+#ifdef CONFIG_PM
+static int vfio_pci_core_feature_pm(struct vfio_device *device, u32 flags,
+				    void __user *arg, size_t argsz)
+{
+	struct vfio_pci_core_device *vdev =
+		container_of(device, struct vfio_pci_core_device, vdev);
+	struct pci_dev *pdev = vdev->pdev;
+	struct vfio_device_feature_power_management vfio_pm = { 0 };
+	int ret = 0;
+
+	ret = vfio_check_feature(flags, argsz,
+				 VFIO_DEVICE_FEATURE_SET |
+				 VFIO_DEVICE_FEATURE_GET,
+				 sizeof(vfio_pm));
+	if (ret != 1)
+		return ret;
+
+	if (flags & VFIO_DEVICE_FEATURE_GET) {
+		down_read(&vdev->memory_lock);
+		vfio_pm.low_power_state = vdev->platform_pm_engaged ?
+				VFIO_DEVICE_LOW_POWER_STATE_ENTER :
+				VFIO_DEVICE_LOW_POWER_STATE_EXIT;
+		up_read(&vdev->memory_lock);
+		if (copy_to_user(arg, &vfio_pm, sizeof(vfio_pm)))
+			return -EFAULT;
+		return 0;
+	}
+
+	if (copy_from_user(&vfio_pm, arg, sizeof(vfio_pm)))
+		return -EFAULT;
+
+	/*
+	 * The vdev power related fields are protected with memory_lock
+	 * semaphore.
+	 */
+	down_write(&vdev->memory_lock);
+	switch (vfio_pm.low_power_state) {
+	case VFIO_DEVICE_LOW_POWER_STATE_ENTER:
+		if (!vdev->power_state_d3 || vdev->platform_pm_engaged) {
+			ret = -EINVAL;
+			break;
+		}
+
+		vdev->platform_pm_engaged = true;
+
+		/*
+		 * The pm_runtime_put() will be called again while returning
+		 * from ioctl after which the device can go into runtime
+		 * suspended.
+		 */
+		pm_runtime_put_noidle(&pdev->dev);
+		break;
+
+	case VFIO_DEVICE_LOW_POWER_STATE_EXIT:
+		if (!vdev->platform_pm_engaged) {
+			ret = -EINVAL;
+			break;
+		}
+
+		vdev->platform_pm_engaged = false;
+		vdev->power_state_d3 = false;
+		pm_runtime_get_noresume(&pdev->dev);
+		break;
+
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	up_write(&vdev->memory_lock);
+	return ret;
+}
+#endif
+
 static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
 				       void __user *arg, size_t argsz)
 {
@@ -1226,6 +1339,10 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 	switch (flags & VFIO_DEVICE_FEATURE_MASK) {
 	case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
 		return vfio_pci_core_feature_token(device, flags, arg, argsz);
+#ifdef CONFIG_PM
+	case VFIO_DEVICE_FEATURE_POWER_MANAGEMENT:
+		return vfio_pci_core_feature_pm(device, flags, arg, argsz);
+#endif
 	default:
 		return -ENOTTY;
 	}
@@ -2189,6 +2306,15 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 		goto err_unlock;
 	}
 
+	/*
+	 * Some of the devices in the dev_set can be in the runtime suspended
+	 * state. Increment the usage count for all the devices in the dev_set
+	 * before reset and decrement the same after reset.
+	 */
+	ret = vfio_pci_dev_set_pm_runtime_get(dev_set);
+	if (ret)
+		goto err_unlock;
+
 	list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
 		/*
 		 * Test whether all the affected devices are contained by the
@@ -2244,6 +2370,9 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 		else
 			mutex_unlock(&cur->vma_lock);
 	}
+
+	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
+		pm_runtime_put(&cur->pdev->dev);
 err_unlock:
 	mutex_unlock(&dev_set->lock);
 	return ret;
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index e84f31e44238..337983a877d6 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -126,6 +126,7 @@ struct vfio_pci_core_device {
 	bool			needs_pm_restore;
 	bool			power_state_d3;
 	bool			pm_intx_masked;
+	bool			platform_pm_engaged;
 	struct pci_saved_state	*pci_saved_state;
 	struct pci_saved_state	*pm_save;
 	int			ioeventfds_nr;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index fea86061b44e..53ff890dbd27 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -986,6 +986,24 @@ enum vfio_device_mig_state {
 	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
 };
 
+/*
+ * Use platform-based power management for moving the device into low power
+ * state. This low power state is device specific.
+ *
+ * For PCI, this low power state is D3cold. The native PCI power management
+ * does not support the D3cold power state. For moving the device into D3cold
+ * state, change the PCI state to D3hot with standard configuration registers
+ * and then call this IOCTL to set the D3cold state. Similarly, if the
+ * device is in D3cold state, then call this IOCTL to exit from D3cold state.
+ */
+struct vfio_device_feature_power_management {
+#define VFIO_DEVICE_LOW_POWER_STATE_EXIT	0x0
+#define VFIO_DEVICE_LOW_POWER_STATE_ENTER	0x1
+	__u64	low_power_state;
+};
+
+#define VFIO_DEVICE_FEATURE_POWER_MANAGEMENT	3
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**
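
Illustration (not part of the patch): the usage-count walk-through in
item 9 of the commit message can be checked with a small userspace
model. This is only a sketch under stated assumptions. The struct
pm_model type and every model_*() helper are hypothetical names made up
for this example, not kernel APIs, and the model suspends synchronously
when the count reaches zero, whereas the real pm_runtime_put() only
queues an idle request.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the usage-count flow. The struct and
 * all model_*() helpers are illustrative names, not kernel APIs.
 */
struct pm_model {
	int usage_count;		/* runtime PM usage count */
	bool power_state_d3;		/* mirrors vdev->power_state_d3 */
	bool platform_pm_engaged;	/* mirrors vdev->platform_pm_engaged */
	bool suspended;			/* true once runtime suspend has run */
};

enum model_low_power_state {
	MODEL_LOW_POWER_EXIT = 0,	/* VFIO_DEVICE_LOW_POWER_STATE_EXIT */
	MODEL_LOW_POWER_ENTER = 1,	/* VFIO_DEVICE_LOW_POWER_STATE_ENTER */
};

/* Stands in for pm_runtime_resume_and_get(): take a reference, wake up. */
static void model_resume_and_get(struct pm_model *m)
{
	m->usage_count++;
	m->suspended = false;
}

/*
 * Stands in for pm_runtime_put(). Simplification: the model suspends
 * synchronously as soon as the count reaches zero.
 */
static void model_put(struct pm_model *m)
{
	assert(m->usage_count > 0);
	if (--m->usage_count == 0)
		m->suspended = true;
}

/* Models one VFIO_DEVICE_FEATURE call for the power management feature. */
static int model_feature_pm(struct pm_model *m, enum model_low_power_state req)
{
	int ret = 0;

	model_resume_and_get(m);	/* on ioctl entry */

	switch (req) {
	case MODEL_LOW_POWER_ENTER:
		if (!m->power_state_d3 || m->platform_pm_engaged) {
			ret = -EINVAL;
			break;
		}
		m->platform_pm_engaged = true;
		m->usage_count--;	/* pm_runtime_put_noidle() */
		break;
	case MODEL_LOW_POWER_EXIT:
		if (!m->platform_pm_engaged) {
			ret = -EINVAL;
			break;
		}
		m->platform_pm_engaged = false;
		m->power_state_d3 = false;
		m->usage_count++;	/* pm_runtime_get_noresume() */
		break;
	default:
		ret = -EINVAL;
		break;
	}

	model_put(m);			/* on ioctl return */
	return ret;
}
```

Starting from a usage count of 1 with power_state_d3 set (the state
after step "a" of the entry flow), an ENTER request leaves the count at
0 with the model suspended, and a following EXIT request brings the
count back to 1 with the device awake, which matches the counts listed
in the commit message.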