From patchwork Tue Mar 14 20:56:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Volodymyr Babchuk X-Patchwork-Id: 13175007 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 247B5C6FD1F for ; Tue, 14 Mar 2023 20:57:51 +0000 (UTC) Received: from list by lists.xenproject.org with outflank-mailman.509845.786452 (Exim 4.92) (envelope-from ) id 1pcBhy-0004DG-UA; Tue, 14 Mar 2023 20:57:22 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version Received: by outflank-mailman (output) from mailman id 509845.786452; Tue, 14 Mar 2023 20:57:22 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1pcBhy-0004D7-Qh; Tue, 14 Mar 2023 20:57:22 +0000 Received: by outflank-mailman (input) for mailman id 509845; Tue, 14 Mar 2023 20:57:21 +0000 Received: from se1-gles-sth1-in.inumbo.com ([159.253.27.254] helo=se1-gles-sth1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1pcBhw-0001PG-LY for xen-devel@lists.xenproject.org; Tue, 14 Mar 2023 20:57:20 +0000 Received: from mx0b-0039f301.pphosted.com (mx0b-0039f301.pphosted.com [148.163.137.242]) by se1-gles-sth1.inumbo.com (Halon) with ESMTPS id ce7c6aef-c2aa-11ed-87f5-c1b5be75604c; Tue, 14 Mar 2023 21:57:19 +0100 (CET) Received: from pps.filterd (m0174680.ppops.net [127.0.0.1]) by mx0b-0039f301.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32EKchDo017378; Tue, 14 Mar 2023 20:56:35 GMT Received: from eur05-vi1-obe.outbound.protection.outlook.com (mail-vi1eur05lp2175.outbound.protection.outlook.com [104.47.17.175]) by mx0b-0039f301.pphosted.com (PPS) with ESMTPS id 3pb0520156-5 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 14 Mar 2023 20:56:35 +0000 Received: from VI1PR03MB3710.eurprd03.prod.outlook.com (2603:10a6:803:31::18) by PAXPR03MB7967.eurprd03.prod.outlook.com (2603:10a6:102:21a::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6178.26; Tue, 14 Mar 2023 20:56:30 +0000 Received: from VI1PR03MB3710.eurprd03.prod.outlook.com ([fe80::967e:573a:15a9:176e]) by VI1PR03MB3710.eurprd03.prod.outlook.com ([fe80::967e:573a:15a9:176e%4]) with mapi id 15.20.6178.026; Tue, 14 Mar 2023 20:56:30 +0000 X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: ce7c6aef-c2aa-11ed-87f5-c1b5be75604c ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=jC4APIhGtIE2q2vnB7Xgyp88/Qoj80o1jW8ZVmEsR9vzsBK/K7E9TKVH0ws3gYP/YP+M9F8QOAmNkVDpTUaIEk3cckeJDUxhkALykktfOg+vAAmly5lTrhiLI6GjXa7FSZr/zEVrsOryvVRh0JyevefLJHLd0d83VEWpAvWLCf8mAU95pr01MZRFUYQlym+9wKdLgtBxsULI9maLAFf1pnd7kIBUzCqSXrkc04IYaCdWRlS3+sOvIsChltiDYq8cIK7X8zgwLxhDG0dtK3mK5DFcFPvfL5TJEZCYO19KIVSLtu3BCcV52Bpz5JaJoAO1TSbPJb83UyZ2gNnVg6ID1Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=x3giRBF2vC8XfohoRJp2TLUtqgs0zjuWEsEx8QPu5VA=; b=Tz/R6YZTMwc70J7ONg+kKpXLIbOTFUKWt/ae5r+pOO9D5VNXhFn6WhsVnCQdWjogdSv3XeT2XCcV5leL/fvTurxyRI7ZWz5Qz5AdHQfgeBeYm709FDveXJAGYyKjjKXZCKINQxroaWKAa54ad9Yp4QFJjYP/Dp8DQDZwKKepiLPlsFmHLrU0VKlg0unh+xHLHFxO8X6gLNzBUeRkX1xTehxHhMyyKac2nQekWI1bWlUGlnncA5euBRJUWu4/x52fznakBZsCJV31jGGoifpIy4jirsFf0Aa2ayzC6coDM6iqO4G7qy802N1Q9eMvOJnPS4qB5zs6OJEFTp0E3WaVxg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=epam.com; dmarc=pass action=none header.from=epam.com; dkim=pass header.d=epam.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=epam.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=x3giRBF2vC8XfohoRJp2TLUtqgs0zjuWEsEx8QPu5VA=; b=dAviOwHeIzoIc/A3gZX5H+T7Vx+QUEPdEpSBpecXfxfM7wxgv433I/1U4T0S8qHWKVjVQyFyxzaa41H8qw6iLgiP29L6YfKNjxq988Xx9WL+ynHoFKMTz4gZd/wJqu9mFOA/6+EOwpbrsjDS2Dl2Jzcd2QZCwE79dvg98xmg7ATGRhSSajPuyV0LLY/GghGTCMCjMTI/30eQlwFTtbHgPBD9NCEpS2gnABk5CCzMT/Uu//Iv56WMyNZ2gkgN+djQnzDrzvKKA6TYmIzXe7a4IIr88MGTrb/SpxndggQwA7bOoVe4ZrQ3d93Xf0fOCczOGjYRc/MMtvIl0ErE2eQwgg== From: Volodymyr Babchuk To: "xen-devel@lists.xenproject.org" CC: Volodymyr Babchuk , Jan Beulich , Andrew Cooper , =?iso-8859-1?q?Roger_Pau_Monn=E9?= , Wei Liu , George Dunlap , Julien Grall , Stefano Stabellini , Paul Durrant , Kevin Tian Subject: [PATCH v3 2/6] xen: pci: introduce reference counting for pdev Thread-Topic: [PATCH v3 2/6] xen: pci: introduce reference counting for pdev Thread-Index: AQHZVrdzsdckomMx4kauxHkZQ597Ig== Date: Tue, 14 Mar 2023 20:56:29 +0000 Message-ID: <20230314205612.3703668-3-volodymyr_babchuk@epam.com> References: <20230314205612.3703668-1-volodymyr_babchuk@epam.com> In-Reply-To: <20230314205612.3703668-1-volodymyr_babchuk@epam.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-mailer: git-send-email 2.39.2 x-ms-publictraffictype: Email x-ms-traffictypediagnostic: VI1PR03MB3710:EE_|PAXPR03MB7967:EE_ x-ms-office365-filtering-correlation-id: a0e2b907-4a39-4d48-f078-08db24ce9661 x-ms-exchange-senderadcheck: 1 x-ms-exchange-antispam-relay: 0 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: Omev8bapjhzUupelKf/ySvHHGmZkYH1Re53t7uAlTwmc/m/KG/eOq4gqhxL7eVvdhCXNvK4yKPg0KAXz/9q88PVRTVdf3TADh/awFMo/IFz4VUg9lSXIrHpw6IjJwbCdI9Q6BlC5bD3PW89hBQjDPlpw1y1GZWuYUgLyipLMxQABF4kLGtjKoE6V8JIDEY/Gzf0c9ZIfyJFM9l8v8C/vpH8Z6vR/uUSQYdx04474IeJYm0S11RrJJ+L2cUSkqIEWQ7gjWZ0v8yMUGivy+D+eNZCZMvL2ty/Bb8b3FPZBwU5NdF47vEBZdVmyjufBR0B3188dQQPU8wVROQgk3ooA9aEN08zgQghYsGVkkeznWfM1jb2wX2sm4wKWDZhQkRc5r7lCsQ2YIlgOPX8Ri5Xr34n3Jon6a+mzKH1GXS5dsz8z6UjpDnk4lIMj3gqGkry7LKaGsd4DlfeTTD8A+MKEZLDCenj8T7X/A/TR5ULMHnHuyV6To2Zz8X2Az+LMQBXu3XPNSFUPIcrj7Nze/8D0LF4+SeehmhYdiFPGClE62m4aygvkYhSoDYzJdRl10nTXfISll/YY51F3qOhgTmprsqQOJEPtEdKa3nyCrvMWqSQqk0F+Ig9XqvdfGFUr6nwheUso77l/0yoDDUCKigg2GXOD1BfOs72EMbKO97QTNhH6RmsHxR9iDd5hOWzsnTdTU4JLV9ew9KYQomEIbEM8q5Ei6PnUYN5y6FRcB8c5iZY= x-forefront-antispam-report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:VI1PR03MB3710.eurprd03.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230025)(4636009)(376002)(346002)(136003)(366004)(396003)(39860400002)(451199018)(54906003)(316002)(86362001)(36756003)(38070700005)(38100700002)(478600001)(122000001)(55236004)(83380400001)(6512007)(6506007)(2616005)(1076003)(186003)(26005)(7416002)(5660300002)(6486002)(91956017)(41300700001)(6916009)(71200400001)(76116006)(64756008)(30864003)(8936002)(8676002)(2906002)(66556008)(66476007)(66946007)(66446008)(4326008)(21314003)(579004)(559001);DIR:OUT;SFP:1101; x-ms-exchange-antispam-messagedata-chunkcount: 1 x-ms-exchange-antispam-messagedata-0: =?iso-8859-1?q?gm1U33arFh8VzSF4/mkdueb?= =?iso-8859-1?q?H4EX1NY93Wces6XjSzBQWrtE3m0ZoZY+b8z7f+rx2rPiwNGA95AMl1gnD+iS?= =?iso-8859-1?q?vODvNldMLOuEQXxIIcHHh7bjcws0hG3hsEu5InVqT/R0bnXXvPqA6YMJshIT?= =?iso-8859-1?q?JcyZ+g02D/dQVOHPJvIj4muQ1XFMbucVwQf2qTCL+ayfaIFQUCSNuy5RuNHo?= =?iso-8859-1?q?/jb+lUVUKrdYAEaFAOvgemTwXHgA2bBQidbBLJ40Mj3l2QLuiF6Tn9xCyVUR?= =?iso-8859-1?q?GkmjR6rHfzNLiDVNMszQVyvBKakXUDzChl+fwX7vqFOsSv84AiEQKVKCVW55?= =?iso-8859-1?q?oPJpgVXSkr4IWMTWmkw5cNF/zO85INObWmw+FhQ2q0PxgWhZYtbozMYjWA3p?= =?iso-8859-1?q?cvpF2PjJhoDFvMCVTgqilf3oolEZiNau7hWDJEfeBx4bVvuuEJPPmW6TjJ9Y?= =?iso-8859-1?q?zTOf6opbafCDxtPj7ZcV8dAqtYsfMf0rZ8x8PGq5buf70JqxRPPtZzgS9GnH?= =?iso-8859-1?q?iaMkbcLeZss+q7KYcTj6frdM0el4IUIGczR8IB/mmWC29w3kg4WS2mLVcCkY?= =?iso-8859-1?q?3dpcAJKuAvE0XYE2AuVtu8o0bZgOlxgJQ3HMQEdefWGnLpDrcwNSNUEnc8/4?= =?iso-8859-1?q?i8uF3m4CeVbHmsKgN2GXxD/u9N+wizoh3GOh9VPFY2WL4ambZMMyUmjUWEAx?= =?iso-8859-1?q?Des/+flRBnqyyYaAGs5D5dxfHxoIhaC/IW6RCiEZkRPu5OKUZS66wgohwbx+?= =?iso-8859-1?q?l6q+y3n6Z8oMIOkwxlGjUyz9XvrXDApMPN7mrbr/SLSPaMxNAd0fqnzj4UoE?= =?iso-8859-1?q?eCpM72XzpsIdqejcOTzABL/dILcEuoNZqHUzsy5kjpO59zadPQ3C3P/xfsEq?= =?iso-8859-1?q?8lzXSn1WGPjJtizkQtP0ev0DcGWGEY7bioUZSZrtjflzEHarv2kcKU2unz/y?= =?iso-8859-1?q?Ta96hHuxNTaq9rqSEEM9w07UMMOqWxE3DdU5xXUpOyogvnwlf8DCbGDXH3Ps?= =?iso-8859-1?q?WLg592znEzZ/JEznvKygyNM+9kx1qGY9mSn8FmEDmdD8iDBcqjo37CJ9xy7m?= =?iso-8859-1?q?lRZv/cTriCg3z+3FCGKUGYX8lkC1atpA1GELjlaJQzfYtXptrgn1TN8Dtf93?= =?iso-8859-1?q?ou6iOb5zQfznNPqWDo213/ezhElbSXYrXTGSHDNpbm5eMelhc+L9XVsE87KX?= =?iso-8859-1?q?BizKYwYQ1FXOkyV6ttLRB3SgKXSiLb2yzL6Ld//0xgzq3u2wRxYes7YnObzG?= =?iso-8859-1?q?tdmfkTjOBSC6UxmMZ93IUpWqvioqR6djfZjmEA2ceTGbz6NrCi4mq2EUsGLG?= =?iso-8859-1?q?LmO7hM3vecXlrIGEk/vxr+CfrUTs5XTETg+BmQOGR818z7gKFr46T9/b95it?= =?iso-8859-1?q?C9CFEjX8SRlsiUZKgPGyzgyGPt39FYbmXe7AlyeDGuQFsb0d/BhdiDFemDNT?= =?iso-8859-1?q?xjJKpeLCBSOof2TPZrLqFH8cPGwmhbIlGKS6pS31vO9uMzEc1R/twBSe3i7h?= =?iso-8859-1?q?GY/d1/FBvmAyx35wfWzr0NOOxld4vx4uOPcUNaYoZLETD+GE4fDU/U2IrBRZ?= =?iso-8859-1?q?DkOapXfIVg5XMHhSiRneblI8eflETG8aecJCKLIeQ0oWKuHs925av5vPw3Yu?= =?iso-8859-1?q?3e40KJoDLaj0ScnWBy64qQBGR2UrHArsoAqH4sw=3D=3D?= MIME-Version: 1.0 X-OriginatorOrg: epam.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: VI1PR03MB3710.eurprd03.prod.outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: a0e2b907-4a39-4d48-f078-08db24ce9661 X-MS-Exchange-CrossTenant-originalarrivaltime: 14 Mar 2023 20:56:29.7296 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: b41b72d0-4e9f-4c26-8a69-f949f367c91d X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: 9m1ahttUilynrm5cSdiA2L7W44xc6n04nTe/wwHpmlbzRibjcyyH3z6G68VNOcwac1bzabl/8EerHr9hZUdNDKaLLNF3joahJIeQ9UgPGBc= X-MS-Exchange-Transport-CrossTenantHeadersStamped: PAXPR03MB7967 X-Proofpoint-ORIG-GUID: ZTGSVGK1gJqAvJXQjNTJFi0fYEwGzaEF X-Proofpoint-GUID: ZTGSVGK1gJqAvJXQjNTJFi0fYEwGzaEF X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-14_14,2023-03-14_02,2023-02-09_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxlogscore=999 priorityscore=1501 suspectscore=0 clxscore=1015 mlxscore=0 malwarescore=0 lowpriorityscore=0 impostorscore=0 adultscore=0 phishscore=0 bulkscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2303140168 Prior to this change, lifetime of pci_dev objects was protected by global pcidevs_lock(). Long-term plan is to remove this log, so we need some other mechanism to ensure that those objects will not disappear under feet of code that access them. Reference counting is a good choice as it provides easy to comprehend way to control object lifetime. This patch adds two new helper functions: pcidev_get() and pcidev_put(). pcidev_get() will increase reference counter, while pcidev_put() will decrease it, destroying object when counter reaches zero. pcidev_get() should be used only when you already have a valid pointer to the object or you are holding lock that protects one of the lists (domain, pseg or ats) that store pci_dev structs. pcidev_get() is rarely used directly, because there already are functions that will provide valid pointer to pci_dev struct: pci_get_pdev(), pci_get_real_pdev(). They will lock appropriate list, find needed object and increase its reference counter before returning to the caller. Naturally, pci_put() should be called after finishing working with a received object. This is the reason why this patch have so many pcidev_put()s and so little pcidev_get()s: existing calls to pci_get_*() functions now will increase reference counter automatically, we just need to decrease it back when we finished. This patch removes "const" qualifier from some pdev pointers because pcidev_put() technically alters the contents of pci_dev structure. Signed-off-by: Volodymyr Babchuk Suggested-by: Jan Beulich --- v3: - Moved in from another patch series - Fixed code formatting (tabs -> spaces) - Removed erroneous pcidev_put in vga.c - Added missing pcidev_put in couple of places - removed mention of pci_get_pdev_by_domain() --- xen/arch/x86/hvm/vmsi.c | 2 +- xen/arch/x86/irq.c | 4 + xen/arch/x86/msi.c | 44 +++++++- xen/arch/x86/pci.c | 3 + xen/arch/x86/physdev.c | 17 ++- xen/common/sysctl.c | 7 +- xen/drivers/passthrough/amd/iommu_init.c | 12 +- xen/drivers/passthrough/amd/iommu_map.c | 6 +- xen/drivers/passthrough/pci.c | 138 +++++++++++++++-------- xen/drivers/passthrough/vtd/quirks.c | 2 + xen/drivers/video/vga.c | 7 +- xen/drivers/vpci/vpci.c | 16 ++- xen/include/xen/pci.h | 18 +++ 13 files changed, 215 insertions(+), 61 deletions(-) diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c index 3cd4923060..8c3d673872 100644 --- a/xen/arch/x86/hvm/vmsi.c +++ b/xen/arch/x86/hvm/vmsi.c @@ -914,7 +914,7 @@ int vpci_msix_arch_print(const struct vpci_msix *msix) spin_unlock(&msix->pdev->vpci->lock); process_pending_softirqs(); - /* NB: we assume that pdev cannot go away for an alive domain. */ + if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) ) return -EBUSY; if ( pdev->vpci->msix != msix ) diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 20150b1c7f..87464d82c8 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -2175,6 +2175,7 @@ int map_domain_pirq( msi->entry_nr = ret; ret = -ENFILE; } + pcidev_put(pdev); goto done; } @@ -2189,6 +2190,7 @@ int map_domain_pirq( msi_desc->irq = -1; msi_free_irq(msi_desc); ret = -EBUSY; + pcidev_put(pdev); goto done; } @@ -2273,10 +2275,12 @@ int map_domain_pirq( } msi_desc->irq = -1; msi_free_irq(msi_desc); + pcidev_put(pdev); goto done; } set_domain_irq_pirq(d, irq, info); + pcidev_put(pdev); spin_unlock_irqrestore(&desc->lock, flags); } else diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c index d0bf63df1d..91926fce50 100644 --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -572,6 +572,10 @@ int msi_free_irq(struct msi_desc *entry) virt_to_fix((unsigned long)entry->mask_base)); list_del(&entry->list); + + /* Corresponds to pcidev_get() in msi[x]_capability_init() */ + pcidev_put(entry->dev); + xfree(entry); return 0; } @@ -644,6 +648,7 @@ static int msi_capability_init(struct pci_dev *dev, entry[i].msi.mpos = mpos; entry[i].msi.nvec = 0; entry[i].dev = dev; + pcidev_get(dev); } entry->msi.nvec = nvec; entry->irq = irq; @@ -703,22 +708,36 @@ static u64 read_pci_mem_bar(u16 seg, u8 bus, u8 slot, u8 func, u8 bir, int vf) !num_vf || !offset || (num_vf > 1 && !stride) || bir >= PCI_SRIOV_NUM_BARS || !pdev->vf_rlen[bir] ) + { + if ( pdev ) + pcidev_put(pdev); return 0; + } base = pos + PCI_SRIOV_BAR; vf -= PCI_BDF(bus, slot, func) + offset; if ( vf < 0 ) + { + pcidev_put(pdev); return 0; + } if ( stride ) { if ( vf % stride ) + { + pcidev_put(pdev); return 0; + } vf /= stride; } if ( vf >= num_vf ) + { + pcidev_put(pdev); return 0; + } BUILD_BUG_ON(ARRAY_SIZE(pdev->vf_rlen) != PCI_SRIOV_NUM_BARS); disp = vf * pdev->vf_rlen[bir]; limit = PCI_SRIOV_NUM_BARS; + pcidev_put(pdev); } else switch ( pci_conf_read8(PCI_SBDF(seg, bus, slot, func), PCI_HEADER_TYPE) & 0x7f ) @@ -925,6 +944,8 @@ static int msix_capability_init(struct pci_dev *dev, entry->dev = dev; entry->mask_base = base; + pcidev_get(dev); + list_add_tail(&entry->list, &dev->msi_list); *desc = entry; } @@ -999,6 +1020,7 @@ static int __pci_enable_msi(struct msi_info *msi, struct msi_desc **desc) { struct pci_dev *pdev; struct msi_desc *old_desc; + int ret; ASSERT(pcidevs_locked()); pdev = pci_get_pdev(NULL, msi->sbdf); @@ -1010,6 +1032,7 @@ static int __pci_enable_msi(struct msi_info *msi, struct msi_desc **desc) { printk(XENLOG_ERR "irq %d already mapped to MSI on %pp\n", msi->irq, &pdev->sbdf); + pcidev_put(pdev); return -EEXIST; } @@ -1020,7 +1043,10 @@ static int __pci_enable_msi(struct msi_info *msi, struct msi_desc **desc) __pci_disable_msix(old_desc); } - return msi_capability_init(pdev, msi->irq, desc, msi->entry_nr); + ret = msi_capability_init(pdev, msi->irq, desc, msi->entry_nr); + pcidev_put(pdev); + + return ret; } static void __pci_disable_msi(struct msi_desc *entry) @@ -1054,20 +1080,29 @@ static int __pci_enable_msix(struct msi_info *msi, struct msi_desc **desc) { struct pci_dev *pdev; struct msi_desc *old_desc; + int ret; ASSERT(pcidevs_locked()); pdev = pci_get_pdev(NULL, msi->sbdf); if ( !pdev || !pdev->msix ) + { + if ( pdev ) + pcidev_put(pdev); return -ENODEV; + } if ( msi->entry_nr >= pdev->msix->nr_entries ) + { + pcidev_put(pdev); return -EINVAL; + } old_desc = find_msi_entry(pdev, msi->irq, PCI_CAP_ID_MSIX); if ( old_desc ) { printk(XENLOG_ERR "irq %d already mapped to MSI-X on %pp\n", msi->irq, &pdev->sbdf); + pcidev_put(pdev); return -EEXIST; } @@ -1078,7 +1113,11 @@ static int __pci_enable_msix(struct msi_info *msi, struct msi_desc **desc) __pci_disable_msi(old_desc); } - return msix_capability_init(pdev, msi, desc); + ret = msix_capability_init(pdev, msi, desc); + + pcidev_put(pdev); + + return ret; } static void _pci_cleanup_msix(struct arch_msix *msix) @@ -1159,6 +1198,7 @@ int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool off) } else rc = msix_capability_init(pdev, NULL, NULL); + pcidev_put(pdev); pcidevs_unlock(); return rc; diff --git a/xen/arch/x86/pci.c b/xen/arch/x86/pci.c index 97b792e578..c1fcdf08d6 100644 --- a/xen/arch/x86/pci.c +++ b/xen/arch/x86/pci.c @@ -92,7 +92,10 @@ int pci_conf_write_intercept(unsigned int seg, unsigned int bdf, pdev = pci_get_pdev(NULL, PCI_SBDF(seg, bdf)); if ( pdev ) + { rc = pci_msi_conf_write_intercept(pdev, reg, size, data); + pcidev_put(pdev); + } pcidevs_unlock(); diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c index 2f1d955a96..96214a3d40 100644 --- a/xen/arch/x86/physdev.c +++ b/xen/arch/x86/physdev.c @@ -533,7 +533,14 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) pcidevs_lock(); pdev = pci_get_pdev(NULL, PCI_SBDF(0, restore_msi.bus, restore_msi.devfn)); - ret = pdev ? pci_restore_msi_state(pdev) : -ENODEV; + if ( pdev ) + { + ret = pci_restore_msi_state(pdev); + pcidev_put(pdev); + } + else + ret = -ENODEV; + pcidevs_unlock(); break; } @@ -548,7 +555,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) pcidevs_lock(); pdev = pci_get_pdev(NULL, PCI_SBDF(dev.seg, dev.bus, dev.devfn)); - ret = pdev ? pci_restore_msi_state(pdev) : -ENODEV; + if ( pdev ) + { + ret = pci_restore_msi_state(pdev); + pcidev_put(pdev); + } + else + ret = -ENODEV; pcidevs_unlock(); break; } diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c index 02505ab044..9af07fa92a 100644 --- a/xen/common/sysctl.c +++ b/xen/common/sysctl.c @@ -438,7 +438,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl) { physdev_pci_device_t dev; uint32_t node; - const struct pci_dev *pdev; + struct pci_dev *pdev; if ( copy_from_guest_offset(&dev, ti->devs, i, 1) ) { @@ -454,8 +454,11 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl) node = XEN_INVALID_NODE_ID; else node = pdev->node; - pcidevs_unlock(); + if ( pdev ) + pcidev_put(pdev); + + pcidevs_unlock(); if ( copy_to_guest_offset(ti->nodes, i, &node, 1) ) { ret = -EFAULT; diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c index 9773ccfcb4..f90b1c1e58 100644 --- a/xen/drivers/passthrough/amd/iommu_init.c +++ b/xen/drivers/passthrough/amd/iommu_init.c @@ -646,6 +646,7 @@ static void cf_check parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[]) if ( pdev ) guest_iommu_add_ppr_log(pdev->domain, entry); + pcidev_put(pdev); } static void iommu_check_ppr_log(struct amd_iommu *iommu) @@ -749,6 +750,11 @@ static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu) } pcidevs_lock(); + /* + * XXX: it is unclear if this device can be removed. Right now + * there is no code that clears msi.dev, so no one will decrease + * refcount on it. + */ iommu->msi.dev = pci_get_pdev(NULL, PCI_SBDF(iommu->seg, iommu->bdf)); pcidevs_unlock(); if ( !iommu->msi.dev ) @@ -1274,7 +1280,7 @@ static int __init cf_check amd_iommu_setup_device_table( { if ( ivrs_mappings[bdf].valid ) { - const struct pci_dev *pdev = NULL; + struct pci_dev *pdev = NULL; /* add device table entry */ iommu_dte_add_device_entry(&dt[bdf], &ivrs_mappings[bdf]); @@ -1299,7 +1305,10 @@ static int __init cf_check amd_iommu_setup_device_table( pdev->msix ? pdev->msix->nr_entries : pdev->msi_maxvec); if ( !ivrs_mappings[bdf].intremap_table ) + { + pcidev_put(pdev); return -ENOMEM; + } if ( pdev->phantom_stride ) { @@ -1317,6 +1326,7 @@ static int __init cf_check amd_iommu_setup_device_table( ivrs_mappings[bdf].intremap_inuse; } } + pcidev_put(pdev); } amd_iommu_set_intremap_table( diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c index 993bac6f88..9d621e3d36 100644 --- a/xen/drivers/passthrough/amd/iommu_map.c +++ b/xen/drivers/passthrough/amd/iommu_map.c @@ -724,14 +724,18 @@ int cf_check amd_iommu_get_reserved_device_memory( if ( !iommu ) { /* May need to trigger the workaround in find_iommu_for_device(). */ - const struct pci_dev *pdev; + struct pci_dev *pdev; pcidevs_lock(); pdev = pci_get_pdev(NULL, sbdf); pcidevs_unlock(); if ( pdev ) + { iommu = find_iommu_for_device(seg, bdf); + /* XXX: Should we hold pdev reference till end of the loop? */ + pcidev_put(pdev); + } if ( !iommu ) continue; } diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index b42acb8d7c..b32382aca0 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -328,6 +328,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn) *((u8*) &pdev->bus) = bus; *((u8*) &pdev->devfn) = devfn; pdev->domain = NULL; + refcnt_init(&pdev->refcnt); arch_pci_init_pdev(pdev); @@ -422,33 +423,6 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn) return pdev; } -static void free_pdev(struct pci_seg *pseg, struct pci_dev *pdev) -{ - /* update bus2bridge */ - switch ( pdev->type ) - { - unsigned int sec_bus, sub_bus; - - case DEV_TYPE_PCIe2PCI_BRIDGE: - case DEV_TYPE_LEGACY_PCI_BRIDGE: - sec_bus = pci_conf_read8(pdev->sbdf, PCI_SECONDARY_BUS); - sub_bus = pci_conf_read8(pdev->sbdf, PCI_SUBORDINATE_BUS); - - spin_lock(&pseg->bus2bridge_lock); - for ( ; sec_bus <= sub_bus; sec_bus++ ) - pseg->bus2bridge[sec_bus] = pseg->bus2bridge[pdev->bus]; - spin_unlock(&pseg->bus2bridge_lock); - break; - - default: - break; - } - - list_del(&pdev->alldevs_list); - pdev_msi_deinit(pdev); - xfree(pdev); -} - static void __init _pci_hide_device(struct pci_dev *pdev) { if ( pdev->domain ) @@ -517,10 +491,14 @@ struct pci_dev *pci_get_real_pdev(pci_sbdf_t sbdf) { if ( !(sbdf.devfn & stride) ) continue; + sbdf.devfn &= ~stride; pdev = pci_get_pdev(NULL, sbdf); if ( pdev && stride != pdev->phantom_stride ) + { + pcidev_put(pdev); pdev = NULL; + } } return pdev; @@ -548,13 +526,18 @@ struct pci_dev *pci_get_pdev(const struct domain *d, pci_sbdf_t sbdf) list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list ) if ( pdev->sbdf.bdf == sbdf.bdf && (!d || pdev->domain == d) ) + { + pcidev_get(pdev); return pdev; + } } else list_for_each_entry ( pdev, &d->pdev_list, domain_list ) if ( pdev->sbdf.bdf == sbdf.bdf ) + { + pcidev_get(pdev); return pdev; - + } return NULL; } @@ -663,7 +646,10 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, PCI_SBDF(seg, info->physfn.bus, info->physfn.devfn)); if ( pdev ) + { pf_is_extfn = pdev->info.is_extfn; + pcidev_put(pdev); + } pcidevs_unlock(); if ( !pdev ) pci_add_device(seg, info->physfn.bus, info->physfn.devfn, @@ -818,7 +804,9 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn) if ( pdev->domain ) list_del(&pdev->domain_list); printk(XENLOG_DEBUG "PCI remove device %pp\n", &pdev->sbdf); - free_pdev(pseg, pdev); + list_del(&pdev->alldevs_list); + pdev_msi_deinit(pdev); + pcidev_put(pdev); break; } @@ -848,7 +836,7 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus, { ret = iommu_quarantine_dev_init(pci_to_dev(pdev)); if ( ret ) - return ret; + goto out; target = dom_io; } @@ -878,6 +866,7 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus, pdev->fault.count = 0; out: + pcidev_put(pdev); if ( ret ) printk(XENLOG_G_ERR "%pd: deassign (%pp) failed (%d)\n", d, &PCI_SBDF(seg, bus, devfn), ret); @@ -1011,7 +1000,10 @@ void pci_check_disable_device(u16 seg, u8 bus, u8 devfn) pdev->fault.count >>= 1; pdev->fault.time = now; if ( ++pdev->fault.count < PT_FAULT_THRESHOLD ) + { + pcidev_put(pdev); pdev = NULL; + } } pcidevs_unlock(); @@ -1022,6 +1014,8 @@ void pci_check_disable_device(u16 seg, u8 bus, u8 devfn) * control it for us. */ cword = pci_conf_read16(pdev->sbdf, PCI_COMMAND); pci_conf_write16(pdev->sbdf, PCI_COMMAND, cword & ~PCI_COMMAND_MASTER); + + pcidev_put(pdev); } /* @@ -1138,6 +1132,7 @@ static int __hwdom_init cf_check _setup_hwdom_pci_devices( printk(XENLOG_WARNING "Dom%d owning %pp?\n", pdev->domain->domain_id, &pdev->sbdf); + pcidev_put(pdev); if ( iommu_verbose ) { pcidevs_unlock(); @@ -1385,33 +1380,28 @@ static int iommu_remove_device(struct pci_dev *pdev) return iommu_call(hd->platform_ops, remove_device, devfn, pci_to_dev(pdev)); } -static int device_assigned(u16 seg, u8 bus, u8 devfn) +static int device_assigned(struct pci_dev *pdev) { - struct pci_dev *pdev; int rc = 0; ASSERT(pcidevs_locked()); - pdev = pci_get_pdev(NULL, PCI_SBDF(seg, bus, devfn)); - - if ( !pdev ) - rc = -ENODEV; /* * If the device exists and it is not owned by either the hardware * domain or dom_io then it must be assigned to a guest, or be * hidden (owned by dom_xen). */ - else if ( pdev->domain != hardware_domain && - pdev->domain != dom_io ) + if ( pdev->domain != hardware_domain && + pdev->domain != dom_io ) rc = -EBUSY; return rc; } /* Caller should hold the pcidevs_lock */ -static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) +static int assign_device(struct domain *d, struct pci_dev *pdev, u32 flag) { const struct domain_iommu *hd = dom_iommu(d); - struct pci_dev *pdev; + uint8_t devfn; int rc = 0; if ( !is_iommu_enabled(d) ) @@ -1422,10 +1412,11 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) /* device_assigned() should already have cleared the device for assignment */ ASSERT(pcidevs_locked()); - pdev = pci_get_pdev(NULL, PCI_SBDF(seg, bus, devfn)); ASSERT(pdev && (pdev->domain == hardware_domain || pdev->domain == dom_io)); + devfn = pdev->devfn; + /* Do not allow broken devices to be assigned to guests. */ rc = -EBADF; if ( pdev->broken && d != hardware_domain && d != dom_io ) @@ -1460,7 +1451,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) done: if ( rc ) printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", - d, &PCI_SBDF(seg, bus, devfn), rc); + d, &PCI_SBDF(pdev->seg, pdev->bus, devfn), rc); /* The device is assigned to dom_io so mark it as quarantined */ else if ( d == dom_io ) pdev->quarantine = true; @@ -1595,6 +1586,9 @@ int iommu_do_pci_domctl( ASSERT(d); /* fall through */ case XEN_DOMCTL_test_assign_device: + { + struct pci_dev *pdev; + /* Don't support self-assignment of devices. */ if ( d == current->domain ) { @@ -1622,26 +1616,36 @@ int iommu_do_pci_domctl( seg = machine_sbdf >> 16; bus = PCI_BUS(machine_sbdf); devfn = PCI_DEVFN(machine_sbdf); - pcidevs_lock(); - ret = device_assigned(seg, bus, devfn); + pdev = pci_get_pdev(NULL, PCI_SBDF(seg, bus, devfn)); + if ( !pdev ) + { + printk(XENLOG_G_INFO "%pp non-existent\n", + &PCI_SBDF(seg, bus, devfn)); + ret = -EINVAL; + break; + } + + ret = device_assigned(pdev); if ( domctl->cmd == XEN_DOMCTL_test_assign_device ) { if ( ret ) { - printk(XENLOG_G_INFO "%pp already assigned, or non-existent\n", + printk(XENLOG_G_INFO "%pp already assigned\n", &PCI_SBDF(seg, bus, devfn)); ret = -EINVAL; } } else if ( !ret ) - ret = assign_device(d, seg, bus, devfn, flags); + ret = assign_device(d, pdev, flags); + + pcidev_put(pdev); pcidevs_unlock(); if ( ret == -ERESTART ) ret = hypercall_create_continuation(__HYPERVISOR_domctl, "h", u_domctl); break; - + } case XEN_DOMCTL_deassign_device: /* Don't support self-deassignment of devices. */ if ( d == current->domain ) @@ -1681,6 +1685,46 @@ int iommu_do_pci_domctl( return ret; } +static void release_pdev(refcnt_t *refcnt) +{ + struct pci_dev *pdev = container_of(refcnt, struct pci_dev, refcnt); + struct pci_seg *pseg = get_pseg(pdev->seg); + + printk(XENLOG_DEBUG "PCI release device %pp\n", &pdev->sbdf); + + /* update bus2bridge */ + switch ( pdev->type ) + { + unsigned int sec_bus, sub_bus; + + case DEV_TYPE_PCIe2PCI_BRIDGE: + case DEV_TYPE_LEGACY_PCI_BRIDGE: + sec_bus = pci_conf_read8(pdev->sbdf, PCI_SECONDARY_BUS); + sub_bus = pci_conf_read8(pdev->sbdf, PCI_SUBORDINATE_BUS); + + spin_lock(&pseg->bus2bridge_lock); + for ( ; sec_bus <= sub_bus; sec_bus++ ) + pseg->bus2bridge[sec_bus] = pseg->bus2bridge[pdev->bus]; + spin_unlock(&pseg->bus2bridge_lock); + break; + + default: + break; + } + + xfree(pdev); +} + +void pcidev_get(struct pci_dev *pdev) +{ + refcnt_get(&pdev->refcnt); +} + +void pcidev_put(struct pci_dev *pdev) +{ + refcnt_put(&pdev->refcnt, release_pdev); +} + /* * Local variables: * mode: C diff --git a/xen/drivers/passthrough/vtd/quirks.c b/xen/drivers/passthrough/vtd/quirks.c index fcc8f73e8b..d240da0416 100644 --- a/xen/drivers/passthrough/vtd/quirks.c +++ b/xen/drivers/passthrough/vtd/quirks.c @@ -429,6 +429,8 @@ static int __must_check map_me_phantom_function(struct domain *domain, rc = domain_context_unmap_one(domain, drhd->iommu, 0, PCI_DEVFN(dev, 7)); + pcidev_put(pdev); + return rc; } diff --git a/xen/drivers/video/vga.c b/xen/drivers/video/vga.c index 0a03508bee..1049d4da6d 100644 --- a/xen/drivers/video/vga.c +++ b/xen/drivers/video/vga.c @@ -114,7 +114,7 @@ void __init video_endboot(void) for ( bus = 0; bus < 256; ++bus ) for ( devfn = 0; devfn < 256; ++devfn ) { - const struct pci_dev *pdev; + struct pci_dev *pdev; u8 b = bus, df = devfn, sb; pcidevs_lock(); @@ -126,7 +126,11 @@ void __init video_endboot(void) PCI_CLASS_DEVICE) != 0x0300 || !(pci_conf_read16(PCI_SBDF(0, bus, devfn), PCI_COMMAND) & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) ) + { + if ( pdev ) + pcidev_put(pdev); continue; + } while ( b ) { @@ -157,6 +161,7 @@ void __init video_endboot(void) bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); pci_hide_device(0, bus, devfn); } + pcidev_put(pdev); } } diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c index 6d48d496bb..5232f9605b 100644 --- a/xen/drivers/vpci/vpci.c +++ b/xen/drivers/vpci/vpci.c @@ -317,8 +317,8 @@ static uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size, uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size) { - const struct domain *d = current->domain; - const struct pci_dev *pdev; + struct domain *d = current->domain; + struct pci_dev *pdev; const struct vpci_register *r; unsigned int data_offset = 0; uint32_t data = ~(uint32_t)0; @@ -332,7 +332,11 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size) /* Find the PCI dev matching the address. */ pdev = pci_get_pdev(d, sbdf); if ( !pdev || !pdev->vpci ) + { + if ( pdev ) + pcidev_put(pdev); return vpci_read_hw(sbdf, reg, size); + } spin_lock(&pdev->vpci->lock); @@ -378,6 +382,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size) ASSERT(data_offset < size); } spin_unlock(&pdev->vpci->lock); + pcidev_put(pdev); if ( data_offset < size ) { @@ -420,8 +425,8 @@ static void vpci_write_helper(const struct pci_dev *pdev, void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size, uint32_t data) { - const struct domain *d = current->domain; - const struct pci_dev *pdev; + struct domain *d = current->domain; + struct pci_dev *pdev; const struct vpci_register *r; unsigned int data_offset = 0; const unsigned long *ro_map = pci_get_ro_map(sbdf.seg); @@ -443,6 +448,8 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size, pdev = pci_get_pdev(d, sbdf); if ( !pdev || !pdev->vpci ) { + if ( pdev ) + pcidev_put(pdev); vpci_write_hw(sbdf, reg, size, data); return; } @@ -483,6 +490,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size, ASSERT(data_offset < size); } spin_unlock(&pdev->vpci->lock); + pcidev_put(pdev); if ( data_offset < size ) /* Tailing gap, write the remaining. */ diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h index 5975ca2f30..6631643fb1 100644 --- a/xen/include/xen/pci.h +++ b/xen/include/xen/pci.h @@ -13,6 +13,7 @@ #include #include #include +#include #include #include @@ -116,6 +117,9 @@ struct pci_dev { /* Device misbehaving, prevent assigning it to guests. */ bool broken; + /* Reference counter */ + refcnt_t refcnt; + enum pdev_type { DEV_TYPE_PCI_UNKNOWN, DEV_TYPE_PCIe_ENDPOINT, @@ -160,6 +164,14 @@ void pcidevs_lock(void); void pcidevs_unlock(void); bool_t __must_check pcidevs_locked(void); +/* + * Acquire and release reference to the given device. Holding + * reference ensures that device will not disappear under feet, but + * does not guarantee that code has exclusive access to the device. + */ +void pcidev_get(struct pci_dev *pdev); +void pcidev_put(struct pci_dev *pdev); + bool_t pci_known_segment(u16 seg); bool_t pci_device_detect(u16 seg, u8 bus, u8 dev, u8 func); int scan_pci_devices(void); @@ -177,8 +189,14 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, int pci_remove_device(u16 seg, u8 bus, u8 devfn); int pci_ro_device(int seg, int bus, int devfn); int pci_hide_device(unsigned int seg, unsigned int bus, unsigned int devfn); + +/* + * Next two functions will find a requested device and acquire + * reference to it. Use pcidev_put() to release the reference. + */ struct pci_dev *pci_get_pdev(const struct domain *d, pci_sbdf_t sbdf); struct pci_dev *pci_get_real_pdev(pci_sbdf_t sbdf); + void pci_check_disable_device(u16 seg, u8 bus, u8 devfn); uint8_t pci_conf_read8(pci_sbdf_t sbdf, unsigned int reg);