From patchwork Thu Aug 15 15:11:18 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jason Gunthorpe <jgg@nvidia.com>
X-Patchwork-Id: 13764898
From: Jason Gunthorpe <jgg@nvidia.com>
To:
Cc: Alejandro Jimenez, Lu Baolu, David Hildenbrand, Christoph Hellwig,
 iommu@lists.linux.dev, Joao Martins, Kevin Tian, kvm@vger.kernel.org,
 linux-mm@kvack.org, Pasha Tatashin, Peter Xu, Ryan Roberts,
 Sean Christopherson, Tina Zhang
Subject: [PATCH 02/16] genpt: Add a specialized allocator for page table levels
Date: Thu, 15 Aug 2024 12:11:18 -0300
Message-ID: <2-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
In-Reply-To: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
References:
A radix level, or "page table level", is the memory inside the page table
used to store the data. Formats generally use a uniform fixed size for
these tables, usually the PAGE_SIZE of their respective architecture, but
not always. Often the topmost table level has a different size than the
rest.

The key functions of this allocator are maintaining a linked list of the
table memory and RCU freeing those lists. Most of the algorithms in the
iommu implementation rely on the linked lists, and the RCU is necessary
for debugfs support.

Use the new folio-ish infrastructure for creating a custom struct page to
store the additional data. Included in this is some support for managing
the CPU cache invalidation algorithm that ARM uses: the folio records when
the table memory has been DMA mapped, along with helpers to map/unmap the
memory through the DMA API.

FIXME: Several of the formats require sub-page sizes (i.e. ARMv7s uses 1K
table pages on a 4K architecture, ARMv8 can use 4K/16K/64K pages
regardless of the CPU PAGE_SIZE). 4:1 can be handled by giving up on the
no-allocate RCU and storing 4 next pointers directly in the folio. The
16:1 case would require allocating additional memory to hold the metadata,
much like Matthew's proposed memdesc. In a future memdesc world the
per-folio metadata would be allocated to the required size. This logic is
not implemented yet.

FIXME:
 - Sub-page sizes. Without support it wastes memory but is suitable for
   functional testing.
 - This has become weirdly named.
 - This is general, except it does use NR_IOMMU_PAGES.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
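As a usage sketch for review context only (the example_* callers and the
GFP/node choices are illustrative assumptions, not part of the patch), the
list and RCU-free pieces are meant to fit together roughly like this:

static void *example_alloc_level(struct pt_common *common, int nid)
{
	/* lg2sz is the log2 of the table level size in bytes */
	return pt_radix_alloc(common, nid, PAGE_SHIFT, GFP_KERNEL);
}

static void example_retire_level(struct pt_radix_list_head *free_list,
				 void *table_mem)
{
	/* Threads the level through its struct page metadata, no
	 * additional allocation is needed */
	pt_radix_add_list(free_list, table_mem);
}

static void example_finish_unmap(struct pt_radix_list_head *free_list)
{
	/* Walkers under rcu_read_lock() may still be reading the retired
	 * levels, so only free after a grace period */
	pt_radix_free_list_rcu(free_list);
}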
 drivers/iommu/generic_pt/Kconfig    |   8 ++
 drivers/iommu/generic_pt/Makefile   |   4 +
 drivers/iommu/generic_pt/pt_alloc.c | 174 ++++++++++++++++++++++++++++
 drivers/iommu/generic_pt/pt_alloc.h |  98 ++++++++++++++++
 4 files changed, 284 insertions(+)
 create mode 100644 drivers/iommu/generic_pt/pt_alloc.c
 create mode 100644 drivers/iommu/generic_pt/pt_alloc.h

diff --git a/drivers/iommu/generic_pt/Kconfig b/drivers/iommu/generic_pt/Kconfig
index 775a3afb563f72..c22a55b00784d0 100644
--- a/drivers/iommu/generic_pt/Kconfig
+++ b/drivers/iommu/generic_pt/Kconfig
@@ -19,4 +19,12 @@ config DEBUG_GENERIC_PT
 	  kernels.
 
 	  The kunit tests require this to be enabled to get full coverage.
+
+config IOMMU_PT
+	tristate "IOMMU Page Tables"
+	depends on IOMMU_SUPPORT
+	depends on GENERIC_PT
+	default n
+	help
+	  Generic library for building IOMMU page tables
 endif
diff --git a/drivers/iommu/generic_pt/Makefile b/drivers/iommu/generic_pt/Makefile
index f66554cd5c4518..f7862499642237 100644
--- a/drivers/iommu/generic_pt/Makefile
+++ b/drivers/iommu/generic_pt/Makefile
@@ -1 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
+iommu_pt-y := \
+	pt_alloc.o
+
+obj-$(CONFIG_IOMMU_PT) += iommu_pt.o
diff --git a/drivers/iommu/generic_pt/pt_alloc.c b/drivers/iommu/generic_pt/pt_alloc.c
new file mode 100644
index 00000000000000..4ee032161103f3
--- /dev/null
+++ b/drivers/iommu/generic_pt/pt_alloc.c
@@ -0,0 +1,174 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#include "pt_alloc.h"
+#include "pt_log2.h"
+#include <linux/mm.h>
+#include <linux/dma-mapping.h>
+
+#define RADIX_MATCH(pg, rl)                                \
+	static_assert(offsetof(struct page, pg) ==         \
+		      offsetof(struct pt_radix_meta, rl))
+RADIX_MATCH(flags, __page_flags);
+RADIX_MATCH(rcu_head, rcu_head); /* Ensure bit 0 is clear */
+RADIX_MATCH(mapping, __page_mapping);
+RADIX_MATCH(private, free_next);
+RADIX_MATCH(page_type, __page_type);
+RADIX_MATCH(_refcount, __page_refcount);
+#ifdef CONFIG_MEMCG
+RADIX_MATCH(memcg_data, memcg_data);
+#endif
+#undef RADIX_MATCH
+static_assert(sizeof(struct pt_radix_meta) <= sizeof(struct page));
+
+static inline struct folio *meta_to_folio(struct pt_radix_meta *meta)
+{
+	return (struct folio *)meta;
+}
+
+void *pt_radix_alloc(struct pt_common *owner, int nid, size_t lg2sz, gfp_t gfp)
+{
+	struct pt_radix_meta *meta;
+	unsigned int order;
+	struct folio *folio;
+
+	/*
+	 * FIXME we need to support sub page size tables, eg to allow a 4K
+	 * table on a 64K kernel. This should be done by allocating extra
+	 * memory per page and placing the pointer in the meta. The extra
+	 * memory can contain the additional list heads and rcu's required.
+	 */
+	if (lg2sz <= PAGE_SHIFT)
+		order = 0;
+	else
+		order = lg2sz - PAGE_SHIFT;
+
+	folio = (struct folio *)alloc_pages_node(
+		nid, gfp | __GFP_ZERO | __GFP_COMP, order);
+	if (!folio)
+		return ERR_PTR(-ENOMEM);
+
+	meta = folio_to_meta(folio);
+	meta->owner = owner;
+	meta->free_next = NULL;
+	meta->lg2sz = lg2sz;
+
+	mod_node_page_state(folio_pgdat(folio), NR_IOMMU_PAGES,
+			    log2_to_int_t(long, order));
+	lruvec_stat_mod_folio(folio, NR_SECONDARY_PAGETABLE,
+			      log2_to_int_t(long, order));
+
+	return folio_address(folio);
+}
+EXPORT_SYMBOL_NS_GPL(pt_radix_alloc, GENERIC_PT);
+
+void pt_radix_free_list(struct pt_radix_list_head *list)
+{
+	struct pt_radix_meta *cur = list->head;
+
+	while (cur) {
+		struct folio *folio = meta_to_folio(cur);
+		unsigned int order = folio_order(folio);
+		long pgcnt = 1UL << order;
+
+		mod_node_page_state(folio_pgdat(folio), NR_IOMMU_PAGES,
+				    -pgcnt);
+		lruvec_stat_mod_folio(folio, NR_SECONDARY_PAGETABLE, -pgcnt);
+
+		cur = cur->free_next;
+		folio->mapping = NULL;
+		__free_pages(&folio->page, order);
+	}
+}
+EXPORT_SYMBOL_NS_GPL(pt_radix_free_list, GENERIC_PT);
+
+void pt_radix_free(void *radix)
+{
+	struct pt_radix_meta *meta = virt_to_meta(radix);
+	struct pt_radix_list_head list = { .head = meta };
+
+	pt_radix_free_list(&list);
+}
+EXPORT_SYMBOL_NS_GPL(pt_radix_free, GENERIC_PT);
+
+static void pt_radix_free_list_rcu_cb(struct rcu_head *head)
+{
+	struct pt_radix_meta *meta =
+		container_of(head, struct pt_radix_meta, rcu_head);
+	struct pt_radix_list_head list = { .head = meta };
+
+	pt_radix_free_list(&list);
+}
+
+void pt_radix_free_list_rcu(struct pt_radix_list_head *list)
+{
+	if (!list->head)
+		return;
+	call_rcu(&list->head->rcu_head, pt_radix_free_list_rcu_cb);
+}
+EXPORT_SYMBOL_NS_GPL(pt_radix_free_list_rcu, GENERIC_PT);
+
+/*
+ * For incoherent memory we use the DMA API to manage the cache flushing. This
+ * is a lot of complexity compared to just calling arch_sync_dma_for_device(),
+ * but it is what the existing iommu drivers have been doing.
+ */
+int pt_radix_start_incoherent(void *radix, struct device *dma_dev,
+			      bool still_flushing)
+{
+	struct pt_radix_meta *meta = virt_to_meta(radix);
+	dma_addr_t dma;
+
+	dma = dma_map_single(dma_dev, radix, log2_to_int_t(size_t, meta->lg2sz),
+			     DMA_TO_DEVICE);
+	if (dma_mapping_error(dma_dev, dma))
+		return -EINVAL;
+
+	/* The DMA API is not allowed to do anything other than DMA direct. */
+	if (WARN_ON(dma != virt_to_phys(radix))) {
+		dma_unmap_single(dma_dev, dma,
+				 log2_to_int_t(size_t, meta->lg2sz),
+				 DMA_TO_DEVICE);
+		return -EOPNOTSUPP;
+	}
+	meta->incoherent = 1;
+	meta->still_flushing = still_flushing;
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(pt_radix_start_incoherent, GENERIC_PT);
+
+int pt_radix_start_incoherent_list(struct pt_radix_list_head *list,
+				   struct device *dma_dev)
+{
+	struct pt_radix_meta *cur;
+	int ret;
+
+	for (cur = list->head; cur; cur = cur->free_next) {
+		if (cur->incoherent)
+			continue;
+
+		ret = pt_radix_start_incoherent(
+			folio_address(meta_to_folio(cur)), dma_dev, false);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(pt_radix_start_incoherent_list, GENERIC_PT);
+
+void pt_radix_stop_incoherent_list(struct pt_radix_list_head *list,
+				   struct device *dma_dev)
+{
+	struct pt_radix_meta *cur;
+
+	for (cur = list->head; cur; cur = cur->free_next) {
+		struct folio *folio = meta_to_folio(cur);
+
+		if (!cur->incoherent)
+			continue;
+		dma_unmap_single(dma_dev, virt_to_phys(folio_address(folio)),
+				 log2_to_int_t(size_t, cur->lg2sz),
+				 DMA_TO_DEVICE);
+	}
+}
+EXPORT_SYMBOL_NS_GPL(pt_radix_stop_incoherent_list, GENERIC_PT);
diff --git a/drivers/iommu/generic_pt/pt_alloc.h b/drivers/iommu/generic_pt/pt_alloc.h
new file mode 100644
index 00000000000000..9751cc63b7d13f
--- /dev/null
+++ b/drivers/iommu/generic_pt/pt_alloc.h
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#ifndef __GENERIC_PT_PT_ALLOC_H
+#define __GENERIC_PT_PT_ALLOC_H
+
+#include <linux/mm.h>
+#include <linux/rcupdate.h>
+#include <linux/types.h>
+
+/*
+ * Per radix table level allocation meta data. This is very similar in purpose
+ * to the struct ptdesc.
+ *
+ * radix levels have special properties:
+ *  - Always a power of two size
+ *  - Can be threaded on a list without a memory allocation
+ *  - Can be RCU freed without a memory allocation
+ */
+struct pt_radix_meta {
+	unsigned long __page_flags;
+
+	struct rcu_head rcu_head;
+	union {
+		struct {
+			u8 lg2sz;
+			u8 incoherent;
+			u8 still_flushing;
+		};
+		unsigned long __page_mapping;
+	};
+	struct pt_common *owner;
+	struct pt_radix_meta *free_next;
+
+	unsigned int __page_type;
+	atomic_t __page_refcount;
+#ifdef CONFIG_MEMCG
+	unsigned long memcg_data;
+#endif
+};
+
+static inline struct pt_radix_meta *folio_to_meta(struct folio *folio)
+{
+	return (struct pt_radix_meta *)folio;
+}
+
+static inline struct pt_radix_meta *virt_to_meta(const void *addr)
+{
+	return folio_to_meta(virt_to_folio(addr));
+}
+
+struct pt_radix_list_head {
+	struct pt_radix_meta *head;
+};
+
+void *pt_radix_alloc(struct pt_common *owner, int nid, size_t lg2sz,
+		     gfp_t gfp);
+void pt_radix_free(void *radix);
+void pt_radix_free_list(struct pt_radix_list_head *list);
+void pt_radix_free_list_rcu(struct pt_radix_list_head *list);
+
+static inline void pt_radix_add_list(struct pt_radix_list_head *head,
+				     void *radix)
+{
+	struct pt_radix_meta *meta = virt_to_meta(radix);
+
+	meta->free_next = head->head;
+	head->head = meta;
+}
+
+int pt_radix_start_incoherent(void *radix, struct device *dma_dev,
+			      bool still_flushing);
+int pt_radix_start_incoherent_list(struct pt_radix_list_head *list,
+				   struct device *dma_dev);
+void pt_radix_stop_incoherent_list(struct pt_radix_list_head *list,
+				   struct device *dma_dev);
+
+static inline void pt_radix_done_incoherent_flush(void *radix)
+{
+	struct pt_radix_meta *meta = virt_to_meta(radix);
+
+	/*
+	 * Release/acquire is against the cache flush;
+	 * pt_radix_incoherent_still_flushing()
+	 * must not return false until the HW observes the flush.
+	 */
+	smp_store_release(&meta->still_flushing, 0);
+}
+
+static inline bool pt_radix_incoherent_still_flushing(void *radix)
+{
+	struct pt_radix_meta *meta = virt_to_meta(radix);
+
+	return smp_load_acquire(&meta->still_flushing);
+}
+
+#endif
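
For completeness, a sketch of how a non-coherent caller would be expected
to drive the flush tracking above; the example_* functions are
hypothetical, and the actual CPU cache flush is whatever the arch/driver
already performs:

static int example_new_incoherent_table(void *table_mem,
					struct device *dma_dev)
{
	int ret;

	/* DMA map the table; entries are still being written, so mark it
	 * still_flushing */
	ret = pt_radix_start_incoherent(table_mem, dma_dev, true);
	if (ret)
		return ret;

	/* ... write the table entries, then flush the CPU cache ... */

	/* Release pairs with the acquire below: a reader that sees the
	 * flag cleared also sees a completed flush */
	pt_radix_done_incoherent_flush(table_mem);
	return 0;
}

static bool example_hw_may_walk(void *table_mem)
{
	/* True from the helper means the flush may not yet be visible to
	 * the HW walker */
	return !pt_radix_incoherent_still_flushing(table_mem);
}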