From patchwork Mon Feb 5 12:01:46 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 13545314
From: James Gowans
CC: Eric Biederman, "Joerg Roedel", Will Deacon, Alexander Viro,
 "Christian Brauner", Paolo Bonzini, Sean Christopherson, Andrew Morton,
 Alexander Graf, David Woodhouse, "Jan H . Schoenherr", Usama Arif,
 Anthony Yznaga, Stanislav Kinsburskii
Subject: [RFC 01/18] pkernfs: Introduce filesystem skeleton
Date: Mon, 5 Feb 2024 12:01:46 +0000
Message-ID: <20240205120203.60312-2-jgowans@amazon.com>
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>

Add an in-memory filesystem: pkernfs. Memory is donated to pkernfs by
carving it out of the normal System RAM range with the memmap= cmdline
parameter and then giving that same physical range to pkernfs with the
pkernfs= cmdline parameter.
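To illustrate the nn[KMG]!ss[KMG] extent format shared by memmap= and
pkernfs=, here is a minimal userspace sketch. It is not kernel code:
sketch_memparse() is a hypothetical stand-in for the kernel's memparse()
that handles only K/M/G suffixes, and the check for the '!' separator is
an addition that the patch itself does not perform.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for the kernel's memparse(): parse a
 * number with an optional K/M/G suffix and report where parsing stopped.
 */
static uint64_t sketch_memparse(const char *s, const char **end)
{
	char *p;
	uint64_t v = strtoull(s, &p, 0);

	switch (*p) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		p++;
		break;
	default:
		break;
	}
	*end = p;
	return v;
}

/*
 * Parse "size!base" into its two halves. Returns 0 on success and -1 if
 * the separator is missing, the size is zero, or trailing text remains.
 */
static int sketch_parse_extent(const char *arg, uint64_t *size, uint64_t *base)
{
	const char *p;

	*size = sketch_memparse(arg, &p);
	if (*p != '!')
		return -1;
	*base = sketch_memparse(p + 1, &p);
	return (*p == '\0' && *size != 0) ? 0 : -1;
}
```

With a hypothetical command line such as "memmap=1G!4G pkernfs=1G!4G",
sketch_parse_extent("1G!4G", ...) would yield a size of 1 GiB and a base
of 4 GiB.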

A new filesystem is added; so far it doesn't do much except persist a
super block at the start of the donated memory and allow itself to be
mounted.
---
 fs/Kconfig           |  1 +
 fs/Makefile          |  3 ++
 fs/pkernfs/Kconfig   |  9 ++++
 fs/pkernfs/Makefile  |  6 +++
 fs/pkernfs/pkernfs.c | 99 ++++++++++++++++++++++++++++++++++++++++++++
 fs/pkernfs/pkernfs.h |  6 +++
 6 files changed, 124 insertions(+)
 create mode 100644 fs/pkernfs/Kconfig
 create mode 100644 fs/pkernfs/Makefile
 create mode 100644 fs/pkernfs/pkernfs.c
 create mode 100644 fs/pkernfs/pkernfs.h

diff --git a/fs/Kconfig b/fs/Kconfig
index aa7e03cc1941..33a9770ae657 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -331,6 +331,7 @@ source "fs/sysv/Kconfig"
 source "fs/ufs/Kconfig"
 source "fs/erofs/Kconfig"
 source "fs/vboxsf/Kconfig"
+source "fs/pkernfs/Kconfig"
 
 endif # MISC_FILESYSTEMS
 
diff --git a/fs/Makefile b/fs/Makefile
index f9541f40be4e..1af35b494b5d 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -19,6 +19,9 @@ obj-y := open.o read_write.o file_table.o super.o \
 
 obj-$(CONFIG_BUFFER_HEAD) += buffer.o mpage.o
 obj-$(CONFIG_PROC_FS) += proc_namespace.o
+
+obj-y += pkernfs/
+
 obj-$(CONFIG_LEGACY_DIRECT_IO) += direct-io.o
 obj-y += notify/
 obj-$(CONFIG_EPOLL) += eventpoll.o
diff --git a/fs/pkernfs/Kconfig b/fs/pkernfs/Kconfig
new file mode 100644
index 000000000000..59621a1d9aef
--- /dev/null
+++ b/fs/pkernfs/Kconfig
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config PKERNFS_FS
+	bool "Persistent Kernel filesystem (pkernfs)"
+	help
+	  An in-memory filesystem on top of reserved memory specified via
+	  pkernfs= cmdline argument. Used for storing kernel state and
+	  userspace memory which is preserved across kexec to support
+	  live update.
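
The mount-time super block check described above can be sketched in
userspace as follows. This is an illustration only: sketch_sb and
sketch_mount() are hypothetical names, a plain buffer stands in for the
memremap()ed region, and uint64_t replaces the patch's unsigned long on
the assumption of a 64-bit build. The magic value is the ASCII string
"pkernfs".

```c
#include <assert.h>
#include <stdint.h>

/* Same value as PKERNFS_MAGIC_NUMBER in the patch ("pkernfs" in ASCII). */
#define SKETCH_MAGIC 0x706b65726e6673ULL

/* Hypothetical mirror of struct pkernfs_sb as introduced by this patch. */
struct sketch_sb {
	uint64_t magic_number;
};

/*
 * Treat the start of the donated region as the super block.
 * Returns 1 if an existing super block was found (restore path) and 0
 * if the region was clean and has just been initialised.
 */
static int sketch_mount(void *mem)
{
	struct sketch_sb *sb = mem;

	if (sb->magic_number == SKETCH_MAGIC)
		return 1;
	sb->magic_number = SKETCH_MAGIC;
	return 0;
}
```

Calling sketch_mount() twice on the same zeroed buffer returns 0 and
then 1, which mirrors the "Clean super block" and "Restoring from super
block" paths on either side of a kexec.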
diff --git a/fs/pkernfs/Makefile b/fs/pkernfs/Makefile
new file mode 100644
index 000000000000..17258cb77f58
--- /dev/null
+++ b/fs/pkernfs/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for persistent kernel filesystem
+#
+
+obj-$(CONFIG_PKERNFS_FS) += pkernfs.o
diff --git a/fs/pkernfs/pkernfs.c b/fs/pkernfs/pkernfs.c
new file mode 100644
index 000000000000..4c476ddc35b6
--- /dev/null
+++ b/fs/pkernfs/pkernfs.c
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "pkernfs.h"
+#include
+#include
+#include
+#include
+#include
+
+static phys_addr_t pkernfs_base, pkernfs_size;
+static void *pkernfs_mem;
+static const struct super_operations pkernfs_super_ops = { };
+
+static int pkernfs_fill_super(struct super_block *sb, struct fs_context *fc)
+{
+	struct inode *inode;
+	struct dentry *dentry;
+	struct pkernfs_sb *psb;
+
+	pkernfs_mem = memremap(pkernfs_base, pkernfs_size, MEMREMAP_WB);
+	psb = (struct pkernfs_sb *) pkernfs_mem;
+
+	if (psb->magic_number == PKERNFS_MAGIC_NUMBER) {
+		pr_info("pkernfs: Restoring from super block\n");
+	} else {
+		pr_info("pkernfs: Clean super block; initialising\n");
+		psb->magic_number = PKERNFS_MAGIC_NUMBER;
+	}
+
+	sb->s_op = &pkernfs_super_ops;
+
+	inode = new_inode(sb);
+	if (!inode)
+		return -ENOMEM;
+
+	inode->i_ino = 1;
+	inode->i_mode = S_IFDIR;
+	inode->i_op = &simple_dir_inode_operations;
+	inode->i_fop = &simple_dir_operations;
+	inode->i_atime = inode->i_mtime = current_time(inode);
+	inode_set_ctime_current(inode);
+	/* directory inodes start off with i_nlink == 2 (for "." entry) */
+	inc_nlink(inode);
+
+	dentry = d_make_root(inode);
+	if (!dentry)
+		return -ENOMEM;
+	sb->s_root = dentry;
+
+	return 0;
+}
+
+static int pkernfs_get_tree(struct fs_context *fc)
+{
+	return get_tree_nodev(fc, pkernfs_fill_super);
+}
+
+static const struct fs_context_operations pkernfs_context_ops = {
+	.get_tree = pkernfs_get_tree,
+};
+
+static int pkernfs_init_fs_context(struct fs_context *const fc)
+{
+	fc->ops = &pkernfs_context_ops;
+	return 0;
+}
+
+static struct file_system_type pkernfs_fs_type = {
+	.owner = THIS_MODULE,
+	.name = "pkernfs",
+	.init_fs_context = pkernfs_init_fs_context,
+	.kill_sb = kill_litter_super,
+	.fs_flags = FS_USERNS_MOUNT,
+};
+
+static int __init pkernfs_init(void)
+{
+	int ret;
+
+	ret = register_filesystem(&pkernfs_fs_type);
+	return ret;
+}
+
+/**
+ * Format: pkernfs=:
+ * Just like: memmap=nn[KMG]!ss[KMG]
+ */
+static int __init parse_pkernfs_extents(char *p)
+{
+	pkernfs_size = memparse(p, &p);
+	p++; /* Skip over ! char */
+	pkernfs_base = memparse(p, &p);
+	return 0;
+}
+
+early_param("pkernfs", parse_pkernfs_extents);
+
+MODULE_ALIAS_FS("pkernfs");
+module_init(pkernfs_init);
diff --git a/fs/pkernfs/pkernfs.h b/fs/pkernfs/pkernfs.h
new file mode 100644
index 000000000000..bd1e2a6fd336
--- /dev/null
+++ b/fs/pkernfs/pkernfs.h
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#define PKERNFS_MAGIC_NUMBER 0x706b65726e6673
+struct pkernfs_sb {
+	unsigned long magic_number;
+};

From patchwork Mon Feb 5 12:01:47 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 13545315
From: James Gowans
CC: Eric Biederman, "Joerg Roedel", Will Deacon, Alexander Viro,
 "Christian Brauner", Paolo Bonzini, Sean Christopherson, Andrew Morton,
 Alexander Graf, David Woodhouse, "Jan H . Schoenherr", Usama Arif,
 Anthony Yznaga, Stanislav Kinsburskii
Subject: [RFC 02/18] pkernfs: Add persistent inodes hooked into directies
Date: Mon, 5 Feb 2024 12:01:47 +0000
Message-ID: <20240205120203.60312-3-jgowans@amazon.com>
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>

Add the ability to create inodes for files and directories inside
directories. Inodes are persistent in the in-memory filesystem; the
second 2 MiB is used as an "inode store."
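
A minimal userspace sketch of such an inode store, with its free list
and per-directory child list, might look like the following. All
sketch_* names are hypothetical, the store is shrunk to 8 entries
instead of 2 MiB, and the initialisation is simplified relative to the
patch (here inode 1 is reserved for the root and never placed on the
free list).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NR_INODES 8	/* tiny store for illustration only */

struct sketch_inode {
	unsigned long sibling_ino;	/* next inode in the same list, 0 = end */
	unsigned long child_ino;	/* first inode in a directory, 0 = empty */
	char filename[32];
};

struct sketch_store {
	unsigned long next_free_ino;	/* head of the free list, 0 = full */
	struct sketch_inode inodes[SKETCH_NR_INODES];
};

/* Inode numbers start at 1, so subtract 1 to index the array. */
static struct sketch_inode *sketch_get(struct sketch_store *s, unsigned long ino)
{
	return &s->inodes[ino - 1];
}

/* Reserve inode 1 for the root directory and chain inodes 2..N as free. */
static void sketch_init(struct sketch_store *s)
{
	memset(s, 0, sizeof(*s));
	for (unsigned long ino = 2; ino < SKETCH_NR_INODES; ino++)
		sketch_get(s, ino)->sibling_ino = ino + 1;
	s->next_free_ino = 2;
}

/* Pop the head of the free list; returns 0 when no inode is free. */
static unsigned long sketch_alloc(struct sketch_store *s)
{
	unsigned long ino = s->next_free_ino;

	if (ino)
		s->next_free_ino = sketch_get(s, ino)->sibling_ino;
	return ino;
}

/* Create a file: allocate an inode and make it the directory's first child. */
static unsigned long sketch_create(struct sketch_store *s, unsigned long dir,
				   const char *name)
{
	unsigned long ino = sketch_alloc(s);
	struct sketch_inode *inode;

	if (!ino)
		return 0;
	inode = sketch_get(s, ino);
	memset(inode, 0, sizeof(*inode));
	strncpy(inode->filename, name, sizeof(inode->filename) - 1);
	inode->sibling_ino = sketch_get(s, dir)->child_ino;
	sketch_get(s, dir)->child_ino = ino;
	return ino;
}
```

Creating "a" and then "b" in the root leaves the root's child_ino
pointing at "b", whose sibling_ino points at "a": new entries are pushed
onto the head of the directory's list.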
The inode store is one big array of struct pkernfs_inode entries, which
use a linked list to point to the next sibling inode or, in the case of
a directory, to the child inode which is the first inode in that
directory.

Free inodes are similarly maintained in a linked list, with the first
free inode being pointed to by the super block.

Directory file_operations are added to support iterating through the
content of a directory.

Similarly, inode operations are added to support creating a file inside
a directory. This allocates the next free inode and makes it the head of
the "child inode" linked list for the directory. Unlink is implemented
to remove an inode from the linked list. This is a bit finicky as it is
done differently depending on whether the inode is the first child of a
directory or somewhere later in the linked list.
---
 fs/pkernfs/Makefile  |   2 +-
 fs/pkernfs/dir.c     |  43 +++++++++++++
 fs/pkernfs/inode.c   | 148 +++++++++++++++++++++++++++++++++++++++++++
 fs/pkernfs/pkernfs.c |  13 ++--
 fs/pkernfs/pkernfs.h |  34 ++++++++++
 5 files changed, 234 insertions(+), 6 deletions(-)
 create mode 100644 fs/pkernfs/dir.c
 create mode 100644 fs/pkernfs/inode.c

diff --git a/fs/pkernfs/Makefile b/fs/pkernfs/Makefile
index 17258cb77f58..0a66e98bda07 100644
--- a/fs/pkernfs/Makefile
+++ b/fs/pkernfs/Makefile
@@ -3,4 +3,4 @@
 # Makefile for persistent kernel filesystem
 #
 
-obj-$(CONFIG_PKERNFS_FS) += pkernfs.o
+obj-$(CONFIG_PKERNFS_FS) += pkernfs.o inode.o dir.o
diff --git a/fs/pkernfs/dir.c b/fs/pkernfs/dir.c
new file mode 100644
index 000000000000..b10ce745f19d
--- /dev/null
+++ b/fs/pkernfs/dir.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "pkernfs.h"
+
+static int pkernfs_dir_iterate(struct file *dir, struct dir_context *ctx)
+{
+	struct pkernfs_inode *pkernfs_inode;
+	struct super_block *sb = dir->f_inode->i_sb;
+
+	/* Indication from previous invoke that there's no more to iterate. */
+	if (ctx->pos == -1)
+		return 0;
+
+	if (!dir_emit_dots(dir, ctx))
+		return 0;
+
+	/*
+	 * Just emitted this dir; go to dir contents. Use pos to smuggle
+	 * the next inode number to emit across iterations.
+	 * -1 indicates no valid inode. Can't use 0 because first loop has pos=0
+	 */
+	if (ctx->pos == 2) {
+		ctx->pos = pkernfs_get_persisted_inode(sb, dir->f_inode->i_ino)->child_ino;
+		/* Empty dir case. */
+		if (ctx->pos == 0)
+			ctx->pos = -1;
+	}
+
+	while (ctx->pos > 1) {
+		pkernfs_inode = pkernfs_get_persisted_inode(sb, ctx->pos);
+		dir_emit(ctx, pkernfs_inode->filename, PKERNFS_FILENAME_LEN,
+				ctx->pos, DT_UNKNOWN);
+		ctx->pos = pkernfs_inode->sibling_ino;
+		if (!ctx->pos)
+			ctx->pos = -1;
+	}
+	return 0;
+}
+
+const struct file_operations pkernfs_dir_fops = {
+	.owner = THIS_MODULE,
+	.iterate_shared = pkernfs_dir_iterate,
+};
diff --git a/fs/pkernfs/inode.c b/fs/pkernfs/inode.c
new file mode 100644
index 000000000000..f6584c8b8804
--- /dev/null
+++ b/fs/pkernfs/inode.c
@@ -0,0 +1,148 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "pkernfs.h"
+#include
+
+const struct inode_operations pkernfs_dir_inode_operations;
+
+struct pkernfs_inode *pkernfs_get_persisted_inode(struct super_block *sb, int ino)
+{
+	/*
+	 * Inode index starts at 1, so -1 to get memory index.
+	 */
+	return ((struct pkernfs_inode *) (pkernfs_mem + PMD_SIZE)) + ino - 1;
+}
+
+struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino)
+{
+	struct inode *inode = iget_locked(sb, ino);
+
+	/* If this inode is cached it is already populated; just return */
+	if (!(inode->i_state & I_NEW))
+		return inode;
+	inode->i_op = &pkernfs_dir_inode_operations;
+	inode->i_sb = sb;
+	inode->i_mode = S_IFREG;
+	unlock_new_inode(inode);
+	return inode;
+}
+
+static unsigned long pkernfs_allocate_inode(struct super_block *sb)
+{
+
+	unsigned long next_free_ino;
+	struct pkernfs_sb *psb = (struct pkernfs_sb *) pkernfs_mem;
+
+	next_free_ino = psb->next_free_ino;
+	if (!next_free_ino)
+		return -ENOMEM;
+	psb->next_free_ino =
+		pkernfs_get_persisted_inode(sb, next_free_ino)->sibling_ino;
+	return next_free_ino;
+}
+
+/*
+ * Zeroes the inode and makes it the head of the free list.
+ */
+static void pkernfs_free_inode(struct super_block *sb, unsigned long ino)
+{
+	struct pkernfs_sb *psb = (struct pkernfs_sb *) pkernfs_mem;
+	struct pkernfs_inode *inode = pkernfs_get_persisted_inode(sb, ino);
+
+	memset(inode, 0, sizeof(struct pkernfs_inode));
+	inode->sibling_ino = psb->next_free_ino;
+	psb->next_free_ino = ino;
+}
+
+void pkernfs_initialise_inode_store(struct super_block *sb)
+{
+	/* Inode store is a PMD sized (ie: 2 MiB) page */
+	memset(pkernfs_get_persisted_inode(sb, 1), 0, PMD_SIZE);
+	/* Point each inode for the next one; linked-list initialisation. */
+	for (unsigned long ino = 2; ino * sizeof(struct pkernfs_inode) < PMD_SIZE; ino++)
+		pkernfs_get_persisted_inode(sb, ino - 1)->sibling_ino = ino;
+}
+
+static int pkernfs_create(struct mnt_idmap *id, struct inode *dir,
+			  struct dentry *dentry, umode_t mode, bool excl)
+{
+	unsigned long free_inode;
+	struct pkernfs_inode *pkernfs_inode;
+	struct inode *vfs_inode;
+
+	free_inode = pkernfs_allocate_inode(dir->i_sb);
+	if (free_inode <= 0)
+		return -ENOMEM;
+
+	pkernfs_inode = pkernfs_get_persisted_inode(dir->i_sb, free_inode);
+	pkernfs_inode->sibling_ino = pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino;
+	pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino = free_inode;
+	strscpy(pkernfs_inode->filename, dentry->d_name.name, PKERNFS_FILENAME_LEN);
+	pkernfs_inode->flags = PKERNFS_INODE_FLAG_FILE;
+
+	vfs_inode = pkernfs_inode_get(dir->i_sb, free_inode);
+	d_instantiate(dentry, vfs_inode);
+	return 0;
+}
+
+static struct dentry *pkernfs_lookup(struct inode *dir,
+				     struct dentry *dentry,
+				     unsigned int flags)
+{
+	struct pkernfs_inode *pkernfs_inode;
+	unsigned long ino;
+
+	pkernfs_inode = pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino);
+	ino = pkernfs_inode->child_ino;
+	while (ino) {
+		pkernfs_inode = pkernfs_get_persisted_inode(dir->i_sb, ino);
+		if (!strncmp(pkernfs_inode->filename, dentry->d_name.name, PKERNFS_FILENAME_LEN)) {
+			d_add(dentry, pkernfs_inode_get(dir->i_sb, ino));
+			break;
+		}
+		ino = pkernfs_inode->sibling_ino;
+	}
+	return NULL;
+}
+
+static int pkernfs_unlink(struct inode *dir, struct dentry *dentry)
+{
+	unsigned long ino;
+	struct pkernfs_inode *inode;
+
+	ino = pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino;
+
+	/* Special case for first file in dir */
+	if (ino == dentry->d_inode->i_ino) {
+		pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino =
+			pkernfs_get_persisted_inode(dir->i_sb, dentry->d_inode->i_ino)->sibling_ino;
+		pkernfs_free_inode(dir->i_sb, ino);
+		return 0;
+	}
+
+	/*
+	 * Although we know exactly the inode to free, because we maintain only
+	 * a singly linked list we need to scan for it to find the previous
+	 * element so it's "next" pointer can be updated.
+	 */
+	while (ino) {
+		inode = pkernfs_get_persisted_inode(dir->i_sb, ino);
+		/* We've found the one pointing to the one we want to delete */
+		if (inode->sibling_ino == dentry->d_inode->i_ino) {
+			inode->sibling_ino =
+				pkernfs_get_persisted_inode(dir->i_sb,
+						dentry->d_inode->i_ino)->sibling_ino;
+			pkernfs_free_inode(dir->i_sb, dentry->d_inode->i_ino);
+			break;
+		}
+		ino = pkernfs_get_persisted_inode(dir->i_sb, ino)->sibling_ino;
+	}
+
+	return 0;
+}
+
+const struct inode_operations pkernfs_dir_inode_operations = {
+	.create = pkernfs_create,
+	.lookup = pkernfs_lookup,
+	.unlink = pkernfs_unlink,
+};
diff --git a/fs/pkernfs/pkernfs.c b/fs/pkernfs/pkernfs.c
index 4c476ddc35b6..518c610e3877 100644
--- a/fs/pkernfs/pkernfs.c
+++ b/fs/pkernfs/pkernfs.c
@@ -8,7 +8,7 @@
 #include
 
 static phys_addr_t pkernfs_base, pkernfs_size;
-static void *pkernfs_mem;
+void *pkernfs_mem;
 static const struct super_operations pkernfs_super_ops = { };
 
 static int pkernfs_fill_super(struct super_block *sb, struct fs_context *fc)
@@ -24,23 +24,26 @@ static int pkernfs_fill_super(struct super_block *sb, struct fs_context *fc)
 		pr_info("pkernfs: Restoring from super block\n");
 	} else {
 		pr_info("pkernfs: Clean super block; initialising\n");
+		pkernfs_initialise_inode_store(sb);
 		psb->magic_number = PKERNFS_MAGIC_NUMBER;
+		pkernfs_get_persisted_inode(sb, 1)->flags = PKERNFS_INODE_FLAG_DIR;
+		strscpy(pkernfs_get_persisted_inode(sb, 1)->filename, ".", PKERNFS_FILENAME_LEN);
+		psb->next_free_ino = 2;
 	}
 
 	sb->s_op = &pkernfs_super_ops;
 
-	inode = new_inode(sb);
+	inode = pkernfs_inode_get(sb, 1);
 	if (!inode)
 		return -ENOMEM;
 
-	inode->i_ino = 1;
 	inode->i_mode = S_IFDIR;
-	inode->i_op = &simple_dir_inode_operations;
-	inode->i_fop = &simple_dir_operations;
+	inode->i_fop = &pkernfs_dir_fops;
 	inode->i_atime = inode->i_mtime = current_time(inode);
 	inode_set_ctime_current(inode);
 	/* directory inodes start off with i_nlink == 2 (for "." entry) */
 	inc_nlink(inode);
+	inode_init_owner(&nop_mnt_idmap, inode, NULL, inode->i_mode);
 
 	dentry = d_make_root(inode);
 	if (!dentry)
diff --git a/fs/pkernfs/pkernfs.h b/fs/pkernfs/pkernfs.h
index bd1e2a6fd336..192e089b3151 100644
--- a/fs/pkernfs/pkernfs.h
+++ b/fs/pkernfs/pkernfs.h
@@ -1,6 +1,40 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 
+#include
+
 #define PKERNFS_MAGIC_NUMBER 0x706b65726e6673
+#define PKERNFS_FILENAME_LEN 255
+
+extern void *pkernfs_mem;
+
 struct pkernfs_sb {
 	unsigned long magic_number;
+	/* Inode number */
+	unsigned long next_free_ino;
 };
+
+// If neither of these are set the inode is not in use.
+#define PKERNFS_INODE_FLAG_FILE (1 << 0)
+#define PKERNFS_INODE_FLAG_DIR (1 << 1)
+struct pkernfs_inode {
+	int flags;
+	/*
+	 * Points to next inode in the same directory, or
+	 * 0 if last file in directory.
+	 */
+	unsigned long sibling_ino;
+	/*
+	 * If this inode is a directory, this points to the
+	 * first inode *in* that directory.
+ */ + unsigned long child_ino; + char filename[PKERNFS_FILENAME_LEN]; + int mappings_block; + int num_mappings; +}; + +void pkernfs_initialise_inode_store(struct super_block *sb); +struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino); +struct pkernfs_inode *pkernfs_get_persisted_inode(struct super_block *sb, int ino); + +extern const struct file_operations pkernfs_dir_fops; From patchwork Mon Feb 5 12:01:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13545316 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1AC13C4828D for ; Mon, 5 Feb 2024 12:03:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8923A6B0088; Mon, 5 Feb 2024 07:03:03 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 842C56B0089; Mon, 5 Feb 2024 07:03:03 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 70A8C6B008A; Mon, 5 Feb 2024 07:03:03 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 613396B0088 for ; Mon, 5 Feb 2024 07:03:03 -0500 (EST) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 3856BA19B6 for ; Mon, 5 Feb 2024 12:03:03 +0000 (UTC) X-FDA: 81757614246.28.AC180BA Received: from smtp-fw-80008.amazon.com (smtp-fw-80008.amazon.com [99.78.197.219]) by imf27.hostedemail.com (Postfix) with ESMTP id 41CF940020 for ; Mon, 5 Feb 2024 12:03:01 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b="iSaSySX/"; dmarc=pass 
(policy=quarantine) header.from=amazon.com; spf=pass (imf27.hostedemail.com: domain of "prvs=75897cb1d=jgowans@amazon.com" designates 99.78.197.219 as permitted sender) smtp.mailfrom="prvs=75897cb1d=jgowans@amazon.com" ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1707134581; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=09HM9kCtJQ5fhByhNyO/r0il26Ag7KNxaSp1FSa3SKw=; b=1mi+xH2b8UB2d7Y0CWM7tyzKNQLPhqYfTxvBcPB1DV+bpmZTphuToBgOsCMy8ljIfRsEya n8qUUKYlVqQ4gzLDvEgoip1+CsDAui5WK5h2uYSrmTJkwk5wpVKtd+IWalVMTjRDmkcp+h n9XWAid3WOJyBbBO9uQ6Ax9+fCKe7Vs= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b="iSaSySX/"; dmarc=pass (policy=quarantine) header.from=amazon.com; spf=pass (imf27.hostedemail.com: domain of "prvs=75897cb1d=jgowans@amazon.com" designates 99.78.197.219 as permitted sender) smtp.mailfrom="prvs=75897cb1d=jgowans@amazon.com" ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1707134581; a=rsa-sha256; cv=none; b=OL8BzANwge1zylE4DlHgXfNVgJOoQ3arPomSlstLEBQI6ZqrQQTQub1OwB7NmZ6s3wVwYF wPN5za9QUCXAKE3IZv5cPkRK8FBOlMmFhPSAWuz37UxDnNV5a7pb3ZjozxP5jq1hYqqSIp bS0vqqH1/KEP/CBcWuG65fwJPxnSrIw= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1707134581; x=1738670581; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=09HM9kCtJQ5fhByhNyO/r0il26Ag7KNxaSp1FSa3SKw=; b=iSaSySX/2VHK2qQ/n1P9KE1HzOSCGKtuHHAM3/aaRwU6Zb63zzsyw1iG 2esS60m9X5pHDEKbrgThqcAljI3kYQXKLvawewkiwdV704mcegegX1SG+ isdSOpZHjALWdI5M/eEv1jjhv0S363Vd3sBtzj/8FblKhUscDpnDPrR9p g=; X-IronPort-AV: E=Sophos;i="6.05,245,1701129600"; d="scan'208";a="63755246" Received: from 
From: James Gowans
CC: Eric Biederman, Joerg Roedel, Will Deacon, Alexander Viro, Christian Brauner, Paolo Bonzini, Sean Christopherson, Andrew Morton, Alexander Graf, David Woodhouse, Jan H. Schoenherr, Usama Arif, Anthony Yznaga, Stanislav Kinsburskii
Subject: [RFC 03/18] pkernfs: Define an allocator for persistent pages
Date: Mon, 5 Feb 2024 12:01:48 +0000
Message-ID: <20240205120203.60312-4-jgowans@amazon.com>
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>

This introduces a bitmap allocator for pages from the pkernfs
filesystem. The allocation bitmap is stored in the second half of the
first page. This imposes an artificial limit on the maximum size of the
filesystem; this needs to be made extensible.

So far the allocations can only be zeroed. The next commit will add the
ability to allocate blocks and use them.
---
 fs/pkernfs/Makefile    |  2 +-
 fs/pkernfs/allocator.c | 27 +++++++++++++++++++++++++++
 fs/pkernfs/pkernfs.c   |  1 +
 fs/pkernfs/pkernfs.h   |  1 +
 4 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 fs/pkernfs/allocator.c

diff --git a/fs/pkernfs/Makefile b/fs/pkernfs/Makefile
index 0a66e98bda07..d8b92a74fbc6 100644
--- a/fs/pkernfs/Makefile
+++ b/fs/pkernfs/Makefile
@@ -3,4 +3,4 @@
 # Makefile for persistent kernel filesystem
 #
 
-obj-$(CONFIG_PKERNFS_FS) += pkernfs.o inode.o dir.o
+obj-$(CONFIG_PKERNFS_FS) += pkernfs.o inode.o allocator.o dir.o
diff --git a/fs/pkernfs/allocator.c b/fs/pkernfs/allocator.c
new file mode 100644
index 000000000000..1d4aac9c4545
--- /dev/null
+++ b/fs/pkernfs/allocator.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "pkernfs.h"
+
+/**
+ * For allocating blocks from the pkernfs filesystem.
+ * The first two blocks are special:
+ * - the first block is persistent filesystem metadata and
+ *   a bitmap of allocated blocks
+ * - the second block is an array of persisted inodes; the
+ *   inode store.
+ */
+
+void *pkernfs_allocations_bitmap(struct super_block *sb)
+{
+	/* Allocations is 2nd half of first block */
+	return pkernfs_mem + (1 << 20);
+}
+
+void pkernfs_zero_allocations(struct super_block *sb)
+{
+	memset(pkernfs_allocations_bitmap(sb), 0, (1 << 20));
+	/* First page is persisted super block and allocator bitmap */
+	set_bit(0, pkernfs_allocations_bitmap(sb));
+	/* Second page is inode store */
+	set_bit(1, pkernfs_allocations_bitmap(sb));
+}
diff --git a/fs/pkernfs/pkernfs.c b/fs/pkernfs/pkernfs.c
index 518c610e3877..199c2c648bca 100644
--- a/fs/pkernfs/pkernfs.c
+++ b/fs/pkernfs/pkernfs.c
@@ -25,6 +25,7 @@ static int pkernfs_fill_super(struct super_block *sb, struct fs_context *fc)
 	} else {
 		pr_info("pkernfs: Clean super block; initialising\n");
 		pkernfs_initialise_inode_store(sb);
+		pkernfs_zero_allocations(sb);
 		psb->magic_number = PKERNFS_MAGIC_NUMBER;
 		pkernfs_get_persisted_inode(sb, 1)->flags = PKERNFS_INODE_FLAG_DIR;
 		strscpy(pkernfs_get_persisted_inode(sb, 1)->filename, ".", PKERNFS_FILENAME_LEN);
diff --git a/fs/pkernfs/pkernfs.h b/fs/pkernfs/pkernfs.h
index 192e089b3151..4655780f31f2 100644
--- a/fs/pkernfs/pkernfs.h
+++ b/fs/pkernfs/pkernfs.h
@@ -34,6 +34,7 @@ struct pkernfs_inode {
 };
 
 void pkernfs_initialise_inode_store(struct super_block *sb);
+void pkernfs_zero_allocations(struct super_block *sb);
 struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino);
 struct pkernfs_inode *pkernfs_get_persisted_inode(struct super_block *sb, int ino);
 

From patchwork Mon Feb 5 12:01:49 2024
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 13545317
From: James Gowans
CC: Eric Biederman, Joerg Roedel, Will Deacon, Alexander Viro, Christian Brauner, Paolo Bonzini, Sean Christopherson, Andrew Morton, Alexander Graf, David Woodhouse, Jan H. Schoenherr, Usama Arif, Anthony Yznaga, Stanislav Kinsburskii
Subject: [RFC 04/18] pkernfs: support file truncation
Date: Mon, 5 Feb 2024 12:01:49 +0000
Message-ID: <20240205120203.60312-5-jgowans@amazon.com>
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>

In the previous commit a block allocator was added. Now use that block
allocator to allocate blocks for files when ftruncate is run on them.

To do that, an inode_operations is added on the file inodes with a
setattr callback handling the ATTR_SIZE attribute. When this is invoked,
pages are allocated, the indexes of which are put into a mappings block.
The mappings block is an array with the index being the file offset
block and the value at that index being the pkernfs block backing that
file offset.
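The mappings-block lookup described above can be illustrated with a small userspace sketch in C. This is not code from the series: the sketch_ names and SKETCH_BLOCK_SIZE are hypothetical stand-ins, and the block size assumes PMD_SIZE is 2 MiB, as on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 2 MiB blocks, standing in for the kernel's PMD_SIZE on x86-64. */
#define SKETCH_BLOCK_SIZE ((uint64_t)2 << 20)

/*
 * Number of blocks a file of newsize bytes needs (rounds up). This matches
 * a loop that allocates while block_idx * block size < newsize.
 */
static inline size_t sketch_blocks_for_size(uint64_t newsize)
{
	return (size_t)((newsize + SKETCH_BLOCK_SIZE - 1) / SKETCH_BLOCK_SIZE);
}

/* Index into the mappings array for a byte offset within the file. */
static inline size_t sketch_mapping_index(uint64_t file_offset)
{
	return (size_t)(file_offset / SKETCH_BLOCK_SIZE);
}

/*
 * Byte offset into the backing region for a byte offset within the file:
 * look up the backing block, then add the offset inside that block.
 */
static inline uint64_t sketch_region_offset(const unsigned long *mappings,
					    uint64_t file_offset)
{
	assert(mappings != NULL);
	return (uint64_t)mappings[sketch_mapping_index(file_offset)] * SKETCH_BLOCK_SIZE
		+ file_offset % SKETCH_BLOCK_SIZE;
}
```

For example, with a mappings array of {5, 9}, file offset 2 MiB + 16 falls in entry 1 and so resolves to byte 16 of block 9.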
---
 fs/pkernfs/Makefile    |  2 +-
 fs/pkernfs/allocator.c | 24 +++++++++++++++++++
 fs/pkernfs/file.c      | 53 ++++++++++++++++++++++++++++++++++++++++++
 fs/pkernfs/inode.c     | 27 ++++++++++++++++++---
 fs/pkernfs/pkernfs.h   |  7 ++++++
 5 files changed, 109 insertions(+), 4 deletions(-)
 create mode 100644 fs/pkernfs/file.c

diff --git a/fs/pkernfs/Makefile b/fs/pkernfs/Makefile
index d8b92a74fbc6..e41f06cc490f 100644
--- a/fs/pkernfs/Makefile
+++ b/fs/pkernfs/Makefile
@@ -3,4 +3,4 @@
 # Makefile for persistent kernel filesystem
 #
 
-obj-$(CONFIG_PKERNFS_FS) += pkernfs.o inode.o allocator.o dir.o
+obj-$(CONFIG_PKERNFS_FS) += pkernfs.o inode.o allocator.o dir.o file.o
diff --git a/fs/pkernfs/allocator.c b/fs/pkernfs/allocator.c
index 1d4aac9c4545..3905ce92b4a9 100644
--- a/fs/pkernfs/allocator.c
+++ b/fs/pkernfs/allocator.c
@@ -25,3 +25,27 @@ void pkernfs_zero_allocations(struct super_block *sb)
 	/* Second page is inode store */
 	set_bit(1, pkernfs_allocations_bitmap(sb));
 }
+
+/*
+ * Allocs one 2 MiB block, and returns the block index.
+ * Index is 2 MiB chunk index.
+ */
+unsigned long pkernfs_alloc_block(struct super_block *sb)
+{
+	unsigned long free_bit;
+
+	/* Allocations is 2nd half of first page */
+	void *allocations_mem = pkernfs_allocations_bitmap(sb);
+	free_bit = bitmap_find_next_zero_area(allocations_mem,
+			PMD_SIZE / 2, /* Size */
+			0, /* Start */
+			1, /* Number of zeroed bits to look for */
+			0); /* Alignment mask - none required. */
+	bitmap_set(allocations_mem, free_bit, 1);
+	return free_bit;
+}
+
+void *pkernfs_addr_for_block(struct super_block *sb, int block_idx)
+{
+	return pkernfs_mem + (block_idx * PMD_SIZE);
+}
diff --git a/fs/pkernfs/file.c b/fs/pkernfs/file.c
new file mode 100644
index 000000000000..27a637423178
--- /dev/null
+++ b/fs/pkernfs/file.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "pkernfs.h"
+
+static int truncate(struct inode *inode, loff_t newsize)
+{
+	unsigned long free_block;
+	struct pkernfs_inode *pkernfs_inode;
+	unsigned long *mappings;
+
+	pkernfs_inode = pkernfs_get_persisted_inode(inode->i_sb, inode->i_ino);
+	mappings = (unsigned long *)pkernfs_addr_for_block(inode->i_sb,
+		pkernfs_inode->mappings_block);
+	i_size_write(inode, newsize);
+	for (int block_idx = 0; block_idx * PMD_SIZE < newsize; ++block_idx) {
+		free_block = pkernfs_alloc_block(inode->i_sb);
+		if (free_block <= 0)
+			/* TODO: roll back allocations. */
+			return -ENOMEM;
+		*(mappings + block_idx) = free_block;
+		++pkernfs_inode->num_mappings;
+	}
+	return 0;
+}
+
+static int inode_setattr(struct mnt_idmap *idmap, struct dentry *dentry, struct iattr *iattr)
+{
+	struct inode *inode = dentry->d_inode;
+	int error;
+
+	error = setattr_prepare(idmap, dentry, iattr);
+	if (error)
+		return error;
+
+	if (iattr->ia_valid & ATTR_SIZE) {
+		error = truncate(inode, iattr->ia_size);
+		if (error)
+			return error;
+	}
+	setattr_copy(idmap, inode, iattr);
+	mark_inode_dirty(inode);
+	return 0;
+}
+
+const struct inode_operations pkernfs_file_inode_operations = {
+	.setattr = inode_setattr,
+	.getattr = simple_getattr,
+};
+
+const struct file_operations pkernfs_file_fops = {
+	.owner = THIS_MODULE,
+	.iterate_shared = NULL,
+};
diff --git a/fs/pkernfs/inode.c b/fs/pkernfs/inode.c
index f6584c8b8804..7fe4e7b220cc 100644
--- a/fs/pkernfs/inode.c
+++ b/fs/pkernfs/inode.c
@@ -15,14 +15,28 @@ struct pkernfs_inode *pkernfs_get_persisted_inode(struct super_block *sb, int in
 
 struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino)
 {
+	struct pkernfs_inode *pkernfs_inode;
 	struct inode *inode = iget_locked(sb, ino);
 
 	/* If this inode is cached it is already populated; just return */
 	if (!(inode->i_state & I_NEW))
 		return inode;
-	inode->i_op = &pkernfs_dir_inode_operations;
+	pkernfs_inode = pkernfs_get_persisted_inode(sb, ino);
 	inode->i_sb = sb;
-	inode->i_mode = S_IFREG;
+	if (pkernfs_inode->flags & PKERNFS_INODE_FLAG_DIR) {
+		inode->i_op = &pkernfs_dir_inode_operations;
+		inode->i_mode = S_IFDIR;
+	} else {
+		inode->i_op = &pkernfs_file_inode_operations;
+		inode->i_mode = S_IFREG;
+		inode->i_fop = &pkernfs_file_fops;
+	}
+
+	inode->i_atime = inode->i_mtime = current_time(inode);
+	inode_set_ctime_current(inode);
+	set_nlink(inode, 1);
+
+	/* Switch based on file type */
 	unlock_new_inode(inode);
 	return inode;
 }
@@ -79,6 +93,8 @@ static int pkernfs_create(struct mnt_idmap *id, struct inode *dir,
 	pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino = free_inode;
 	strscpy(pkernfs_inode->filename, dentry->d_name.name, PKERNFS_FILENAME_LEN);
 	pkernfs_inode->flags = PKERNFS_INODE_FLAG_FILE;
+	pkernfs_inode->mappings_block = pkernfs_alloc_block(dir->i_sb);
+	memset(pkernfs_addr_for_block(dir->i_sb, pkernfs_inode->mappings_block), 0, (2 << 20));
 
 	vfs_inode = pkernfs_inode_get(dir->i_sb, free_inode);
 	d_instantiate(dentry, vfs_inode);
@@ -90,6 +106,7 @@ static struct dentry *pkernfs_lookup(struct inode *dir,
 		unsigned int flags)
 {
 	struct pkernfs_inode *pkernfs_inode;
+	struct inode *vfs_inode;
 	unsigned long ino;
 
 	pkernfs_inode = pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino);
@@ -97,7 +114,10 @@ static struct dentry *pkernfs_lookup(struct inode *dir,
 	while (ino) {
 		pkernfs_inode = pkernfs_get_persisted_inode(dir->i_sb, ino);
 		if (!strncmp(pkernfs_inode->filename, dentry->d_name.name, PKERNFS_FILENAME_LEN)) {
-			d_add(dentry, pkernfs_inode_get(dir->i_sb, ino));
+			vfs_inode = pkernfs_inode_get(dir->i_sb, ino);
+			mark_inode_dirty(dir);
+			dir->i_atime = current_time(dir);
+			d_add(dentry, vfs_inode);
 			break;
 		}
 		ino = pkernfs_inode->sibling_ino;
@@ -146,3 +166,4 @@ const struct inode_operations pkernfs_dir_inode_operations = {
 	.lookup = pkernfs_lookup,
 	.unlink = pkernfs_unlink,
 };
+
diff --git a/fs/pkernfs/pkernfs.h b/fs/pkernfs/pkernfs.h
index 4655780f31f2..8b4fee8c5b2e 100644
--- a/fs/pkernfs/pkernfs.h
+++ b/fs/pkernfs/pkernfs.h
@@ -34,8 +34,15 @@ struct pkernfs_inode {
 };
 
 void pkernfs_initialise_inode_store(struct super_block *sb);
+
 void pkernfs_zero_allocations(struct super_block *sb);
+unsigned long pkernfs_alloc_block(struct super_block *sb);
 struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino);
+void *pkernfs_addr_for_block(struct super_block *sb, int block_idx);
+
 struct pkernfs_inode *pkernfs_get_persisted_inode(struct super_block *sb, int ino);
 
+
 extern const struct file_operations pkernfs_dir_fops;
+extern const struct file_operations pkernfs_file_fops;
+extern const struct inode_operations pkernfs_file_inode_operations;

From patchwork Mon Feb 5 12:01:50 2024
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 13545318
From: James Gowans
CC: Eric Biederman, Joerg Roedel, Will Deacon, Alexander Viro, Christian Brauner, Paolo Bonzini, Sean Christopherson, Andrew Morton, Alexander Graf, David Woodhouse, Jan H. Schoenherr, Usama Arif, Anthony Yznaga, Stanislav Kinsburskii
Subject: [RFC 05/18] pkernfs: add file mmap callback
Date: Mon, 5 Feb 2024 12:01:50 +0000
Message-ID: <20240205120203.60312-6-jgowans@amazon.com>
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>

Make the file data usable to userspace by adding mmap. That's all that
QEMU needs for guest RAM, so that's all we bother implementing for now.

When mmapping the file, the VMA is marked as PFNMAP to indicate that
there are no struct pages for the memory in this VMA. remap_pfn_range()
is used to actually populate the page tables. All PTEs are pre-faulted
into the pgtables at mmap time so that the pgtables are usable when this
virtual address range is given to VFIO's MAP_DMA.
---
 fs/pkernfs/file.c    | 42 +++++++++++++++++++++++++++++++++++++++++-
 fs/pkernfs/pkernfs.c |  2 +-
 fs/pkernfs/pkernfs.h |  2 ++
 3 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/fs/pkernfs/file.c b/fs/pkernfs/file.c
index 27a637423178..844b6cc63840 100644
--- a/fs/pkernfs/file.c
+++ b/fs/pkernfs/file.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
 #include "pkernfs.h"
+#include 
 
 static int truncate(struct inode *inode, loff_t newsize)
 {
@@ -42,6 +43,45 @@ static int inode_setattr(struct mnt_idmap *idmap, struct dentry *dentry, struct
 	return 0;
 }
 
+/*
+ * To be able to use PFNMAP VMAs for VFIO DMA mapping we need the page tables
+ * populated with mappings. Pre-fault everything.
+ */
+static int mmap(struct file *filp, struct vm_area_struct *vma)
+{
+	int rc;
+	unsigned long *mappings_block;
+	struct pkernfs_inode *pkernfs_inode;
+
+	pkernfs_inode = pkernfs_get_persisted_inode(filp->f_inode->i_sb, filp->f_inode->i_ino);
+
+	mappings_block = (unsigned long *)pkernfs_addr_for_block(filp->f_inode->i_sb,
+			pkernfs_inode->mappings_block);
+
+	/* Remap-pfn-range will mark the range VM_IO */
+	for (unsigned long vma_addr_offset = vma->vm_start;
+			vma_addr_offset < vma->vm_end;
+			vma_addr_offset += PMD_SIZE) {
+		int block, mapped_block;
+
+		block = (vma_addr_offset - vma->vm_start) / PMD_SIZE;
+		mapped_block = *(mappings_block + block);
+		/*
+		 * It's wrong to use remap_pfn_range; this will install PTE-level entries.
+		 * The whole point of 2 MiB allocs is to improve TLB perf!
+		 * We should use something like mm/huge_memory.c#insert_pfn_pmd
+		 * but that is currently static.
+		 * TODO: figure out the best way to install PMDs.
+		 */
+		rc = remap_pfn_range(vma,
+				vma_addr_offset,
+				(pkernfs_base >> PAGE_SHIFT) + (mapped_block * 512),
+				PMD_SIZE,
+				vma->vm_page_prot);
+	}
+	return 0;
+}
+
 const struct inode_operations pkernfs_file_inode_operations = {
 	.setattr = inode_setattr,
 	.getattr = simple_getattr,
@@ -49,5 +89,5 @@ const struct inode_operations pkernfs_file_inode_operations = {
 
 const struct file_operations pkernfs_file_fops = {
 	.owner = THIS_MODULE,
-	.iterate_shared = NULL,
+	.mmap = mmap,
 };
diff --git a/fs/pkernfs/pkernfs.c b/fs/pkernfs/pkernfs.c
index 199c2c648bca..f010c2d76c76 100644
--- a/fs/pkernfs/pkernfs.c
+++ b/fs/pkernfs/pkernfs.c
@@ -7,7 +7,7 @@
 #include 
 #include 
 
-static phys_addr_t pkernfs_base, pkernfs_size;
+phys_addr_t pkernfs_base, pkernfs_size;
 void *pkernfs_mem;
 static const struct super_operations pkernfs_super_ops = { };
 
diff --git a/fs/pkernfs/pkernfs.h b/fs/pkernfs/pkernfs.h
index 8b4fee8c5b2e..1a7aa783a9be 100644
--- a/fs/pkernfs/pkernfs.h
+++ b/fs/pkernfs/pkernfs.h
@@ -6,6 +6,8 @@
 #define PKERNFS_FILENAME_LEN 255
 
 extern void *pkernfs_mem;
+/* Units of bytes */
+extern phys_addr_t pkernfs_base, pkernfs_size;
 
 struct pkernfs_sb {
 	unsigned long magic_number;

From patchwork Mon Feb 5 12:01:51 2024
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 13545319
d=hostedemail.com; s=arc-20220608; t=1707134626; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=GuAtgRmAwOIaihgufqnss82d+hBkVwofjU/pF2m60/o=; b=uDxmjgdxgExJq1N/UHVYYmLjztFXk+x3yq1GJvgI4qUh1QgZuIYt0PZO6vo+HTzJM4O3FM cK7Ur+gnuDBX1Sanur67BW/mpJNoo5iYtkb0jscmDKk+yI8njEOq73htPRvEw65jvokwV0 B6jGg+jM11XF1AVVgyGGw/qVmlKOoWw= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1707134626; a=rsa-sha256; cv=none; b=bpmCfj2MGVL/GQeVHZKdKEcPYAmQbKGKFjHfT3pyAQkvKu7s2AgtH6B7zD6PeMU7jGAZHs HyEke1r5MwoxE/NxPycdZW+yteexhOJ5iCkseb/fRJEoUaGyharCNetFYKQYnfBI/ZwRDA wJQVe5M0SSXHfJFXXRwKK8iOmgi6W9U= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=ME9B1obe; spf=pass (imf30.hostedemail.com: domain of "prvs=75897cb1d=jgowans@amazon.com" designates 207.171.188.206 as permitted sender) smtp.mailfrom="prvs=75897cb1d=jgowans@amazon.com"; dmarc=pass (policy=quarantine) header.from=amazon.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1707134627; x=1738670627; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=GuAtgRmAwOIaihgufqnss82d+hBkVwofjU/pF2m60/o=; b=ME9B1obeUmQM9MkN6Op0qWCk6d1TisM3ZFjvsc+fV9xRk+cP71AItSw2 UWeUy7Ba6irWTxoYF9pWq124a7w42680MvPgAscD49ZauKwN1amktWRQZ 475YFAiHs/RHR3W8EvTtn8GsN166o933cjfp3nsQkypmRQgIImSRXNUgt 8=; X-IronPort-AV: E=Sophos;i="6.05,245,1701129600"; d="scan'208";a="702146151" Received: from pdx4-co-svc-p1-lb2-vlan2.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.25.36.210]) by smtp-border-fw-9106.sea19.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2024 12:03:45 +0000 Received: from EX19MTAEUB002.ant.amazon.com 
[10.0.43.254:55484] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.32.190:2525] with esmtp (Farcaster) id 54deb5d5-b17f-4ae2-b6be-2dc346f855b1; Mon, 5 Feb 2024 12:03:43 +0000 (UTC) X-Farcaster-Flow-ID: 54deb5d5-b17f-4ae2-b6be-2dc346f855b1 Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUB002.ant.amazon.com (10.252.51.59) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:03:40 +0000 Received: from dev-dsk-jgowans-1a-a3faec1f.eu-west-1.amazon.com (172.19.112.191) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:03:33 +0000 From: James Gowans To: CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , , Subject: [RFC 06/18] init: Add liveupdate cmdline param Date: Mon, 5 Feb 2024 12:01:51 +0000 Message-ID: <20240205120203.60312-7-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> MIME-Version: 1.0 X-Originating-IP: [172.19.112.191] X-ClientProxiedBy: EX19D033UWC001.ant.amazon.com (10.13.139.218) To EX19D014EUC004.ant.amazon.com (10.252.51.182) X-Rspamd-Queue-Id: DE87380025 X-Rspam-User: X-Rspamd-Server: rspam11 X-Stat-Signature: yi4bhqsw1k8td7eh47pk9s64xbs9qbhx X-HE-Tag: 1707134625-383314 X-HE-Meta: 
U2FsdGVkX1/Fk+52ENJCBngGMMkJypxiAoO6PnfFCH1VeAf9jirbMK1Md0ptZOtZeonrwcy4NXjv/03tBqm/cQlQR51BtPMMsuKd8dSwcpZSp24Wr0hW/GxrNTyK3S91iN2qdX4h2ctR4CQoVxvt/QCbr/L8E1diTw/UZa41oj4OUaNo5tYLK8A0BP3rXX1cbbDJSvq9WR22n4MGubScc2RlJFI0gu5FcJTMXaVdXw+eLVgmLKIIoQXak9cYSdaZBimViDPgFvUvwpwqh3gHSHTkY/WskyeJSwduVIzwEqgx03yoTz936r++1KbNbeSPWzlaE4X0MkdJnxJ/zywxHR+GVg7XDcKen/CkjibJsDHhb/IlAmj/qz/70LibN+LthRsFcwWBdg9nIfvK+76nMktAcXMzdtRWsH626NKTXUicd1FnQWGnXrdISRJI230v6KnQlBaB44wMFhzod+VeGvcXS323poJF+NPhBFBQL/kp+eEtJZ7DRSk1kkp3jhKZq1fEg5hJ9gAiESLs2XPQx4tkVltBaJto79SH/gZM1/Fu7kzZRWv433ahc9rV/OVzmVKmbooYhy9F4eXPlnunZ0+DOx0RpN3sNGrZ4Am0hoStFRFhNhZxpgG3yPwa6mfkWW2bmGfHqULU78i78LwTWiVonGL9XXWxyucHoHTuW4WiRT7xMX+7r5VbcrupRi4HyK0UVolP8vQtqrcp+OlRx6rgh3jln0UlcCfRenkPS8fuxod9J/e37bxRkD1YtFuESjN7kz5KMWZvHYagRnKs6kWziiUAqjCsswJT91Ly6sN1Hmq+1LTLNQx/+mCRT1G9IJiYKn5MzR6PdBZq9dvHT5nm3O4BZJmR2SR8W+GfIInpKHpUhSRH7+JGynIG8QKB4gqCJFjR/n71rdGbncCcx1tkc+4dmgdKyG+TfB2wY/G+Gt8eXVL5qNPSJU/pqeh9FP6ZMn9qVqi7vaSIEGA SUNurbtw SKpLZbgLtT3Nb11wYZDc1EKtjIUNGudecORUWBGGVI/TDvj9HLJvt0JpNh+pwBakHwR1lR++UNED9AgbaPsrztmSFXd09dMPRHlqTlEKyeImMGQ2cgRYR5vWx6VMSHPWngcHlc7jEV2LeLnHjVZjDdcU6bnyTr6W1iKDAbUuLjrq81jzYT9Ckl4mBpXgl2D38hY0TuBgko/dJHfDFk7QW2rncPgb4eEPxZrIPvBWIe5Zvps3VHU6iqxts8WWCrUiKt+Dw9vVUYYzu3CK4HIXLLygcQkw1qcVXybbf1ncb77a02Zyz1dDaEyvixnVVxkciCKdZ8CkI+wjpa/ew7y/Q91LgJxhTQJ969+0tJPXUp67P7gnKdESssAMZ1okJaGJhHYbUzOXMAn/C36Vja0MgzYD8MwpT3JNzZWTPWjANSotygSgbspWGCQIFP/0ANBV9GILfULUQIl/m5Dh8ASsfEbWU22kcH6aE+MYpkzdlc6spbzotgBHNIIhArxckKiQ3H5ZKudpI0s5nybyTwx7I1jKEkS0pgfMNQpbJyi2uwoSmAx0= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: This will allow other subsystems to know when we're doing a live update (LU) and hence when they should be restoring rather than reinitialising state. 
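For illustration only (not part of this patch): a simplified userspace model of how a bare "liveupdate" token on the command line could turn into a restore-vs-reinitialise decision. cmdline_has_param() and choose_boot_action() are hypothetical names; the real kernel parameter parsing also handles quoting and other details that this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: what a subsystem does with its state at boot. */
enum boot_action {
	BOOT_REINITIALISE,	/* cold boot: build state from scratch */
	BOOT_RESTORE,		/* live update: pick up persisted state */
};

/*
 * Return true if the space-separated command line contains the
 * parameter "name", either bare or in "name=value" form.
 * Simplified on purpose: no quote handling.
 */
static bool cmdline_has_param(const char *cmdline, const char *name)
{
	size_t name_len = strlen(name);
	const char *p = cmdline;

	while (*p) {
		const char *tok;
		size_t tok_len;

		while (*p == ' ')
			p++;
		tok = p;
		while (*p && *p != ' ')
			p++;
		tok_len = (size_t)(p - tok);

		if (tok_len >= name_len &&
		    strncmp(tok, name, name_len) == 0 &&
		    (tok_len == name_len || tok[name_len] == '='))
			return true;
	}
	return false;
}

/* Map the global flag onto the action a subsystem should take. */
static enum boot_action choose_boot_action(bool liveupdate)
{
	return liveupdate ? BOOT_RESTORE : BOOT_REINITIALISE;
}
```

A subsystem would evaluate the flag once during its own init and branch on the result, rather than re-parsing the command line itself.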
--- include/linux/init.h | 1 + init/main.c | 10 ++++++++++ 2 files changed, 11 insertions(+) diff --git a/include/linux/init.h b/include/linux/init.h index 266c3e1640d4..d7c68c7bfaf0 100644 --- a/include/linux/init.h +++ b/include/linux/init.h @@ -146,6 +146,7 @@ extern int do_one_initcall(initcall_t fn); extern char __initdata boot_command_line[]; extern char *saved_command_line; extern unsigned int saved_command_line_len; +extern bool liveupdate; extern unsigned int reset_devices; /* used by init/main.c */ diff --git a/init/main.c b/init/main.c index e24b0780fdff..7807a56c3473 100644 --- a/init/main.c +++ b/init/main.c @@ -165,6 +165,16 @@ static char *ramdisk_execute_command = "/init"; bool static_key_initialized __read_mostly; EXPORT_SYMBOL_GPL(static_key_initialized); +bool liveupdate __read_mostly; +EXPORT_SYMBOL(liveupdate); + +static int __init set_liveupdate(char *param) +{ + liveupdate = true; + return 0; +} +early_param("liveupdate", set_liveupdate); + /* * If set, this is an indication to the drivers that reset the underlying * device before going ahead with the initialization otherwise driver might From patchwork Mon Feb 5 12:01:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13545354 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94529C4828D for ; Mon, 5 Feb 2024 12:04:01 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B7C996B0096; Mon, 5 Feb 2024 07:04:00 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A8DF56B0098; Mon, 5 Feb 2024 07:04:00 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9072E6B0099; Mon, 5 Feb 2024 07:04:00 -0500 (EST) X-Delivered-To: linux-mm@kvack.org 
Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 7E07F6B0096 for ; Mon, 5 Feb 2024 07:04:00 -0500 (EST) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 27FC6120A3C for ; Mon, 5 Feb 2024 12:04:00 +0000 (UTC) X-FDA: 81757616640.19.9039895 Received: from smtp-fw-9102.amazon.com (smtp-fw-9102.amazon.com [207.171.184.29]) by imf30.hostedemail.com (Postfix) with ESMTP id C99608001C for ; Mon, 5 Feb 2024 12:03:57 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=oVzsA6nA; dmarc=pass (policy=quarantine) header.from=amazon.com; spf=pass (imf30.hostedemail.com: domain of "prvs=75897cb1d=jgowans@amazon.com" designates 207.171.184.29 as permitted sender) smtp.mailfrom="prvs=75897cb1d=jgowans@amazon.com" ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1707134638; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=TzDhoCjRqYjwB6Ud3BxBgq+3tlmLodP3ZHF9TRx+87E=; b=eV8tXhqfIZSZyIm8TEcH6cZpTT8hfBTPHE9GM+MHA48DNTohchktKzpC6PsO9CBmH/P0Qu fCAUlcrlExz9ywdlTHZuOUOlr307OZeWK5RQmy81XXQpT1uIv+C5V+hWSGSA5ViF1j6Y9E VdPhE/eQOxVPnTWXKxfh7ed5T1Mg8pc= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=oVzsA6nA; dmarc=pass (policy=quarantine) header.from=amazon.com; spf=pass (imf30.hostedemail.com: domain of "prvs=75897cb1d=jgowans@amazon.com" designates 207.171.184.29 as permitted sender) smtp.mailfrom="prvs=75897cb1d=jgowans@amazon.com" ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1707134638; a=rsa-sha256; cv=none; 
b=SeK3G3LihF6/AwCjRVb59rahHDkmP7Nqwc6CZN29XmfGmaFvQXRFLSRxCntQ7FBjIdxfyc JnrJTswjJJWUBuJfTk13VGk2FbIN9h8JlZY3njbGB0rXt+5dy1gE4DA2cRw4aWTj2S8ygB k4OkTdjgp1fkAk6UfDotP+5AQszOA44= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1707134638; x=1738670638; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=TzDhoCjRqYjwB6Ud3BxBgq+3tlmLodP3ZHF9TRx+87E=; b=oVzsA6nAcfsX2N9xvk+a6q/VX6J3iUMxPcWj8ZReekVCI1QcTwQpPgY1 7pufPRraCq/vpoLSpm9GWvVjHX/gwtpQk3b260sTs+RCcqKuFfnY5SWJX KPF0eYZXHHULCJq1p4kiXiEsmaqrWgvxagCOCHboX6GAzuQ7jC2dxTgOG Q=; X-IronPort-AV: E=Sophos;i="6.05,245,1701129600"; d="scan'208";a="394883262" Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.25.36.214]) by smtp-border-fw-9102.sea19.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2024 12:03:52 +0000 Received: from EX19MTAEUC002.ant.amazon.com [10.0.17.79:26775] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.8.155:2525] with esmtp (Farcaster) id 37d56f4d-adef-4cf1-bb52-948bac8bacd3; Mon, 5 Feb 2024 12:03:50 +0000 (UTC) X-Farcaster-Flow-ID: 37d56f4d-adef-4cf1-bb52-948bac8bacd3 Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUC002.ant.amazon.com (10.252.51.181) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:03:46 +0000 Received: from dev-dsk-jgowans-1a-a3faec1f.eu-west-1.amazon.com (172.19.112.191) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:03:40 +0000 From: James Gowans To: CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . 
Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , , Subject: [RFC 07/18] pkernfs: Add file type for IOMMU root pgtables Date: Mon, 5 Feb 2024 12:01:52 +0000 Message-ID: <20240205120203.60312-8-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> MIME-Version: 1.0 X-Originating-IP: [172.19.112.191] X-ClientProxiedBy: EX19D033UWC001.ant.amazon.com (10.13.139.218) To EX19D014EUC004.ant.amazon.com (10.252.51.182) X-Rspamd-Queue-Id: C99608001C X-Rspam-User: X-Rspamd-Server: rspam02 X-Stat-Signature: eeg5da196rw1s4md8jbs46tkxueeywza X-HE-Tag: 1707134637-966864 X-HE-Meta: U2FsdGVkX1/CFJ9rqOtiriqv/ChdHsACV2MQi/rwD0nWnJ16Z+mmNfPdWu1pqzOJMKUSxL/A4rgzLLzpolylArDJCVcmn1gF4/e2sN2qCAm/Jxc+IQ0eclR5wH7JcpW4i1MgucJHSjjMOjJ6WmUdwaG/iCX/821ZJCU5w9CKXmuQvCPa3IJ5G6gkFeZC+9psuc/bEcVszfx8RGznvEAMkasuqRBNljbD6MfdTTyQoCMpR7FuUEK4T8mk7gJCzX4OcHqiQhF/AOfDAexJC4vRkYdiHy7y6X2DKvNsAN3jvkzgfoa12DqTIZNBr4Bw6KzAA/jV8saBnicQ0QSQSQR51iPT/cNRLqUKKvCO8N4/Mw1+jtmunyKeV6XnJF8e1RtTfq9bQPUooHRZ0p3VgjfU/l+iXZE9IfUkcFeYSMkoQnhrdUm8MW/JjJjd4J3v9EpAofPcSU+Yy7VOdFZmZ2aLyVmueasbFCWd7VJZVkRxQ7AUoIrC+UnLlZv0qkJ+x29niwLFAGNr1Ngm5GOuBoe/PBOpsODjibF+VlRShtdV8K4lejNCGrxvrIimlFTfCLpmN2feLrT2ZHEALfUYlhoPBmA95QcB8qJs9Yj6ip+ntjIdvgIN5b3o85io3L6KRHkOFMB76quW7lzD+FU8fXznHz0oT/1XQ4SQ7lxSt5Y1EkQ0P/XIN+gOzeTwMsYFOAQJWgkqtabt2adaT9Qcg7YV2rIhU82f6/c5yBXeVwnA+abVehLkcz/thNWv3G+SQjyzYUqCh3xoYIC+2W7Nu0Rkg49Ac9GDuJGoEfiYkGTFqK2COXsVJ1VX3P72LR4I4LR9eTFTeQtiXeLn4DdoHWYDnDMG+GvXOqenpLJuHjd/G8KNaudpC5QvzQwGKEOQfcbgHDii9SC7e6oI4a9G9h9qgmtFNRQXqts2+GUGbOZlZ7B3Ee89CgHhXklbdKPF6v/WBLR843MQbuz3p3ijTgT PACXmbAT 
f2770L1POgLDMKBvpIGnWt1TSDiEzXlQAntO2L/udWVwUutTOq5B6HbvSxO4AJ5weyXbhueNQE6QY/3w644knB5xDGbRysveeyyfF/CmItjdhtE9J6sntjkECJvwxnV52w9heDVYjTg/1d364RuvmYJv4jZ7zrrxhf/VboTaOePFMPd4uW8+UjG3+ystpE6tMKEGYtHIaeur+PElopHn2qAi9+1xAqcWHA2Ze3OBl7RgJru8vrR8CfFBG6PBKKu66VLuNK30Ngtmv2kyp2+YHozlZbXPF/OoLh6guXk/tjhNgSnufJ9qvJi1nK9+FFTQiKzsMEVBihquGDFAsBjduA7jrOTS5R12uMWErEDKfL802PbRnvqZum008j0IgODadx1ad4qvBEAt1Q0pbr7lS02CeymkaI/Bi1oTzIVAyr1DgZaVqi1Yzyymq6vWOI47WJjqwjhArcGWg5IJEnfFDlfeg+7nbIRBWC68yyDNshWzwwMPcDEdA6IQQNVNgQPN61xIUB4oxgzrxyvfD3wNsRoHMrQnNmNdx7jpcCa3wkvTeExE= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: So far pkernfs is able to hold regular files for userspace to mmap and in which to store persisted data. Now begin the IOMMU integration for persistent IOMMU pgtables. A new type of inode is created for an IOMMU data directory. A new type of inode is also created for a file which holds the IOMMU root pgtables. The inode types are specified by flags on the inodes. Different inode ops are also registered on the IOMMU pgtables file to ensure that userspace can't access it. These IOMMU directory and data inodes are created lazily: pkernfs_alloc_iommu_root_pgtables() scans for these and returns them if they already exist (ie: after kexec) or creates them if they don't exist (ie: cold boot). The data in the IOMMU root pgtables file needs to be accessible early in system boot: before filesystems are initialised and before anything is mounted. To support this the pkernfs initialisation code is split out into a pkernfs_init() function which is responsible for making the pkernfs memory available. Here the filesystem abstraction starts to creak: the pkernfs functions responsible for the IOMMU pgtables files manipulate persisted inodes directly. 
It may be preferable to somehow get pkernfs mounted early in system boot before it's needed by the IOMMU so that filesystem paths can be used, but it is unclear if that's possible. The need for super blocks in the pkernfs functions has been limited so far: super blocks are barely used because the pkernfs extents are stored as global variables in pkernfs.c. Now NULLs are actually supplied to functions which take a super block. This is also not pretty and this code should probably rather be plumbing some sort of wrapper around the persisted super block which would allow supporting multiple mount points. Additionally, the memory backing the IOMMU root pgtable file is mapped into the direct map by registering it as a device. This is needed because the IOMMU does phys_to_virt in a few places when traversing the pgtables so the direct map virtual address should be populated. The alternative would be to replace all of the phys_to_virt calls in the IOMMU driver with wrappers which understand if the phys_addr is part of a pkernfs file. The next commit will use this pkernfs file for root pgtables. 
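For illustration only (not part of this patch): the wrapper alternative described above needs a range check plus an offset translation for addresses that fall inside a pkernfs-backed region. The sketch below is a userspace model under that assumption; struct region_model, region_contains() and region_paddr_to_vaddr() are hypothetical names, and a real wrapper would fall back to the normal direct-map translation for addresses outside the region instead of returning NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a persistent region (not the kernel struct). */
struct region_model {
	unsigned char *vaddr;	/* virtual address of the first byte */
	unsigned long paddr;	/* physical address of the first byte */
	unsigned long bytes;	/* length of the region in bytes */
};

/* True if paddr lies inside [r->paddr, r->paddr + r->bytes). */
static bool region_contains(const struct region_model *r, unsigned long paddr)
{
	/* Subtract before comparing so r->paddr + r->bytes cannot overflow. */
	return paddr >= r->paddr && paddr - r->paddr < r->bytes;
}

/* Translate a physical address inside the region; NULL if it is outside. */
static void *region_paddr_to_vaddr(const struct region_model *r,
				   unsigned long paddr)
{
	if (!region_contains(r, paddr))
		return NULL;
	return r->vaddr + (paddr - r->paddr);
}
```

Doing the subtraction before the comparison keeps the bounds check correct even for a region placed at the very top of the physical address space.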
--- fs/pkernfs/Makefile | 2 +- fs/pkernfs/inode.c | 17 +++++-- fs/pkernfs/iommu.c | 98 +++++++++++++++++++++++++++++++++++++++++ fs/pkernfs/pkernfs.c | 38 ++++++++++------ fs/pkernfs/pkernfs.h | 7 +++ include/linux/pkernfs.h | 36 +++++++++++++++ 6 files changed, 181 insertions(+), 17 deletions(-) create mode 100644 fs/pkernfs/iommu.c create mode 100644 include/linux/pkernfs.h diff --git a/fs/pkernfs/Makefile b/fs/pkernfs/Makefile index e41f06cc490f..7f0f7a4cd3a1 100644 --- a/fs/pkernfs/Makefile +++ b/fs/pkernfs/Makefile @@ -3,4 +3,4 @@ # Makefile for persistent kernel filesystem # -obj-$(CONFIG_PKERNFS_FS) += pkernfs.o inode.o allocator.o dir.o file.o +obj-$(CONFIG_PKERNFS_FS) += pkernfs.o inode.o allocator.o dir.o file.o iommu.o diff --git a/fs/pkernfs/inode.c b/fs/pkernfs/inode.c index 7fe4e7b220cc..1d712e0a82a1 100644 --- a/fs/pkernfs/inode.c +++ b/fs/pkernfs/inode.c @@ -25,11 +25,18 @@ struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino) inode->i_sb = sb; if (pkernfs_inode->flags & PKERNFS_INODE_FLAG_DIR) { inode->i_op = &pkernfs_dir_inode_operations; + inode->i_fop = &pkernfs_dir_fops; inode->i_mode = S_IFDIR; - } else { + } else if (pkernfs_inode->flags & PKERNFS_INODE_FLAG_FILE) { inode->i_op = &pkernfs_file_inode_operations; - inode->i_mode = S_IFREG; inode->i_fop = &pkernfs_file_fops; + inode->i_mode = S_IFREG; + } else if (pkernfs_inode->flags & PKERNFS_INODE_FLAG_IOMMU_DIR) { + inode->i_op = &pkernfs_iommu_dir_inode_operations; + inode->i_fop = &pkernfs_dir_fops; + inode->i_mode = S_IFDIR; + } else if (pkernfs_inode->flags & PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES) { + inode->i_mode = S_IFREG; } inode->i_atime = inode->i_mtime = current_time(inode); @@ -41,7 +48,7 @@ struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino) return inode; } -static unsigned long pkernfs_allocate_inode(struct super_block *sb) +unsigned long pkernfs_allocate_inode(struct super_block *sb) { unsigned long next_free_ino; @@ -167,3 +174,7 
@@ const struct inode_operations pkernfs_dir_inode_operations = { .unlink = pkernfs_unlink, }; +const struct inode_operations pkernfs_iommu_dir_inode_operations = { + .lookup = pkernfs_lookup, +}; + diff --git a/fs/pkernfs/iommu.c b/fs/pkernfs/iommu.c new file mode 100644 index 000000000000..5bce8146d7bb --- /dev/null +++ b/fs/pkernfs/iommu.c @@ -0,0 +1,98 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "pkernfs.h" +#include + + +void pkernfs_alloc_iommu_root_pgtables(struct pkernfs_region *pkernfs_region) +{ + unsigned long *mappings_block_vaddr; + unsigned long inode_idx; + struct pkernfs_inode *iommu_pgtables, *iommu_dir = NULL; + int rc; + + pkernfs_init(); + + /* Try find a 'iommu' directory */ + inode_idx = pkernfs_get_persisted_inode(NULL, 1)->child_ino; + while (inode_idx) { + if (!strncmp(pkernfs_get_persisted_inode(NULL, inode_idx)->filename, + "iommu", PKERNFS_FILENAME_LEN)) { + iommu_dir = pkernfs_get_persisted_inode(NULL, inode_idx); + break; + } + inode_idx = pkernfs_get_persisted_inode(NULL, inode_idx)->sibling_ino; + } + + if (!iommu_dir) { + unsigned long root_pgtables_ino = 0; + unsigned long iommu_dir_ino = pkernfs_allocate_inode(NULL); + + iommu_dir = pkernfs_get_persisted_inode(NULL, iommu_dir_ino); + strscpy(iommu_dir->filename, "iommu", PKERNFS_FILENAME_LEN); + iommu_dir->flags = PKERNFS_INODE_FLAG_IOMMU_DIR; + + /* Make this the head of the list. */ + iommu_dir->sibling_ino = pkernfs_get_persisted_inode(NULL, 1)->child_ino; + pkernfs_get_persisted_inode(NULL, 1)->child_ino = iommu_dir_ino; + + /* Add a child file for pgtables. 
*/ + root_pgtables_ino = pkernfs_allocate_inode(NULL); + iommu_pgtables = pkernfs_get_persisted_inode(NULL, root_pgtables_ino); + strscpy(iommu_pgtables->filename, "root-pgtables", PKERNFS_FILENAME_LEN); + iommu_pgtables->sibling_ino = iommu_dir->child_ino; + iommu_dir->child_ino = root_pgtables_ino; + iommu_pgtables->flags = PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES; + iommu_pgtables->mappings_block = pkernfs_alloc_block(NULL); + /* TODO: make alloc zero. */ + memset(pkernfs_addr_for_block(NULL, iommu_pgtables->mappings_block), 0, (2 << 20)); + } else { + inode_idx = iommu_dir->child_ino; + while (inode_idx) { + if (!strncmp(pkernfs_get_persisted_inode(NULL, inode_idx)->filename, + "root-pgtables", PKERNFS_FILENAME_LEN)) { + iommu_pgtables = pkernfs_get_persisted_inode(NULL, inode_idx); + break; + } + inode_idx = pkernfs_get_persisted_inode(NULL, inode_idx)->sibling_ino; + } + } + + /* + * For a pkernfs region block, the "mappings_block" field is still + * just a block index, but that block doesn't actually contain mappings + * it contains the pkernfs_region data + */ + + mappings_block_vaddr = (unsigned long *)pkernfs_addr_for_block(NULL, + iommu_pgtables->mappings_block); + set_bit(0, mappings_block_vaddr); + pkernfs_region->vaddr = mappings_block_vaddr; + pkernfs_region->paddr = pkernfs_base + (iommu_pgtables->mappings_block * PMD_SIZE); + pkernfs_region->bytes = PMD_SIZE; + + dev_set_name(&pkernfs_region->dev, "iommu_root_pgtables"); + rc = device_register(&pkernfs_region->dev); + if (rc) + pr_err("device_register failed: %i\n", rc); + + pkernfs_region->pgmap.range.start = pkernfs_base + + (iommu_pgtables->mappings_block * PMD_SIZE); + pkernfs_region->pgmap.range.end = + pkernfs_region->pgmap.range.start + PMD_SIZE - 1; + pkernfs_region->pgmap.nr_range = 1; + pkernfs_region->pgmap.type = MEMORY_DEVICE_GENERIC; + pkernfs_region->vaddr = + devm_memremap_pages(&pkernfs_region->dev, &pkernfs_region->pgmap); + pkernfs_region->paddr = pkernfs_base + + 
(iommu_pgtables->mappings_block * PMD_SIZE); +} + +void *pkernfs_region_paddr_to_vaddr(struct pkernfs_region *region, unsigned long paddr) +{ + if (WARN_ON(paddr >= region->paddr + region->bytes)) + return NULL; + if (WARN_ON(paddr < region->paddr)) + return NULL; + return region->vaddr + (paddr - region->paddr); +} diff --git a/fs/pkernfs/pkernfs.c b/fs/pkernfs/pkernfs.c index f010c2d76c76..2e8c4b0a5807 100644 --- a/fs/pkernfs/pkernfs.c +++ b/fs/pkernfs/pkernfs.c @@ -11,12 +11,14 @@ phys_addr_t pkernfs_base, pkernfs_size; void *pkernfs_mem; static const struct super_operations pkernfs_super_ops = { }; -static int pkernfs_fill_super(struct super_block *sb, struct fs_context *fc) +void pkernfs_init(void) { - struct inode *inode; - struct dentry *dentry; + static int inited; struct pkernfs_sb *psb; + if (inited++) + return; + pkernfs_mem = memremap(pkernfs_base, pkernfs_size, MEMREMAP_WB); psb = (struct pkernfs_sb *) pkernfs_mem; @@ -24,13 +26,21 @@ static int pkernfs_fill_super(struct super_block *sb, struct fs_context *fc) pr_info("pkernfs: Restoring from super block\n"); } else { pr_info("pkernfs: Clean super block; initialising\n"); - pkernfs_initialise_inode_store(sb); - pkernfs_zero_allocations(sb); + pkernfs_initialise_inode_store(NULL); + pkernfs_zero_allocations(NULL); psb->magic_number = PKERNFS_MAGIC_NUMBER; - pkernfs_get_persisted_inode(sb, 1)->flags = PKERNFS_INODE_FLAG_DIR; - strscpy(pkernfs_get_persisted_inode(sb, 1)->filename, ".", PKERNFS_FILENAME_LEN); + pkernfs_get_persisted_inode(NULL, 1)->flags = PKERNFS_INODE_FLAG_DIR; + strscpy(pkernfs_get_persisted_inode(NULL, 1)->filename, ".", PKERNFS_FILENAME_LEN); psb->next_free_ino = 2; } +} + +static int pkernfs_fill_super(struct super_block *sb, struct fs_context *fc) +{ + struct inode *inode; + struct dentry *dentry; + + pkernfs_init(); sb->s_op = &pkernfs_super_ops; @@ -77,12 +87,9 @@ static struct file_system_type pkernfs_fs_type = { .fs_flags = FS_USERNS_MOUNT, }; -static int __init 
pkernfs_init(void) +static int __init pkernfs_fs_init(void) { - int ret; - - ret = register_filesystem(&pkernfs_fs_type); - return ret; + return register_filesystem(&pkernfs_fs_type); } /** @@ -97,7 +104,12 @@ static int __init parse_pkernfs_extents(char *p) return 0; } +bool pkernfs_enabled(void) +{ + return !!pkernfs_base; +} + early_param("pkernfs", parse_pkernfs_extents); MODULE_ALIAS_FS("pkernfs"); -module_init(pkernfs_init); +module_init(pkernfs_fs_init); diff --git a/fs/pkernfs/pkernfs.h b/fs/pkernfs/pkernfs.h index 1a7aa783a9be..e1b7ae3fe7f1 100644 --- a/fs/pkernfs/pkernfs.h +++ b/fs/pkernfs/pkernfs.h @@ -1,6 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0-only */ #include +#include #define PKERNFS_MAGIC_NUMBER 0x706b65726e6673 #define PKERNFS_FILENAME_LEN 255 @@ -18,6 +19,8 @@ struct pkernfs_sb { // If neither of these are set the inode is not in use. #define PKERNFS_INODE_FLAG_FILE (1 << 0) #define PKERNFS_INODE_FLAG_DIR (1 << 1) +#define PKERNFS_INODE_FLAG_IOMMU_DIR (1 << 2) +#define PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES (1 << 3) struct pkernfs_inode { int flags; /* @@ -31,20 +34,24 @@ struct pkernfs_inode { */ unsigned long child_ino; char filename[PKERNFS_FILENAME_LEN]; + /* Block index for where the mappings live. 
*/ int mappings_block; int num_mappings; }; void pkernfs_initialise_inode_store(struct super_block *sb); +void pkernfs_init(void); void pkernfs_zero_allocations(struct super_block *sb); unsigned long pkernfs_alloc_block(struct super_block *sb); struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino); void *pkernfs_addr_for_block(struct super_block *sb, int block_idx); +unsigned long pkernfs_allocate_inode(struct super_block *sb); struct pkernfs_inode *pkernfs_get_persisted_inode(struct super_block *sb, int ino); extern const struct file_operations pkernfs_dir_fops; extern const struct file_operations pkernfs_file_fops; extern const struct inode_operations pkernfs_file_inode_operations; +extern const struct inode_operations pkernfs_iommu_dir_inode_operations; diff --git a/include/linux/pkernfs.h b/include/linux/pkernfs.h new file mode 100644 index 000000000000..0110e4784109 --- /dev/null +++ b/include/linux/pkernfs.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: MIT */ + +#ifndef _LINUX_PKERNFS_H +#define _LINUX_PKERNFS_H + +#include +#include + +#ifdef CONFIG_PKERNFS_FS +extern bool pkernfs_enabled(void); +#else +static inline bool pkernfs_enabled(void) +{ + return false; +} +#endif + +/* + * This is a light wrapper around the data behind a pkernfs + * file. Really it should be a file but the filesystem comes + * up too late: IOMMU needs root pgtables before fs is up. 
+ */ +struct pkernfs_region { + void *vaddr; + unsigned long paddr; + unsigned long bytes; + struct dev_pagemap pgmap; + struct device dev; +}; + +void pkernfs_alloc_iommu_root_pgtables(struct pkernfs_region *pkernfs_region); +void pkernfs_alloc_page_from_region(struct pkernfs_region *pkernfs_region, + void **vaddr, unsigned long *paddr); +void *pkernfs_region_paddr_to_vaddr(struct pkernfs_region *region, unsigned long paddr); + +#endif /* _LINUX_PKERNFS_H */ From patchwork Mon Feb 5 12:01:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13545353 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 77A8CC48291 for ; Mon, 5 Feb 2024 12:04:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 18EEE6B0095; Mon, 5 Feb 2024 07:04:00 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 166926B0096; Mon, 5 Feb 2024 07:04:00 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 02E246B0098; Mon, 5 Feb 2024 07:03:59 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id E60BA6B0095 for ; Mon, 5 Feb 2024 07:03:59 -0500 (EST) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id AA40BA0A75 for ; Mon, 5 Feb 2024 12:03:59 +0000 (UTC) X-FDA: 81757616598.30.17B0D23 Received: from smtp-fw-80009.amazon.com (smtp-fw-80009.amazon.com [99.78.197.220]) by imf26.hostedemail.com (Postfix) with ESMTP id 7E179140002 for ; Mon, 5 Feb 2024 12:03:57 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=amazon.com 
header.s=amazon201209 header.b=ptSRFBg0; dmarc=pass (policy=quarantine) header.from=amazon.com; spf=pass (imf26.hostedemail.com: domain of "prvs=75897cb1d=jgowans@amazon.com" designates 99.78.197.220 as permitted sender) smtp.mailfrom="prvs=75897cb1d=jgowans@amazon.com" ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1707134637; a=rsa-sha256; cv=none; b=nKezBeS7ptKciWnQGf1YAqhIyYXTKhC4ebNcjiotGTQAuji9pvP27SWRBd4g3XXRXeKM29 gJYw1FZ4SFVcSMkyr6T2zSNIF3VdMJbCbQuHTOCO/b2P9cFPPlN8doevqOvcYti7ts+0EN YMMS0L4rumfsvycuNeIZOT69f+DNqHs= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=ptSRFBg0; dmarc=pass (policy=quarantine) header.from=amazon.com; spf=pass (imf26.hostedemail.com: domain of "prvs=75897cb1d=jgowans@amazon.com" designates 99.78.197.220 as permitted sender) smtp.mailfrom="prvs=75897cb1d=jgowans@amazon.com" ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1707134637; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=iY0J0IZ5qTqLG1CI5oIteOfffrC6hlfw7IE0MCx8M3k=; b=taeFBKoCHXKdQfGzwcREYF+hRhhBSkcUXTFfsAkSKLyBehjajh0hU5qt/IOEg32mihp/cc Dv7GtiJEWIF2w/1b2fvr9ojwlFZE1G79mv5bF2QGBC7BJOZpUoqFDtWlm7+7TgrbiOEY8T 58Ws7jB5RQoMBELlvObzA/EHzjjpriY= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1707134638; x=1738670638; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=iY0J0IZ5qTqLG1CI5oIteOfffrC6hlfw7IE0MCx8M3k=; b=ptSRFBg0C/qJvi/Fz97DIkGMS40huEmuwtGGN1sPJK6E9721GxqS63RF F0aTAEQJJJ3WIm4T+Xa3kzPxIvzaAi0Y0k/NNAb9r4RljJGn9xD2FhP18 KjZkvtWO2n3QfyvPkdzi13eI133/3pAxeeqCjkKLHe7KCGSqPZYhIS3Jl U=; X-IronPort-AV: 
E=Sophos;i="6.05,245,1701129600"; d="scan'208";a="63724432"
From: James Gowans
To:
CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , ,
Subject: [RFC 08/18] iommu: Add allocator for pgtables from persistent region
Date: Mon, 5 Feb 2024 12:01:53 +0000
Message-ID: <20240205120203.60312-9-jgowans@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>
MIME-Version: 1.0

The specific IOMMU drivers will need the ability to allocate pages from
a pkernfs IOMMU pgtable file for their pgtables. The IOMMU drivers will
also need the ability to consistently get the same page for the root PGD
page, so add a specific function to get this PGD "root" page. This is
different from allocating regular pgtable pages because the exact same
page needs to be *restored* after kexec into the pgd pointer on the
IOMMU domain struct.

To support this sort of allocation the pkernfs region is treated as an
array of 512 4 KiB pages, the first of which is an allocation bitmap.
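The layout described above can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct sketch_region` and every `sketch_*` name are hypothetical stand-ins (not the pkernfs or kernel bitmap API), the scan starts at page 2 so the bitmap and root pages are never handed out, and returning NULL when the region is full is a choice made for the sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_NR_PAGES   512U
#define SKETCH_WORD_BITS  (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical userspace stand-in for struct pkernfs_region. */
struct sketch_region {
	unsigned char *vaddr;	/* mapped start of the region; page 0 is the bitmap */
	unsigned long paddr;	/* physical address of page 0 */
};

static unsigned long *sketch_bitmap(struct sketch_region *r)
{
	return (unsigned long *)(void *)r->vaddr;
}

static int sketch_test_bit(const unsigned long *map, unsigned int nr)
{
	return (int)((map[nr / SKETCH_WORD_BITS] >> (nr % SKETCH_WORD_BITS)) & 1UL);
}

static void sketch_set_bit(unsigned long *map, unsigned int nr)
{
	map[nr / SKETCH_WORD_BITS] |= 1UL << (nr % SKETCH_WORD_BITS);
}

/* Clear the bitmap and reserve page 0, which holds the bitmap itself. */
void sketch_region_init(struct sketch_region *r)
{
	assert(r && r->vaddr);
	memset(r->vaddr, 0, SKETCH_PAGE_SIZE);
	sketch_set_bit(sketch_bitmap(r), 0);
}

/* Page 1 is always the root page, so the same page is found again after kexec. */
void *sketch_get_root_page(struct sketch_region *r)
{
	sketch_set_bit(sketch_bitmap(r), 1);
	return r->vaddr + SKETCH_PAGE_SIZE;
}

/* Claim the lowest free page above the root page; NULL when the region is full. */
void *sketch_alloc_page(struct sketch_region *r, unsigned long *paddr)
{
	unsigned long *map = sketch_bitmap(r);
	unsigned int idx;

	for (idx = 2; idx < SKETCH_NR_PAGES; idx++) {
		if (sketch_test_bit(map, idx))
			continue;
		sketch_set_bit(map, idx);
		if (paddr)
			*paddr = r->paddr + ((unsigned long)idx << SKETCH_PAGE_SHIFT);
		return r->vaddr + ((size_t)idx << SKETCH_PAGE_SHIFT);
	}
	return NULL;
}
```

With one bitmap page and one root page reserved, a 512-page region leaves 510 pages for regular pgtable allocations in this model.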
---
 drivers/iommu/Makefile        |  1 +
 drivers/iommu/pgtable_alloc.c | 36 +++++++++++++++++++++++++++++++++++
 drivers/iommu/pgtable_alloc.h |  9 +++++++++
 3 files changed, 46 insertions(+)
 create mode 100644 drivers/iommu/pgtable_alloc.c
 create mode 100644 drivers/iommu/pgtable_alloc.h

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 769e43d780ce..cadebabe9581 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-y += amd/ intel/ arm/ iommufd/
+obj-y += pgtable_alloc.o
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
diff --git a/drivers/iommu/pgtable_alloc.c b/drivers/iommu/pgtable_alloc.c
new file mode 100644
index 000000000000..f0c2e12f8a8b
--- /dev/null
+++ b/drivers/iommu/pgtable_alloc.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "pgtable_alloc.h"
+#include
+
+/*
+ * The first 4 KiB is the bitmap - set the first bit in the bitmap.
+ * Scan bitmap to find next free bits - it's next free page.
+ */
+
+void iommu_alloc_page_from_region(struct pkernfs_region *region, void **vaddr, unsigned long *paddr)
+{
+	int page_idx;
+
+	page_idx = bitmap_find_free_region(region->vaddr, 512, 0);
+	*vaddr = region->vaddr + (page_idx << PAGE_SHIFT);
+	if (paddr)
+		*paddr = region->paddr + (page_idx << PAGE_SHIFT);
+}
+
+
+void *pgtable_get_root_page(struct pkernfs_region *region, bool liveupdate)
+{
+	/*
+	 * The page immediately after the bitmap is the root page.
+	 * It would be wrong for the page to be allocated if we're
+	 * NOT doing a liveupdate, or for a liveupdate to happen
+	 * with no allocated page. Detect this mismatch.
+	 */
+	if (test_bit(1, region->vaddr) ^ liveupdate) {
+		pr_err("%sdoing a liveupdate but root pg bit incorrect",
+				liveupdate ? "" : "NOT ");
+	}
+	set_bit(1, region->vaddr);
+	return region->vaddr + PAGE_SIZE;
+}
diff --git a/drivers/iommu/pgtable_alloc.h b/drivers/iommu/pgtable_alloc.h
new file mode 100644
index 000000000000..c1666a7be3d3
--- /dev/null
+++ b/drivers/iommu/pgtable_alloc.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include
+#include
+
+void iommu_alloc_page_from_region(struct pkernfs_region *region,
+				  void **vaddr, unsigned long *paddr);
+
+void *pgtable_get_root_page(struct pkernfs_region *region, bool liveupdate);

From patchwork Mon Feb 5 12:01:54 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 13545355
From: James Gowans
To:
CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , ,
Subject: [RFC 09/18] intel-iommu: Use pkernfs for root/context pgtable pages
Date: Mon, 5 Feb 2024 12:01:54 +0000
Message-ID: <20240205120203.60312-10-jgowans@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>
MIME-Version: 1.0

The previous commits were preparation for using pkernfs memory for IOMMU
pgtables: a file in the filesystem is available, and so is an allocator
that hands out 4 KiB pages from that file. Now use both to actually back
root and context pgtable pages with pkernfs memory.

If pkernfs is enabled then a "region" (physical and virtual memory
chunk) is fetched from pkernfs and used to drive the allocator. Should
this rather just be a pointer to a pkernfs inode? That abstraction seems
leaky, but without the ability to store struct files at this point it's
probably the more accurate option.

The freeing still needs to be hooked into the allocator...
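One way the missing free path could look, modelled in userspace C. This is a sketch only, not part of the series: `sketch_free_page()` and `struct sketch_region` are hypothetical names, and the policy of refusing to free the bitmap page, the root page, or an already-free page is an assumption made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_NR_PAGES   512UL
#define SKETCH_WORD_BITS  (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the "region": page 0 of vaddr holds the bitmap. */
struct sketch_region {
	unsigned char *vaddr;
};

/*
 * Hypothetical free hook: returns 0 when the page was released and -1 when
 * the pointer is not a page this allocator could have handed out.
 */
int sketch_free_page(struct sketch_region *r, void *page)
{
	unsigned long *map;
	uintptr_t base, addr;
	unsigned long idx, mask;

	assert(r && r->vaddr);
	map = (unsigned long *)(void *)r->vaddr;
	base = (uintptr_t)r->vaddr;
	addr = (uintptr_t)page;

	if (addr < base || addr >= base + SKETCH_NR_PAGES * SKETCH_PAGE_SIZE)
		return -1;	/* not inside the region */
	if ((addr - base) & (SKETCH_PAGE_SIZE - 1))
		return -1;	/* not page aligned */

	idx = (unsigned long)((addr - base) >> SKETCH_PAGE_SHIFT);
	if (idx < 2)
		return -1;	/* bitmap page and root page stay allocated */

	mask = 1UL << (idx % SKETCH_WORD_BITS);
	if (!(map[idx / SKETCH_WORD_BITS] & mask))
		return -1;	/* already free */

	map[idx / SKETCH_WORD_BITS] &= ~mask;
	return 0;
}
```

Deriving the bitmap index from the page address keeps the free path symmetric with an allocator that computes the address from the index.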
---
 drivers/iommu/intel/iommu.c | 24 ++++++++++++++++++++----
 drivers/iommu/intel/iommu.h |  2 ++
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 744e4e6b8d72..2dd3f055dbce 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -28,6 +29,7 @@
 #include "../dma-iommu.h"
 #include "../irq_remapping.h"
 #include "../iommu-sva.h"
+#include "../pgtable_alloc.h"
 #include "pasid.h"
 #include "cap_audit.h"
 #include "perfmon.h"
@@ -617,7 +619,12 @@ struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus,
 		if (!alloc)
 			return NULL;
 
-		context = alloc_pgtable_page(iommu->node, GFP_ATOMIC);
+		if (pkernfs_enabled())
+			iommu_alloc_page_from_region(
+					&iommu->pkernfs_region,
+					(void **) &context, NULL);
+		else
+			context = alloc_pgtable_page(iommu->node, GFP_ATOMIC);
 		if (!context)
 			return NULL;
 
@@ -1190,7 +1197,15 @@ static int iommu_alloc_root_entry(struct intel_iommu *iommu)
 {
 	struct root_entry *root;
 
-	root = alloc_pgtable_page(iommu->node, GFP_ATOMIC);
+	if (pkernfs_enabled()) {
+		pkernfs_alloc_iommu_root_pgtables(&iommu->pkernfs_region);
+		root = pgtable_get_root_page(
+				&iommu->pkernfs_region,
+				liveupdate);
+	} else {
+		root = alloc_pgtable_page(iommu->node, GFP_ATOMIC);
+	}
+
 	if (!root) {
 		pr_err("Allocating root entry for %s failed\n",
 			iommu->name);
@@ -2790,7 +2805,7 @@ static int __init init_dmars(void)
 
 		init_translation_status(iommu);
 
-		if (translation_pre_enabled(iommu) && !is_kdump_kernel()) {
+		if (translation_pre_enabled(iommu) && !is_kdump_kernel() && !liveupdate) {
 			iommu_disable_translation(iommu);
 			clear_translation_pre_enabled(iommu);
 			pr_warn("Translation was enabled for %s but we are not in kdump mode\n",
@@ -2806,7 +2821,8 @@ static int __init init_dmars(void)
 		if (ret)
 			goto free_iommu;
 
-		if (translation_pre_enabled(iommu)) {
+		/* For the live update case restore pgtables, don't copy */
+		if (translation_pre_enabled(iommu) && !liveupdate) {
 			pr_info("Translation already enabled - trying to copy translation structures\n");
 
 			ret = copy_translation_tables(iommu);
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index e6a3e7065616..a2338e398ba3 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -672,6 +673,7 @@ struct intel_iommu {
 	unsigned long *copied_tables;	/* bitmap of copied tables */
 	spinlock_t lock;	/* protect context, domain ids */
 	struct root_entry *root_entry;	/* virtual address */
+	struct pkernfs_region pkernfs_region;
 
 	struct iommu_flush flush;
 #endif

From patchwork Mon Feb 5 12:01:55 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 13545356
From: James Gowans
To:
CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , ,
Subject: [RFC 10/18] iommu/intel: zap context table entries on kexec
Date: Mon, 5 Feb 2024 12:01:55 +0000
Message-ID: <20240205120203.60312-11-jgowans@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>
MIME-Version: 1.0

In the next commit the IOMMU shutdown function will be modified to not
actually shut down the IOMMU when doing a kexec. To prevent leaving DMA
mappings for non-persistent devices around during kexec, add a function
to the kexec flow which iterates through all IOMMU domains and zaps the
context entries for the devices belonging to those domains.

A list of domains for the IOMMU is added and maintained.
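The zap step can be illustrated with a small userspace model. Everything here is a simplification for illustration: the `sketch_*` names are hypothetical, the context table is a flat array indexed by source id, and the entry layout (bit 0 of `lo` as the present bit, bits 8..23 of `hi` as the domain id) is an assumption of the sketch. The points it shows are that the domain id has to be read before the entry is cleared, and that the source id comes from the device's own bus and devfn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified context entry; the field layout is an assumption of this sketch. */
struct sketch_context_entry {
	uint64_t lo;	/* bit 0: present */
	uint64_t hi;	/* bits 8..23: domain id */
};

struct sketch_device {
	uint8_t bus;
	uint8_t devfn;
};

/* One record per invalidation the real code would issue. */
struct sketch_flush {
	uint16_t did;
	uint16_t source_id;
};

/*
 * Clear every present entry belonging to the given devices and record which
 * (domain id, source id) pairs need a context-cache and IOTLB flush.
 * Returns the number of records written to out.
 */
size_t sketch_zap_context_entries(struct sketch_context_entry *table,
				  const struct sketch_device *devs, size_t ndevs,
				  struct sketch_flush *out, size_t out_cap)
{
	size_t i, n = 0;

	assert(table && devs && out);
	for (i = 0; i < ndevs && n < out_cap; i++) {
		uint16_t sid = (uint16_t)((devs[i].bus << 8) | devs[i].devfn);
		struct sketch_context_entry *ce = &table[sid];
		uint16_t did;

		if (!(ce->lo & 1))
			continue;	/* nothing mapped for this device */

		/* Read the domain id first: clearing the entry destroys it. */
		did = (uint16_t)((ce->hi >> 8) & 0xffff);
		ce->lo = 0;
		ce->hi = 0;

		out[n].did = did;
		out[n].source_id = sid;
		n++;
	}
	return n;
}
```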
---
 drivers/iommu/intel/dmar.c  |  1 +
 drivers/iommu/intel/iommu.c | 32 ++++++++++++++++++++++++++++----
 drivers/iommu/intel/iommu.h |  2 ++
 3 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 23cb80d62a9a..00f69f40a4ac 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1097,6 +1097,7 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
 	iommu->segment = drhd->segment;
 
 	iommu->node = NUMA_NO_NODE;
+	INIT_LIST_HEAD(&iommu->domains);
 
 	ver = readl(iommu->reg + DMAR_VER_REG);
 	pr_info("%s: reg_base_addr %llx ver %d:%d cap %llx ecap %llx\n",
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 2dd3f055dbce..315c6b7f901c 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1831,6 +1831,7 @@ static int domain_attach_iommu(struct dmar_domain *domain,
 		goto err_clear;
 	}
 	domain_update_iommu_cap(domain);
+	list_add(&domain->domains, &iommu->domains);
 
 	spin_unlock(&iommu->lock);
 	return 0;
@@ -3608,6 +3609,32 @@ static void intel_disable_iommus(void)
 		iommu_disable_translation(iommu);
 }
 
+void zap_context_table_entries(struct intel_iommu *iommu)
+{
+	struct context_entry *context;
+	struct dmar_domain *domain;
+	struct device_domain_info *device;
+	u16 did_old;
+
+	list_for_each_entry(domain, &iommu->domains, domains) {
+		list_for_each_entry(device, &domain->devices, link) {
+			context = iommu_context_addr(iommu, device->bus, device->devfn, 0);
+			if (!context || !context_present(context))
+				continue;
+			did_old = context_domain_id(context);
+			context_clear_entry(context);
+			__iommu_flush_cache(iommu, context, sizeof(*context));
+			iommu->flush.flush_context(iommu,
+						   did_old,
+						   (((u16)device->bus) << 8) | device->devfn,
+						   DMA_CCMD_MASK_NOBIT,
+						   DMA_CCMD_DEVICE_INVL);
+			iommu->flush.flush_iotlb(iommu, did_old, 0, 0,
+						 DMA_TLB_DSI_FLUSH);
+		}
+	}
+}
+
 void intel_iommu_shutdown(void)
 {
 	struct dmar_drhd_unit *drhd;
@@ -3620,10 +3647,7 @@ void intel_iommu_shutdown(void)
 
 	/* Disable PMRs explicitly here. */
 	for_each_iommu(iommu, drhd)
-		iommu_disable_protect_mem_regions(iommu);
-
-	/* Make sure the IOMMUs are switched off */
-	intel_disable_iommus();
+		zap_context_table_entries(iommu);
 
 	up_write(&dmar_global_lock);
 }
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index a2338e398ba3..4a2f163a86f3 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -600,6 +600,7 @@ struct dmar_domain {
 	spinlock_t lock;	/* Protect device tracking lists */
 	struct list_head devices;	/* all devices' list */
 	struct list_head dev_pasids;	/* all attached pasids */
+	struct list_head domains;	/* all struct dmar_domains on this IOMMU */
 
 	struct dma_pte *pgd;	/* virtual address */
 	int gaw;	/* max guest address width */
@@ -700,6 +701,7 @@ struct intel_iommu {
 	void *perf_statistic;
 
 	struct iommu_pmu *pmu;
+	struct list_head domains;	/* all struct dmar_domains on this IOMMU */
 };
 
 /* PCI domain-device relationship */

From patchwork Mon Feb 5 12:01:56 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 13545357
From: James Gowans
To:
CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , ,
Subject: [RFC 11/18] dma-iommu: Always enable deferred attaches for liveupdate
Date: Mon, 5 Feb 2024 12:01:56 +0000
Message-ID: <20240205120203.60312-12-jgowans@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>
MIME-Version: 1.0

Seeing as translations are pre-enabled, all devices will be set for
deferred attach. The deferred attach actually has to be done when doing
DMA mapping for devices to work. There may be a better way to do this
by, for example, consulting the context entry table and only deferring
attach if there is a persisted context table entry for this device.
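The alternative mentioned above, deferring only for devices that have a persisted context entry, could be expressed as a predicate along these lines. This is a userspace sketch under assumptions: `sketch_should_defer_attach()` is a hypothetical name, the context table is modelled as a flat array indexed by source id, and bit 0 of `lo` is taken to be the present bit. The series itself enables deferred attach for every device when `liveupdate` is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified context entry; bit 0 of lo is assumed to be the present bit. */
struct sketch_context_entry {
	uint64_t lo;
	uint64_t hi;
};

/*
 * Decide whether attach should be deferred for the device at bus/devfn.
 * kdump keeps deferring for every device; live update defers only when a
 * context entry for this device survived the kexec.
 */
bool sketch_should_defer_attach(bool kdump, bool liveupdate,
				const struct sketch_context_entry *table,
				uint8_t bus, uint8_t devfn)
{
	uint16_t sid = (uint16_t)((bus << 8) | devfn);

	assert(table);
	if (kdump)
		return true;
	if (!liveupdate)
		return false;
	return (table[sid].lo & 1) != 0;
}
```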
--- drivers/iommu/dma-iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index e5d087bd6da1..76f916848f48 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1750,7 +1750,7 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg) static int iommu_dma_init(void) { - if (is_kdump_kernel()) + if (is_kdump_kernel() || liveupdate) static_branch_enable(&iommu_deferred_attach_enabled); return iova_cache_get(); From patchwork Mon Feb 5 12:01:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13545358 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C512BC4828D for ; Mon, 5 Feb 2024 12:05:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5FCC56B0078; Mon, 5 Feb 2024 07:05:15 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5ADAB6B0089; Mon, 5 Feb 2024 07:05:15 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 44DF66B00A0; Mon, 5 Feb 2024 07:05:15 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 36F556B0078 for ; Mon, 5 Feb 2024 07:05:15 -0500 (EST) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 119FCA0443 for ; Mon, 5 Feb 2024 12:05:15 +0000 (UTC) X-FDA: 81757619790.25.1F15A61 Received: from smtp-fw-52002.amazon.com (smtp-fw-52002.amazon.com [52.119.213.150]) by imf23.hostedemail.com (Postfix) with ESMTP id 2BF3B140020 for ; Mon, 5 Feb 2024 12:05:12 +0000 (UTC) 
Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=rBiC7ikK; dmarc=pass (policy=quarantine) header.from=amazon.com; spf=pass (imf23.hostedemail.com: domain of "prvs=75897cb1d=jgowans@amazon.com" designates 52.119.213.150 as permitted sender) smtp.mailfrom="prvs=75897cb1d=jgowans@amazon.com" ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1707134713; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=FfAwJVvQHyFiTbONYCZnQNlkav6R7SrA1Dbs6XI8844=; b=BVEu1SdYzROTlV2dk7h4GxHM93rJSeL8VZdjst0hax2vgmDfO+Q1E3lX6i3YkIX28r3DD0 49Lu7qJSqc7AcvWb37o5ZGxxCnNZqgXMZH6xfBKLwi9husU/ujV3/B2Yh4Jx/WVQS3Uviq Wkwb/2/Hi5HWyEAocdmzggrKQ6ZWwCc= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=rBiC7ikK; dmarc=pass (policy=quarantine) header.from=amazon.com; spf=pass (imf23.hostedemail.com: domain of "prvs=75897cb1d=jgowans@amazon.com" designates 52.119.213.150 as permitted sender) smtp.mailfrom="prvs=75897cb1d=jgowans@amazon.com" ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1707134713; a=rsa-sha256; cv=none; b=qSu7DBvXjdPuAsp5lUWgksPfnMi28AH08yDspfRZoNOVrYQziXlBFi7kbl3Qrze62RksaM qRuUYngPPs77ZfSwm9AnMPkdOmF2Lzp7uhDCJMBmOyMkvjangZYZ/IGRq0jBJ6ffSjF4yO 2LydMUDAOPumaMbT7bDlRAP2QjPrJu4= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1707134713; x=1738670713; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=FfAwJVvQHyFiTbONYCZnQNlkav6R7SrA1Dbs6XI8844=; b=rBiC7ikKOvQFn4asAe08Bc5Aazz7SkCi9MG/65Unu2c8sm+oWJpXnFwV g+F5I3SDvv2ophIVeXpl8l4FKUoYjdNTM2Ec4h3KnqbFNJbGLX/tjQzz7 
xwH9VULxVmg7l9CbjzZh0S+ckkRukr1velW8RuvRnSIQp/pUGh3BkNWLW Q=; X-IronPort-AV: E=Sophos;i="6.05,245,1701129600"; d="scan'208";a="610940405" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.43.8.6]) by smtp-border-fw-52002.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2024 12:05:08 +0000 Received: from EX19MTAEUB002.ant.amazon.com [10.0.43.254:20056] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.8.155:2525] with esmtp (Farcaster) id 0814e71f-b1b8-4c6d-9ee2-790efdbab159; Mon, 5 Feb 2024 12:05:06 +0000 (UTC) X-Farcaster-Flow-ID: 0814e71f-b1b8-4c6d-9ee2-790efdbab159 Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUB002.ant.amazon.com (10.252.51.59) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:05:06 +0000 Received: from dev-dsk-jgowans-1a-a3faec1f.eu-west-1.amazon.com (172.19.112.191) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:05:00 +0000 From: James Gowans To: CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . 
Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , , Subject: [RFC 12/18] pkernfs: Add IOMMU domain pgtables file Date: Mon, 5 Feb 2024 12:01:57 +0000 Message-ID: <20240205120203.60312-13-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> MIME-Version: 1.0 X-Originating-IP: [172.19.112.191] X-ClientProxiedBy: EX19D046UWA004.ant.amazon.com (10.13.139.76) To EX19D014EUC004.ant.amazon.com (10.252.51.182) X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 2BF3B140020 X-Stat-Signature: msz7wykwxjd9oq1ihbn9961z46wkbxhx X-HE-Tag: 1707134712-812752 X-HE-Meta: U2FsdGVkX19+y9b+G0k/oyoYNgcYqTbXvdYQ1YI1C1DhL/zpgzWflHzdH/78YBUwXyk/RipgBK6ZLsmv+wqW7nABrRt6XiRuLUAMXym8ROgaa3Tkd13LbNClCNm+7c0Som9NkRc0pi3DlN2fjJYEo5wVbXT84G1xw37VoDEB6TBV6L7YsDEo6iuCVnJPxmakkeEyU3XCafWbB+atV9EHSqYiRb08Fb1v9lLdbovudKlLBzXtYtsod6ePOsgN80S+fGAhKBR5WStCoOxgVyNUQwA0NEm9u+yLK0VI8WVDWII2t5jjMaf2OZaXT8e3sR7TX/vxY5ZA6eNBZG9wVMAFgRi3iyou0VBg8fEBaRSPgSWIr7mhx7gSHKHP//IdJ53Eb9hUliDH1rLRPiuRfMsvHhMKUmuHMzXSb+IhbzCoPvxujN1FC3ci0hNotw4NWMZONPgIpcfpr2cwp+gNJ864SW48naRb1AJ/wzdoPSgKgS7v8GycjHQGxzgT0xeiaM2eexz01GIDfPrXJFswa0VyFL5cFydKYGQVImdyPoswYIXuOj39I7xptEyfmYJUKxAw3nceveMvkOIvqnNfa+oU+tNF0CmJxNt3mEfBFhhCrrklmmhabUvb+VQRX7z7oOIohE6b/tj/1Mt9nC9AQaXxfef5oM+TtV5BNuJ2fDf3mixP4Z7m3UEV4spP2vSpAGvfs1pU/Tqo22QdR8YO2VnKZs19jaBqFGzsNca5lJAxTc01aa2AHCjE/Sio8jL/JR0f2otVti00+XmKVJsPFOig8QGfvJYpdQRkOgbzeebdetHo19QcbWY/eWXWiRlVIJt2rmrJzrOd8OigtwIZrm88UEuyREhMabj8xE8S1JXD5Y00/ZCD7Uk/4gIWQBSKmXVacnNIymjtM9+9d446XTNYZImxNrRaNvs/sCFcGcpCg0mJssKbzg/ZgjoSMPW3o6rU6YWo2BeboKp3HJvnK7+ sDUUxBSC cOD+xOHNdcLG5+5fE+rnVVdsmnSg56cJ2QrIjyIaovwU6OCVbeuAgvixPA/jboKuNqQJHqDzMjGZw0J+EN77iF3MhHbetcGOQg//IY4W/qLZet7QmJLDsn8bhRJBIBSft5Guxbq3FmYohLnSvr8JCsxFzfnVVf11R2ke9mGQ5++8GvDHwzwIX9eweLH/Mx2IlqgUj5pmgDAMqZCsLafgWRs5cahOWkrm4NeO+wP9+3EgLcWy7UZj+FTnDZVK2nADDjX0EOSOOmuorqB/+X7miDjBelUSJ3GHWlP1RepYZ+CHsE0k= 
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe:

Similar to the IOMMU root pgtables file which was added in a previous commit, now support a file type for IOMMU domain pgtables in the IOMMU directory. These domain pgtable files only need to be usable after the system has booted up, for example by QEMU creating one of these files and using it to back the IOMMU pgtables for a persistent VM. As such, the filesystem abstraction can be better maintained here, as the kernel code doesn't need to reach "behind" the filesystem abstraction like it does for the root pgtables.

A new inode type is created for domain pgtable files, and the IOMMU directory gets inode_operations callbacks to support creating and deleting these files in it.

Note: there is a use-after-free risk here too: if the domain pgtable file is truncated while it is in use for IOMMU pgtables, then freed memory could still be mapped into the IOMMU. To mitigate this there should be a mechanism to "freeze" the files once they've been given to the IOMMU.
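The "freeze" idea from the note above could look roughly like the following standalone C model. It is a sketch under assumptions, not part of the patch: `struct pgtables_file`, `pgtables_file_freeze()` and `pgtables_file_truncate()` are hypothetical names, and returning `-EBUSY` for a rejected shrink is an illustrative choice.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory model of a domain pgtables file. */
struct pgtables_file {
	size_t size;	/* current file size in bytes */
	bool frozen;	/* set once the file backs live IOMMU pgtables */
};

/* Called at the point where the file is handed to the IOMMU. */
static void pgtables_file_freeze(struct pgtables_file *f)
{
	f->frozen = true;
}

/*
 * Refuse to shrink a frozen file, so that pages the IOMMU may still
 * walk are never freed underneath it. Growing stays allowed.
 */
static int pgtables_file_truncate(struct pgtables_file *f, size_t new_size)
{
	if (f->frozen && new_size < f->size)
		return -EBUSY;
	f->size = new_size;
	return 0;
}
```

Since this patch also wires `.unlink` into the IOMMU directory, a real implementation would presumably need the same check on the unlink path.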
--- fs/pkernfs/inode.c | 9 +++++-- fs/pkernfs/iommu.c | 55 +++++++++++++++++++++++++++++++++++++++-- fs/pkernfs/pkernfs.h | 4 +++ include/linux/pkernfs.h | 1 + 4 files changed, 65 insertions(+), 4 deletions(-) diff --git a/fs/pkernfs/inode.c b/fs/pkernfs/inode.c index 1d712e0a82a1..35842cd61002 100644 --- a/fs/pkernfs/inode.c +++ b/fs/pkernfs/inode.c @@ -35,7 +35,11 @@ struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino) inode->i_op = &pkernfs_iommu_dir_inode_operations; inode->i_fop = &pkernfs_dir_fops; inode->i_mode = S_IFDIR; - } else if (pkernfs_inode->flags | PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES) { + } else if (pkernfs_inode->flags & PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES) { + inode->i_fop = &pkernfs_file_fops; + inode->i_mode = S_IFREG; + } else if (pkernfs_inode->flags & PKERNFS_INODE_FLAG_IOMMU_DOMAIN_PGTABLES) { + inode->i_fop = &pkernfs_file_fops; inode->i_mode = S_IFREG; } @@ -175,6 +179,7 @@ const struct inode_operations pkernfs_dir_inode_operations = { }; const struct inode_operations pkernfs_iommu_dir_inode_operations = { + .create = pkernfs_create_iommu_pgtables, .lookup = pkernfs_lookup, + .unlink = pkernfs_unlink, }; - diff --git a/fs/pkernfs/iommu.c b/fs/pkernfs/iommu.c index 5bce8146d7bb..f14e76013e85 100644 --- a/fs/pkernfs/iommu.c +++ b/fs/pkernfs/iommu.c @@ -4,6 +4,27 @@ #include +void pkernfs_alloc_iommu_domain_pgtables(struct file *ppts, struct pkernfs_region *pkernfs_region) +{ + struct pkernfs_inode *pkernfs_inode; + unsigned long *mappings_block_vaddr; + unsigned long inode_idx; + + /* + * For a pkernfs region block, the "mappings_block" field is still + * just a block index, but that block doesn't actually contain mappings + * it contains the pkernfs_region data + */ + + inode_idx = ppts->f_inode->i_ino; + pkernfs_inode = pkernfs_get_persisted_inode(NULL, inode_idx); + + mappings_block_vaddr = (unsigned long *)pkernfs_addr_for_block(NULL, + pkernfs_inode->mappings_block); + set_bit(0, mappings_block_vaddr); + 
pkernfs_region->vaddr = mappings_block_vaddr; + pkernfs_region->paddr = pkernfs_base + (pkernfs_inode->mappings_block * (2 << 20)); +} void pkernfs_alloc_iommu_root_pgtables(struct pkernfs_region *pkernfs_region) { unsigned long *mappings_block_vaddr; @@ -63,9 +84,8 @@ void pkernfs_alloc_iommu_root_pgtables(struct pkernfs_region *pkernfs_region) * just a block index, but that block doesn't actually contain mappings * it contains the pkernfs_region data */ - mappings_block_vaddr = (unsigned long *)pkernfs_addr_for_block(NULL, - iommu_pgtables->mappings_block); + iommu_pgtables->mappings_block); set_bit(0, mappings_block_vaddr); pkernfs_region->vaddr = mappings_block_vaddr; pkernfs_region->paddr = pkernfs_base + (iommu_pgtables->mappings_block * PMD_SIZE); @@ -88,6 +108,29 @@ void pkernfs_alloc_iommu_root_pgtables(struct pkernfs_region *pkernfs_region) (iommu_pgtables->mappings_block * PMD_SIZE); } +int pkernfs_create_iommu_pgtables(struct mnt_idmap *id, struct inode *dir, + struct dentry *dentry, umode_t mode, bool excl) +{ + unsigned long free_inode; + struct pkernfs_inode *pkernfs_inode; + struct inode *vfs_inode; + + free_inode = pkernfs_allocate_inode(dir->i_sb); + if (free_inode <= 0) + return -ENOMEM; + + pkernfs_inode = pkernfs_get_persisted_inode(dir->i_sb, free_inode); + pkernfs_inode->sibling_ino = pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino; + pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino = free_inode; + strscpy(pkernfs_inode->filename, dentry->d_name.name, PKERNFS_FILENAME_LEN); + pkernfs_inode->flags = PKERNFS_INODE_FLAG_IOMMU_DOMAIN_PGTABLES; + pkernfs_inode->mappings_block = pkernfs_alloc_block(dir->i_sb); + memset(pkernfs_addr_for_block(dir->i_sb, pkernfs_inode->mappings_block), 0, (2 << 20)); + vfs_inode = pkernfs_inode_get(dir->i_sb, free_inode); + d_add(dentry, vfs_inode); + return 0; +} + void *pkernfs_region_paddr_to_vaddr(struct pkernfs_region *region, unsigned long paddr) { if (WARN_ON(paddr >= region->paddr + 
region->bytes)) @@ -96,3 +139,11 @@ void *pkernfs_region_paddr_to_vaddr(struct pkernfs_region *region, unsigned long return NULL; return region->vaddr + (paddr - region->paddr); } + +bool pkernfs_is_iommu_domain_pgtables(struct file *f) +{ + return f && + pkernfs_get_persisted_inode(f->f_inode->i_sb, f->f_inode->i_ino)->flags & + PKERNFS_INODE_FLAG_IOMMU_DOMAIN_PGTABLES; +} + diff --git a/fs/pkernfs/pkernfs.h b/fs/pkernfs/pkernfs.h index e1b7ae3fe7f1..9bea827f8b40 100644 --- a/fs/pkernfs/pkernfs.h +++ b/fs/pkernfs/pkernfs.h @@ -21,6 +21,7 @@ struct pkernfs_sb { #define PKERNFS_INODE_FLAG_DIR (1 << 1) #define PKERNFS_INODE_FLAG_IOMMU_DIR (1 << 2) #define PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES (1 << 3) +#define PKERNFS_INODE_FLAG_IOMMU_DOMAIN_PGTABLES (1 << 4) struct pkernfs_inode { int flags; /* @@ -50,8 +51,11 @@ void *pkernfs_addr_for_block(struct super_block *sb, int block_idx); unsigned long pkernfs_allocate_inode(struct super_block *sb); struct pkernfs_inode *pkernfs_get_persisted_inode(struct super_block *sb, int ino); +int pkernfs_create_iommu_pgtables(struct mnt_idmap *id, struct inode *dir, + struct dentry *dentry, umode_t mode, bool excl); extern const struct file_operations pkernfs_dir_fops; extern const struct file_operations pkernfs_file_fops; extern const struct inode_operations pkernfs_file_inode_operations; extern const struct inode_operations pkernfs_iommu_dir_inode_operations; +extern const struct inode_operations pkernfs_iommu_domain_pgtables_inode_operations; diff --git a/include/linux/pkernfs.h b/include/linux/pkernfs.h index 0110e4784109..4ca923ee0d82 100644 --- a/include/linux/pkernfs.h +++ b/include/linux/pkernfs.h @@ -33,4 +33,5 @@ void pkernfs_alloc_page_from_region(struct pkernfs_region *pkernfs_region, void **vaddr, unsigned long *paddr); void *pkernfs_region_paddr_to_vaddr(struct pkernfs_region *region, unsigned long paddr); +bool pkernfs_is_iommu_domain_pgtables(struct file *f); #endif /* _LINUX_PKERNFS_H */ From patchwork Mon Feb 5 
12:01:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13545359 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1349C4828D for ; Mon, 5 Feb 2024 12:05:20 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 735896B0074; Mon, 5 Feb 2024 07:05:20 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6E5C56B008C; Mon, 5 Feb 2024 07:05:20 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5AE976B00A0; Mon, 5 Feb 2024 07:05:20 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 4884C6B0074 for ; Mon, 5 Feb 2024 07:05:20 -0500 (EST) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 2127E14025A for ; Mon, 5 Feb 2024 12:05:20 +0000 (UTC) X-FDA: 81757620000.18.CFE1205 Received: from smtp-fw-80006.amazon.com (smtp-fw-80006.amazon.com [99.78.197.217]) by imf07.hostedemail.com (Postfix) with ESMTP id F276940011 for ; Mon, 5 Feb 2024 12:05:17 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=Tpwt34uS; dmarc=pass (policy=quarantine) header.from=amazon.com; spf=pass (imf07.hostedemail.com: domain of "prvs=75897cb1d=jgowans@amazon.com" designates 99.78.197.217 as permitted sender) smtp.mailfrom="prvs=75897cb1d=jgowans@amazon.com" ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1707134718; a=rsa-sha256; cv=none; b=0hLQ5t5wyCibMql9a1z0ynfIh98Gq9Eqw6SZeHxTcmDIqaXuOoBqBRWmWUbfO94szndHy7 
W7gH//d4hR3IKFYPEpoNwzg7QMbbEkXuNWHAj5diH/JkEgRBuVRbsRpzEfsl0/spDYF/3H EkyWYg0foAR1Q6sPjDoZ6795jPUT+4U= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=Tpwt34uS; dmarc=pass (policy=quarantine) header.from=amazon.com; spf=pass (imf07.hostedemail.com: domain of "prvs=75897cb1d=jgowans@amazon.com" designates 99.78.197.217 as permitted sender) smtp.mailfrom="prvs=75897cb1d=jgowans@amazon.com" ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1707134718; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=mVhmItHN9qmSrpLWsRnSDOh51VJYGjIwmSMhmIPWHpo=; b=bUOc3ik0idEX5YjydrKNUq6H1zXQBL5LZamylNXfq809TezUVcZSc6w5kF3WQRoSoa6vts KThJfnPqukWBCJfmlIKH1sgOZ1jB6t0hwTMI/SI5rSOrk7DfW2oUJfC4Ysx/NVF9/tjXoU AKEirto3mxHb63Q5QHYRaTQEC2LaAUY= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1707134718; x=1738670718; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=mVhmItHN9qmSrpLWsRnSDOh51VJYGjIwmSMhmIPWHpo=; b=Tpwt34uSXBgwAGElG4iyGBSI7R7ggTDH+cfr4mhjI0GD/mOp6gbtQXG0 hggpy9hjb2RuMXM19PK8q7Tp0zpAejC0iXvq5+zqs5RFa/imrr3HcSIPO BCrkuBTFNwFzIRO5ZWCexm4I+x5kaJiwB0zuOoNfWapjhw0TU27MgLz1Z A=; X-IronPort-AV: E=Sophos;i="6.05,245,1701129600"; d="scan'208";a="271102836" Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO smtpout.prod.us-east-1.prod.farcaster.email.amazon.dev) ([10.25.36.214]) by smtp-border-fw-80006.pdx80.corp.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2024 12:05:14 +0000 Received: from EX19MTAEUC002.ant.amazon.com [10.0.17.79:18332] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.28.192:2525] with esmtp 
(Farcaster) id 430958cf-6b4e-40ab-be7a-e53a50074ea7; Mon, 5 Feb 2024 12:05:13 +0000 (UTC) X-Farcaster-Flow-ID: 430958cf-6b4e-40ab-be7a-e53a50074ea7 Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUC002.ant.amazon.com (10.252.51.181) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:05:13 +0000 Received: from dev-dsk-jgowans-1a-a3faec1f.eu-west-1.amazon.com (172.19.112.191) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:05:06 +0000 From: James Gowans To: CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , , Subject: [RFC 13/18] vfio: add ioctl to define persistent pgtables on container Date: Mon, 5 Feb 2024 12:01:58 +0000 Message-ID: <20240205120203.60312-14-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> MIME-Version: 1.0 X-Originating-IP: [172.19.112.191] X-ClientProxiedBy: EX19D046UWA004.ant.amazon.com (10.13.139.76) To EX19D014EUC004.ant.amazon.com (10.252.51.182) X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: F276940011 X-Stat-Signature: 4xbtzimrajhobrq8bahi6egbszxozieb X-HE-Tag: 1707134717-364625 X-HE-Meta: 
U2FsdGVkX1/9j2ZWEwBP/pO+mGhqCWyv8SOs9NGpCA1mVWUzStTYyiUaGXoCT/jyHleC9Po7vagS4IZ5DqGuea05T3guxrjs8bB/s2tbk4zV3BhiFTn14sZoGnsHM38nqGY9l8k7QpaWbnEQEFduRfC2/6xdzDlkxk8rTQ9OJo5JcrslRMpPywWFK9wEuzZBOj7NWDq9TOtnyeU9APDSulE4JtB3TAiw22vSs9ijGctQkMWEeYpO4IzJmtW90DzPIk4cvXzmcq8pJMIuzr+ADPjut2sFFUKG6T74R79KUa328Wibf3g3HGijQCiJdmCMgFgLo+cLnYK+kjPiV/n9t+lvJIDoMrLtNg0jGrvTWRyP3zeKvf1Ub0bhjFtM0393lnqHyJlx1c1bYt8oxjOK1oe8S0dLm2cXyDr/xfqAmkPmPWOBjscHUYla2/DPzpcHzzUGBZ+b0HVQpaDZtHZA2aRUlkUKlAxzhz7jRUXk0NHvBZeQDAUQ9rJaB/jFHiJa0lzAbULoARryrhxclMfnQaRi8lgRme33oj24JbeFhePetYJi2natMCs3H68tEtl3onQK8aluhd5OxTuqXjlNOjsx2308aOdMtafrHbswq1bmeannRqvcPF9HJubLprgK40r3mpdA8QM1+YTQVdMw0zyHxs9kpGigJ8zFU+lqR/JXN6Yu05QJ28mUqa0dE/A9h56pKARB6kcAOoz3JPp0i6/q4TDUc7NOA+5FuHoxC/AXoq9wJLt+XbJip8EAz9GK0xhmNOOpBDtH5/57EQtZhml9Rx82UHweyuXzufap0WOyDtC6MHGRLGzPW77IkUY5Qto48o3vwSHWuhEe4CuuiTmPMD4b8/ht5xnG/Fe6Mht3US7l3nxRV769XPZ/BUfkF/siOVvZHboceodqhoRwXeIuoiL9kgmYAW2TqArL1sTKF3OGb+/9v5LycgmWrgVgqIfMQg+Vs3RiYiR85O4 VOxxi0Jz VRb5BxeJu7BV5R7NhbcNGNUD+ygXBvZv6Nmp8ToyLoXgBw2Hc5O5EDbuHncJlmOFpwoQWluVX5a/aWxNQ2Gssmg6OP2LpfzNwo6SWrKgWlzjYaJxLzf1QOLY1d8l+a+8q6MsK/rcloA/0rwd7IWqhWN/xXc3rEBEAskXlG4+Y8Y+svB9a4vEnLunerWBEN6oP1U2+ovGVSZ0oizzobww08UmaMb5Ws/rkCPbmDbFPMZb/fA9j1kCL11OSf0iTjOHVb27poTT7J0bOt5+g0nUMyp5SIxW2P2BIs2XK/l8YQpCTimAHb47yup4P2TaSxaQKcyrCeftTVvEv58WDZb2+RugFpXMYruYCVgqtYB6GWzu38AXVs3f4EAIB6aoexkK0j6g4ZK3kCz14Pbo2/Nt8p5wvNRfXogcQvZKgM5s5VNLjonCK8TvdinezH42kfRwdGkGpQrInaBhL99/sBMAemXj9N0PZ2UeT5sGHVijY6QxLur0fwVP5kiItfbIcSEiq5rGH0rRTfmR7QMCp75dKlngQ0bCl6EflyulvLobnk038laaf85wpgfRrdg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: The previous commits added a file type in pkernfs for IOMMU persistent page tables. Now support actually setting persistent page tables on an IOMMU domain. This is done via a VFIO ioctl on a VFIO container. 
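From userspace, that ioctl call might look roughly like the sketch below. The assumptions are as follows: the struct and ioctl number are local mirrors of the uapi additions proposed in this patch (they are not in mainline headers), the `sketch_` names are hypothetical, and the file path is supplied by the caller. Going by the kernel side of this patch, the call has to be made before `VFIO_SET_IOMMU`, because that is where the file is handed to the IOMMU driver.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Local mirrors of the proposed uapi; VFIO_TYPE is ';' and VFIO_BASE is 100. */
struct sketch_vfio_set_persistent_pgtables {
	uint32_t persistent_pgtables_fd;
};
#define SKETCH_VFIO_CONTAINER_SET_PERSISTENT_PGTABLES _IO(';', 100 + 21)

/*
 * Open (or create) the pkernfs domain pgtables file and register it on
 * the VFIO container. Returns the pgtables fd on success, or -1 with
 * errno set on failure.
 */
static int sketch_set_persistent_pgtables(int container_fd, const char *path)
{
	struct sketch_vfio_set_persistent_pgtables arg;
	int ppts_fd;

	ppts_fd = open(path, O_RDWR | O_CREAT, 0600);
	if (ppts_fd < 0)
		return -1;

	arg.persistent_pgtables_fd = (uint32_t)ppts_fd;
	if (ioctl(container_fd, SKETCH_VFIO_CONTAINER_SET_PERSISTENT_PGTABLES,
		  &arg) < 0) {
		close(ppts_fd);
		return -1;
	}
	return ppts_fd;
}
```

After kexec the same function would be called again with the same path, so that the domain keeps using the same page table memory.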
Userspace needs to create and open an IOMMU persistent page tables file and then supply that fd to the new VFIO_CONTAINER_SET_PERSISTENT_PGTABLES ioctl. That ioctl sets the supplied struct file on the struct vfio_container. Later, when the IOMMU domain is allocated by VFIO, VFIO checks whether persistent pagetables have been defined and, if so, uses the iommu_domain_alloc_persistent API, which was introduced in the previous commit, to pass the struct file down to the IOMMU, which will actually use it for page tables. After kexec, userspace needs to open the same IOMMU page table file and set it again via the same ioctl so that the IOMMU continues to use the same memory region for its page tables for that domain.
--- drivers/vfio/container.c | 27 +++++++++++++++++++++++++++ drivers/vfio/vfio.h | 2 ++ drivers/vfio/vfio_iommu_type1.c | 27 +++++++++++++++++++++++++-- include/uapi/linux/vfio.h | 9 +++++++++ 4 files changed, 63 insertions(+), 2 deletions(-) diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c index d53d08f16973..b60fcbf7bad0 100644 --- a/drivers/vfio/container.c +++ b/drivers/vfio/container.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include "vfio.h" @@ -21,6 +22,7 @@ struct vfio_container { struct rw_semaphore group_lock; struct vfio_iommu_driver *iommu_driver; void *iommu_data; + struct file *persistent_pgtables; bool noiommu; }; @@ -306,6 +308,8 @@ static long vfio_ioctl_set_iommu(struct vfio_container *container, continue; } + driver->ops->set_persistent_pgtables(data, container->persistent_pgtables); + ret = __vfio_container_attach_groups(container, driver, data); if (ret) { driver->ops->release(data); @@ -324,6 +328,26 @@ static long vfio_ioctl_set_iommu(struct vfio_container *container, return ret; } +static int vfio_ioctl_set_persistent_pgtables(struct vfio_container *container, + unsigned long arg) +{ + struct vfio_set_persistent_pgtables set_ppts; + struct file *ppts; + + if
(copy_from_user(&set_ppts, (void __user *)arg, sizeof(set_ppts))) + return -EFAULT; + + ppts = fget(set_ppts.persistent_pgtables_fd); + if (!ppts) + return -EBADF; + if (!pkernfs_is_iommu_domain_pgtables(ppts)) { + fput(ppts); + return -EBADF; + } + container->persistent_pgtables = ppts; + return 0; +} + static long vfio_fops_unl_ioctl(struct file *filep, unsigned int cmd, unsigned long arg) { @@ -345,6 +369,9 @@ static long vfio_fops_unl_ioctl(struct file *filep, case VFIO_SET_IOMMU: ret = vfio_ioctl_set_iommu(container, arg); break; + case VFIO_CONTAINER_SET_PERSISTENT_PGTABLES: + ret = vfio_ioctl_set_persistent_pgtables(container, arg); + break; default: driver = container->iommu_driver; data = container->iommu_data; diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h index 307e3f29b527..6fa301bf6474 100644 --- a/drivers/vfio/vfio.h +++ b/drivers/vfio/vfio.h @@ -226,6 +226,8 @@ struct vfio_iommu_driver_ops { void *data, size_t count, bool write); struct iommu_domain *(*group_iommu_domain)(void *iommu_data, struct iommu_group *group); + int (*set_persistent_pgtables)(void *iommu_data, + struct file *ppts); }; struct vfio_iommu_driver { diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index eacd6ec04de5..b36edfc5c9ef 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -75,6 +75,7 @@ struct vfio_iommu { bool nesting; bool dirty_page_tracking; struct list_head emulated_iommu_groups; + struct file *persistent_pgtables; }; struct vfio_domain { @@ -2143,9 +2144,14 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu, static int vfio_iommu_domain_alloc(struct device *dev, void *data) { + /* data is an in pointer to PPTs, and an out to the new domain. 
*/ + struct file *ppts = *(struct file **) data; struct iommu_domain **domain = data; - *domain = iommu_domain_alloc(dev->bus); + if (ppts) + *domain = iommu_domain_alloc_persistent(dev->bus, ppts); + else + *domain = iommu_domain_alloc(dev->bus); return 1; /* Don't iterate */ } @@ -2156,6 +2162,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, struct vfio_iommu_group *group; struct vfio_domain *domain, *d; bool resv_msi; + /* In/out ptr to iommu_domain_alloc. */ + void *domain_alloc_data; phys_addr_t resv_msi_base = 0; struct iommu_domain_geometry *geo; LIST_HEAD(iova_copy); @@ -2203,8 +2211,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, * want to iterate beyond the first device (if any). */ ret = -EIO; - iommu_group_for_each_dev(iommu_group, &domain->domain, + /* Smuggle the PPTs in the data field; it will be clobbered with the new domain */ + domain_alloc_data = iommu->persistent_pgtables; + iommu_group_for_each_dev(iommu_group, &domain_alloc_data, vfio_iommu_domain_alloc); + domain->domain = domain_alloc_data; + if (!domain->domain) goto out_free_domain; @@ -3165,6 +3177,16 @@ vfio_iommu_type1_group_iommu_domain(void *iommu_data, return domain; } +int vfio_iommu_type1_set_persistent_pgtables(void *iommu_data, + struct file *ppts) +{ + + struct vfio_iommu *iommu = iommu_data; + + iommu->persistent_pgtables = ppts; + return 0; +} + static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = { .name = "vfio-iommu-type1", .owner = THIS_MODULE, @@ -3179,6 +3201,7 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = { .unregister_device = vfio_iommu_type1_unregister_device, .dma_rw = vfio_iommu_type1_dma_rw, .group_iommu_domain = vfio_iommu_type1_group_iommu_domain, + .set_persistent_pgtables = vfio_iommu_type1_set_persistent_pgtables, }; static int __init vfio_iommu_type1_init(void) diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index afc1369216d9..fa9676bb4b26 100644 --- 
a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -1797,6 +1797,15 @@ struct vfio_iommu_spapr_tce_remove { }; #define VFIO_IOMMU_SPAPR_TCE_REMOVE _IO(VFIO_TYPE, VFIO_BASE + 20) +struct vfio_set_persistent_pgtables { + /* + * File descriptor for a pkernfs IOMMU pgtables + * file to be used for persistence. + */ + __u32 persistent_pgtables_fd; +}; +#define VFIO_CONTAINER_SET_PERSISTENT_PGTABLES _IO(VFIO_TYPE, VFIO_BASE + 21) + /* ***************************************************************** */ #endif /* _UAPIVFIO_H */ From patchwork Mon Feb 5 12:01:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13545360 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AE643C48291 for ; Mon, 5 Feb 2024 12:05:26 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 476716B00A2; Mon, 5 Feb 2024 07:05:26 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 4267E6B00A3; Mon, 5 Feb 2024 07:05:26 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2EF2A6B00A4; Mon, 5 Feb 2024 07:05:26 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 1AD066B00A2 for ; Mon, 5 Feb 2024 07:05:26 -0500 (EST) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id E6D2AA0A77 for ; Mon, 5 Feb 2024 12:05:25 +0000 (UTC) X-FDA: 81757620210.06.60C3796 Received: from smtp-fw-80008.amazon.com (smtp-fw-80008.amazon.com [99.78.197.219]) by imf04.hostedemail.com (Postfix) with ESMTP id A69604000A for ; Mon, 5 Feb 2024 12:05:23 +0000 (UTC) 
Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=Cdlq+AIQ; dmarc=pass (policy=quarantine) header.from=amazon.com; spf=pass (imf04.hostedemail.com: domain of "prvs=75897cb1d=jgowans@amazon.com" designates 99.78.197.219 as permitted sender) smtp.mailfrom="prvs=75897cb1d=jgowans@amazon.com" ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1707134723; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=VONiUp37V5G8RKTN0EmhvHz4ErjJ4ifKDUQ6e8VlzjQ=; b=E2XQxlV9Ef/KpwEPaCwU7bVkZCq+//7q4Ll6Vy90LgnQw8PM9wQVr6ac8h/m28PG++9+7U xMYQ0P1yvu/YRmbwnjFH5C5QYV6SY//WsxHlfOL4Hy78mfVbB02uxRDWLhpWFciLJ5p5R0 U3kdpb94I/oEEi/xMCFqBYWxPZp3A6I= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=Cdlq+AIQ; dmarc=pass (policy=quarantine) header.from=amazon.com; spf=pass (imf04.hostedemail.com: domain of "prvs=75897cb1d=jgowans@amazon.com" designates 99.78.197.219 as permitted sender) smtp.mailfrom="prvs=75897cb1d=jgowans@amazon.com" ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1707134723; a=rsa-sha256; cv=none; b=ncMMcvhOQcciKKi0Lo16JN9Ew6osurJwaNlLuP3sW9Kp5shkdNRKSFlvhCr/nTbTRVfIj/ k0YwEGz+UOM2dhzWiAYk9o+xnWy50dAurrbz3rkGBt3DcPapX7aNJaLsiVQhaSbuzw8yo1 7qpA81indLs8eKasaQXTSAV608ua9v0= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1707134723; x=1738670723; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VONiUp37V5G8RKTN0EmhvHz4ErjJ4ifKDUQ6e8VlzjQ=; b=Cdlq+AIQbqx+tAN61oJAjm9aV3fcH14jYZ9Ta2F+LDtFMLbkquK5uamD z95JSWl2WGqUCcX8mdY/MNmYnjaYhjvM5V3eAD6bbU5yM0EiNnHk/mlB/ 
kanMzDDkDgLdDmyaqRl5piuDAk3sl3eFHqVfwloEEo+94Gf6NyOb7CTWq 8=; X-IronPort-AV: E=Sophos;i="6.05,245,1701129600"; d="scan'208";a="63755776" Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO smtpout.prod.us-east-1.prod.farcaster.email.amazon.dev) ([10.25.36.214]) by smtp-border-fw-80008.pdx80.corp.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2024 12:05:21 +0000 Received: from EX19MTAEUC001.ant.amazon.com [10.0.10.100:40572] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.14.233:2525] with esmtp (Farcaster) id 27f4f8c6-d18d-45f7-aee0-af42d924df8a; Mon, 5 Feb 2024 12:05:19 +0000 (UTC) X-Farcaster-Flow-ID: 27f4f8c6-d18d-45f7-aee0-af42d924df8a Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUC001.ant.amazon.com (10.252.51.155) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:05:19 +0000 Received: from dev-dsk-jgowans-1a-a3faec1f.eu-west-1.amazon.com (172.19.112.191) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:05:13 +0000 From: James Gowans To: CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . 
Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , , Subject: [RFC 14/18] intel-iommu: Allocate domain pgtable pages from pkernfs Date: Mon, 5 Feb 2024 12:01:59 +0000 Message-ID: <20240205120203.60312-15-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> MIME-Version: 1.0 X-Originating-IP: [172.19.112.191] X-ClientProxiedBy: EX19D046UWA004.ant.amazon.com (10.13.139.76) To EX19D014EUC004.ant.amazon.com (10.252.51.182) X-Rspamd-Queue-Id: A69604000A X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: n3jj1jt36jkrpfj7atfhyj6p5xuswgqn X-HE-Tag: 1707134723-834118 X-HE-Meta: U2FsdGVkX1+OWpKlfPHvHX5LoOOmLPSc0oyYc/Ln3Se4kOje68n+2V3k3sKEoLLojBoI4NtPN2lG4+ec/j37+zZi9M8J1nR52Nmh8wjFQKveaFPum5JgHKlaURlOuxbT/FrBRK3D0e6aWrORTMRVf6yn+k3+AQKqK6CjvaJ+j93BrR0/JATlfVIX207LS6Ch60OVv5UtXci7RUOHoGy/0WOc2XQIF7ka1kHmCrdpGAxy0KYtB4xp4O0KlgU+lUK7wjNI52KFVaOLbONU1PFL8S4bmrcBz01xUQ8yVDMbCVvbQW9qpXgBypAohXR5VN4Tm/RlhFiqjqOP9D12Wo6gZt6F+5HNdyH9Y1YbSi7IWkihOljbURuC/0+t9dRiAKVAb8ljqOZajxbdfFKvnbfH6Ghg2M5fcJa1fi5228xPylOwIHa8gYYBXOPARNCkEn5w4kHje4iVFw12G+cKOuCoNdvXwsiWSoeYox6q4e0po3UjbRd+igupKbH4HnZINO22jyTkMoTuow/3QxvwFinvvt/KTQRnmeRYRuVXKUMK7SDoTkH6U0HV+3eEn6r4flxTsUojYgqH9X5l/TOgjJ7Dya7uMp6H4WkV4JyLf9M5BPmHeL+FpeAunC+FwcFMODOvsmiAzxyycFs069whFAbvnXjxg4OHmKGY4doe68SblDcl0xxHaSQi7R03EWP8O3Wbr6mUJoU7xZFnq18WThInJjPG5I+lcIzAOnqpBU0Omj+mpEoLFtBU7TfD6TlxbfEOPW8khzhZp1Lh5MhGd4eE/Hjzcqq3KYFUGoa8BqY206o6eDIAommDo6fYI9y+X1WJ+TD1x8QEe7AsyduQwwDwdK2xA902wYNIIwlCXbo92xiOwKE1EoaYUquSmrTL/Xc5/elvDvggE5JgICWzC0qhsv0e+PyIp83S5XNA070vwQ6c4kKBSTZ16FIC8mP+qv2moX2OZkGpjitIeGwvuUc nRLS2QYo 
2rhuQ7W5VPuoFnNbmlf4ji6s6zGGehg6zIOISHC4gJsmiJ85YxkunXS33NEecv8pJPIJA4hwNtKOc5crlp6SxWi4C/e0qCieiwBwLwRjFySDW4ldAFzNc64OhaOzbA2swiqQ2VWJjLoiSliXBsTsWsofPj1wjIZ3tjxlHzzOQv90bbQaMYWP2+NFpJ5UP/7cpgtcxechVgOZiFpPt4MYO0dzNN7xI4nZKQAXXoYMCQg+R9VTQv6aPuecZhDCseDjJrRjxW89Xp6mpn91ICjVdRXL/r49wQQgfHHvy5tI2nRzonrDqdnzyv+wpWMb7h6uhl86BJa6wXWO13e2xo8eBKCNyF3vVz9ECUCTNaeTQ6/3feDunTmMDnYECh2lv9ddYJSww6+4BOtATtbMLXkIQ841oI82w+3nQMRXyWzblFo3zGVR1e1ec7QDUsebqHAge8PeyOZk1055bE9IVfJKqJLpkI6Q/JrHegSHw+ZwqI2+//BsFDLX2GKR0UGSiZYy8FWTx44FCA/+fyR5J8/+6FzpkjHrVZw+nNb7oW4Xq6DpPE7w= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: In the previous commit VFIO was updated to be able to define persistent pgtables on a container. Now the IOMMU driver is updated to accept the file for persistent pgtables when the domain is allocated and use that file as the source of pages for the pgtables. The iommu_ops.domain_alloc callback is extended to take a struct file pointer for the pkernfs domain pgtables file. Most call sites are updated to supply NULL here, indicating no persistent pgtables. VFIO's caller is updated to plumb the pkernfs file through. When this file is supplied the md_domain_init() function converts the file into a pkernfs region and uses that region for pgtables. Similarly to the root pgtables there are use-after-free issues with this that need sorting out, and the free() functions also need to be updated to free from the pkernfs region. It may be better to store the struct file on the dmar_domain and map file offset to addr every time rather than using a pkernfs region for this.
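The NULL-versus-file convention described above can be illustrated with a small userspace sketch. This is not the kernel code: struct sketch_region, struct sketch_domain and sketch_domain_init() are hypothetical stand-ins for struct pkernfs_region, struct dmar_domain and md_domain_init(), and a 4 KiB page size is assumed. It only shows how an optional region argument selects where the top-level pgd page comes from, with bit 1 of the bitmap page marking page 1 as taken, as iommu_get_pgd_page() does in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct pkernfs_region; vaddr == NULL means "none". */
struct sketch_region {
	unsigned char *vaddr;	/* page 0 holds the allocation bitmap */
	size_t nr_pages;
};

/* Hypothetical stand-in for struct dmar_domain. */
struct sketch_domain {
	struct sketch_region pgtables_allocator;
	void *pgd;
};

/*
 * Mirrors the shape of md_domain_init(): a non-NULL region selects the
 * persistent path, NULL selects an ordinary allocation.
 */
static int sketch_domain_init(struct sketch_domain *domain,
			      const struct sketch_region *ppts)
{
	domain->pgtables_allocator.vaddr = NULL;
	domain->pgtables_allocator.nr_pages = 0;

	if (ppts) {
		if (!ppts->vaddr || ppts->nr_pages < 2)
			return -1;
		domain->pgtables_allocator = *ppts;
		/* As in iommu_get_pgd_page(): bit 1 marks page 1, the pgd. */
		ppts->vaddr[0] |= 0x02;
		domain->pgd = ppts->vaddr + SKETCH_PAGE_SIZE;
	} else {
		/* Non-persistent domain: plain zeroed page. */
		domain->pgd = calloc(1, SKETCH_PAGE_SIZE);
	}
	return domain->pgd ? 0 : -1;
}
```

Later allocations can then branch on pgtables_allocator.vaddr being non-NULL, which is the same test pfn_to_dma_pte() uses in the diff below.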
--- drivers/iommu/intel/iommu.c | 35 +++++++++++++++++++++++++++-------- drivers/iommu/intel/iommu.h | 1 + drivers/iommu/iommu.c | 22 ++++++++++++++-------- drivers/iommu/pgtable_alloc.c | 7 +++++++ drivers/iommu/pgtable_alloc.h | 1 + fs/pkernfs/iommu.c | 2 +- include/linux/iommu.h | 6 +++++- include/linux/pkernfs.h | 1 + 8 files changed, 57 insertions(+), 18 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 315c6b7f901c..809ca9e93992 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -946,7 +946,13 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain, if (!dma_pte_present(pte)) { uint64_t pteval; - tmp_page = alloc_pgtable_page(domain->nid, gfp); + if (domain->pgtables_allocator.vaddr) + iommu_alloc_page_from_region( + &domain->pgtables_allocator, + &tmp_page, + NULL); + else + tmp_page = alloc_pgtable_page(domain->nid, gfp); if (!tmp_page) return NULL; @@ -2399,7 +2405,7 @@ static int iommu_domain_identity_map(struct dmar_domain *domain, DMA_PTE_READ|DMA_PTE_WRITE, GFP_KERNEL); } -static int md_domain_init(struct dmar_domain *domain, int guest_width); +static int md_domain_init(struct dmar_domain *domain, int guest_width, struct file *ppts); static int __init si_domain_init(int hw) { @@ -2411,7 +2417,7 @@ static int __init si_domain_init(int hw) if (!si_domain) return -EFAULT; - if (md_domain_init(si_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) { + if (md_domain_init(si_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH, NULL)) { domain_exit(si_domain); si_domain = NULL; return -EFAULT; @@ -4029,7 +4035,7 @@ static void device_block_translation(struct device *dev) info->domain = NULL; } -static int md_domain_init(struct dmar_domain *domain, int guest_width) +static int md_domain_init(struct dmar_domain *domain, int guest_width, struct file *ppts) { int adjust_width; @@ -4042,8 +4048,21 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width) domain->iommu_superpage = 0; 
domain->max_addr = 0; - /* always allocate the top pgd */ - domain->pgd = alloc_pgtable_page(domain->nid, GFP_ATOMIC); + if (ppts) { + unsigned long pgd_phy; + + pkernfs_get_region_for_ppts( + ppts, + &domain->pgtables_allocator); + iommu_get_pgd_page( + &domain->pgtables_allocator, + (void **) &domain->pgd, + &pgd_phy); + + } else { + /* always allocate the top pgd */ + domain->pgd = alloc_pgtable_page(domain->nid, GFP_ATOMIC); + } if (!domain->pgd) return -ENOMEM; domain_flush_cache(domain, domain->pgd, PAGE_SIZE); @@ -4064,7 +4083,7 @@ static struct iommu_domain blocking_domain = { } }; -static struct iommu_domain *intel_iommu_domain_alloc(unsigned type) +static struct iommu_domain *intel_iommu_domain_alloc(unsigned int type, struct file *ppts) { struct dmar_domain *dmar_domain; struct iommu_domain *domain; @@ -4079,7 +4098,7 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type) pr_err("Can't allocate dmar_domain\n"); return NULL; } - if (md_domain_init(dmar_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) { + if (md_domain_init(dmar_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH, ppts)) { pr_err("Domain initialization failed\n"); domain_exit(dmar_domain); return NULL; diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h index 4a2f163a86f3..f772fdcf3828 100644 --- a/drivers/iommu/intel/iommu.h +++ b/drivers/iommu/intel/iommu.h @@ -602,6 +602,7 @@ struct dmar_domain { struct list_head dev_pasids; /* all attached pasids */ struct list_head domains; /* all struct dmar_domains on this IOMMU */ + struct pkernfs_region pgtables_allocator; struct dma_pte *pgd; /* virtual address */ int gaw; /* max guest address width */ diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 3a67e636287a..f26e83d5b159 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -97,7 +97,7 @@ static int iommu_bus_notifier(struct notifier_block *nb, unsigned long action, void *data); static void iommu_release_device(struct device *dev); static struct 
iommu_domain *__iommu_domain_alloc(const struct bus_type *bus, - unsigned type); + unsigned int type, struct file *ppts); static int __iommu_attach_device(struct iommu_domain *domain, struct device *dev); static int __iommu_attach_group(struct iommu_domain *domain, @@ -1734,7 +1734,7 @@ __iommu_group_alloc_default_domain(const struct bus_type *bus, { if (group->default_domain && group->default_domain->type == req_type) return group->default_domain; - return __iommu_domain_alloc(bus, req_type); + return __iommu_domain_alloc(bus, req_type, NULL); } /* @@ -1971,7 +1971,7 @@ void iommu_set_fault_handler(struct iommu_domain *domain, EXPORT_SYMBOL_GPL(iommu_set_fault_handler); static struct iommu_domain *__iommu_domain_alloc(const struct bus_type *bus, - unsigned type) + unsigned int type, struct file *ppts) { struct iommu_domain *domain; unsigned int alloc_type = type & IOMMU_DOMAIN_ALLOC_FLAGS; @@ -1979,7 +1979,7 @@ static struct iommu_domain *__iommu_domain_alloc(const struct bus_type *bus, if (bus == NULL || bus->iommu_ops == NULL) return NULL; - domain = bus->iommu_ops->domain_alloc(alloc_type); + domain = bus->iommu_ops->domain_alloc(alloc_type, ppts); if (!domain) return NULL; @@ -2001,9 +2001,15 @@ static struct iommu_domain *__iommu_domain_alloc(const struct bus_type *bus, return domain; } +struct iommu_domain *iommu_domain_alloc_persistent(const struct bus_type *bus, struct file *ppts) +{ + return __iommu_domain_alloc(bus, IOMMU_DOMAIN_UNMANAGED, ppts); +} +EXPORT_SYMBOL_GPL(iommu_domain_alloc_persistent); + struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus) { - return __iommu_domain_alloc(bus, IOMMU_DOMAIN_UNMANAGED); + return __iommu_domain_alloc(bus, IOMMU_DOMAIN_UNMANAGED, NULL); } EXPORT_SYMBOL_GPL(iommu_domain_alloc); @@ -3198,14 +3204,14 @@ static int __iommu_group_alloc_blocking_domain(struct iommu_group *group) return 0; group->blocking_domain = - __iommu_domain_alloc(dev->dev->bus, IOMMU_DOMAIN_BLOCKED); + 
__iommu_domain_alloc(dev->dev->bus, IOMMU_DOMAIN_BLOCKED, NULL); if (!group->blocking_domain) { /* * For drivers that do not yet understand IOMMU_DOMAIN_BLOCKED * create an empty domain instead. */ group->blocking_domain = __iommu_domain_alloc( - dev->dev->bus, IOMMU_DOMAIN_UNMANAGED); + dev->dev->bus, IOMMU_DOMAIN_UNMANAGED, NULL); if (!group->blocking_domain) return -EINVAL; } @@ -3500,7 +3506,7 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev, const struct iommu_ops *ops = dev_iommu_ops(dev); struct iommu_domain *domain; - domain = ops->domain_alloc(IOMMU_DOMAIN_SVA); + domain = ops->domain_alloc(IOMMU_DOMAIN_SVA, NULL); if (!domain) return NULL; diff --git a/drivers/iommu/pgtable_alloc.c b/drivers/iommu/pgtable_alloc.c index f0c2e12f8a8b..276db15932cc 100644 --- a/drivers/iommu/pgtable_alloc.c +++ b/drivers/iommu/pgtable_alloc.c @@ -7,6 +7,13 @@ * The first 4 KiB is the bitmap - set the first bit in the bitmap. * Scan bitmap to find next free bits - it's next free page.
*/ +void iommu_get_pgd_page(struct pkernfs_region *region, void **vaddr, unsigned long *paddr) +{ + set_bit(1, region->vaddr); + *vaddr = region->vaddr + (1 << PAGE_SHIFT); + if (paddr) + *paddr = region->paddr + (1 << PAGE_SHIFT); +} void iommu_alloc_page_from_region(struct pkernfs_region *region, void **vaddr, unsigned long *paddr) { diff --git a/drivers/iommu/pgtable_alloc.h b/drivers/iommu/pgtable_alloc.h index c1666a7be3d3..50c3abba922b 100644 --- a/drivers/iommu/pgtable_alloc.h +++ b/drivers/iommu/pgtable_alloc.h @@ -3,6 +3,7 @@ #include #include +void iommu_get_pgd_page(struct pkernfs_region *region, void **vaddr, unsigned long *paddr); void iommu_alloc_page_from_region(struct pkernfs_region *region, void **vaddr, unsigned long *paddr); diff --git a/fs/pkernfs/iommu.c b/fs/pkernfs/iommu.c index f14e76013e85..5d0b256e7dd8 100644 --- a/fs/pkernfs/iommu.c +++ b/fs/pkernfs/iommu.c @@ -4,7 +4,7 @@ #include -void pkernfs_alloc_iommu_domain_pgtables(struct file *ppts, struct pkernfs_region *pkernfs_region) +void pkernfs_get_region_for_ppts(struct file *ppts, struct pkernfs_region *pkernfs_region) { struct pkernfs_inode *pkernfs_inode; unsigned long *mappings_block_vaddr; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 0225cf7445de..01bb89246ef7 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -101,6 +101,7 @@ struct iommu_domain { enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault, void *data); void *fault_data; + struct file *persistent_pgtables; union { struct { iommu_fault_handler_t handler; @@ -266,7 +267,8 @@ struct iommu_ops { void *(*hw_info)(struct device *dev, u32 *length, u32 *type); /* Domain allocation and freeing by the iommu driver */ - struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type); + /* If ppts is not null it is a persistent domain; null is non-persistent */ + struct iommu_domain *(*domain_alloc)(unsigned int iommu_domain_type, struct file *ppts); struct iommu_device
*(*probe_device)(struct device *dev); void (*release_device)(struct device *dev); @@ -466,6 +468,8 @@ extern bool iommu_present(const struct bus_type *bus); extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap); extern bool iommu_group_has_isolated_msi(struct iommu_group *group); extern struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus); +extern struct iommu_domain *iommu_domain_alloc_persistent(const struct bus_type *bus, + struct file *ppts); extern void iommu_domain_free(struct iommu_domain *domain); extern int iommu_attach_device(struct iommu_domain *domain, struct device *dev); diff --git a/include/linux/pkernfs.h b/include/linux/pkernfs.h index 4ca923ee0d82..8aa69ef5a2d8 100644 --- a/include/linux/pkernfs.h +++ b/include/linux/pkernfs.h @@ -31,6 +31,7 @@ struct pkernfs_region { void pkernfs_alloc_iommu_root_pgtables(struct pkernfs_region *pkernfs_region); void pkernfs_alloc_page_from_region(struct pkernfs_region *pkernfs_region, void **vaddr, unsigned long *paddr); +void pkernfs_get_region_for_ppts(struct file *ppts, struct pkernfs_region *pkernfs_region); void *pkernfs_region_paddr_to_vaddr(struct pkernfs_region *region, unsigned long paddr); bool pkernfs_is_iommu_domain_pgtables(struct file *f); From patchwork Mon Feb 5 12:02:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13545363 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7274AC48298 for ; Mon, 5 Feb 2024 12:06:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 033876B0099; Mon, 5 Feb 2024 07:06:09 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id EFD6C8D0001; Mon, 5 Feb 2024 07:06:08 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by 
kanga.kvack.org (Postfix, from userid 63042) id D76D96B00A8; Mon, 5 Feb 2024 07:06:08 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id C7F4A6B0099 for ; Mon, 5 Feb 2024 07:06:08 -0500 (EST) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 82FBF1602BB for ; Mon, 5 Feb 2024 12:06:08 +0000 (UTC) X-FDA: 81757622016.07.ABC0B2A Received: from smtp-fw-52003.amazon.com (smtp-fw-52003.amazon.com [52.119.213.152]) by imf04.hostedemail.com (Postfix) with ESMTP id 906164000A for ; Mon, 5 Feb 2024 12:06:06 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=EfUur+1L; dmarc=pass (policy=quarantine) header.from=amazon.com; spf=pass (imf04.hostedemail.com: domain of "prvs=75897cb1d=jgowans@amazon.com" designates 52.119.213.152 as permitted sender) smtp.mailfrom="prvs=75897cb1d=jgowans@amazon.com" ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1707134766; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=nHBURAsv2kVj18uGoBS/ynF5eAKuZbHIuIE7fti0DT8=; b=y6mhB2mvK9mc+z0JkKLVj4sCCeavFrM3bduhhPJq46Pdo20FtJvz4ytf63H0MAr2M7YHMa K5xSbabf9FX4+OLR4ZdUyfe0GjkUaNoMQUjoL+GwhLZSKeGn75P60pagPLEDsPJApbtRmt CVF96mDm1TEZKEIuOJg9+4e/2jqxfBM= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b=EfUur+1L; dmarc=pass (policy=quarantine) header.from=amazon.com; spf=pass (imf04.hostedemail.com: domain of "prvs=75897cb1d=jgowans@amazon.com" designates 52.119.213.152 as permitted sender) 
smtp.mailfrom="prvs=75897cb1d=jgowans@amazon.com" ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1707134766; a=rsa-sha256; cv=none; b=MUaYzFyf4KEaEVBWeNdMVzaWXg46IG16HrR4w44McMRMWRTXLazsZ6dwwWF/GiJtlBB23p MNSyYSUTVpK9sYF/VAS8MMLWobVGmn/rcR1blGjMQqdq8apet4VJalVbEeursDvy4xZiBz 6Ca+1PDDcSTd8ix1W8htBKWb9u1/D0A= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1707134767; x=1738670767; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=nHBURAsv2kVj18uGoBS/ynF5eAKuZbHIuIE7fti0DT8=; b=EfUur+1LCuHheRnPXNvIOZNyPin/p/li8S7jW+JZY+sTktzI1YNslElN +4HM6kxBDsI6IIqWpDqJsnQX4n1DnSExGX9V9eMYpoWWL6euAv69Sj7D8 np16/KTe3oIEOlZUuck2xFV8FRGRB97HARnSHVydp0W+E9zCnvKLrfLhm I=; X-IronPort-AV: E=Sophos;i="6.05,245,1701129600"; d="scan'208";a="635764854" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.43.8.6]) by smtp-border-fw-52003.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2024 12:06:02 +0000 Received: from EX19MTAEUA002.ant.amazon.com [10.0.43.254:50504] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.28.192:2525] with esmtp (Farcaster) id a892883c-e384-427e-8387-b1c5e5820895; Mon, 5 Feb 2024 12:05:50 +0000 (UTC) X-Farcaster-Flow-ID: a892883c-e384-427e-8387-b1c5e5820895 Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUA002.ant.amazon.com (10.252.50.124) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:05:49 +0000 Received: from dev-dsk-jgowans-1a-a3faec1f.eu-west-1.amazon.com (172.19.112.191) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:05:43 +0000 From: James Gowans To: CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander 
Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , , Subject: [RFC 15/18] pkernfs: register device memory for IOMMU domain pgtables Date: Mon, 5 Feb 2024 12:02:00 +0000 Message-ID: <20240205120203.60312-16-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> MIME-Version: 1.0 X-Originating-IP: [172.19.112.191] X-ClientProxiedBy: EX19D042UWB002.ant.amazon.com (10.13.139.175) To EX19D014EUC004.ant.amazon.com (10.252.51.182) X-Rspam-User: X-Stat-Signature: c4c5wumfb18x34wbmikqoixhfmox9363 X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 906164000A X-HE-Tag: 1707134766-91815 X-HE-Meta: U2FsdGVkX1+R2NtFP+9crpRBQRQzAwCblpgkpQcPVA2QDbLn/SOsubEel7Nn9DdYnySvmhiOcbK8ur/ikad/NVAv7zZGA1VQuImDTYAynFT+XT34adV+9abn9unnWzFATWF6fTClEJa/lpVXeS3Lgdt4UWnAmFCWyQpuY2d7Q+xm8aRQNdqjZeWuNd6s2zvG82F5sDo0rq484cH5rNAsu7g536ZvaYxCpLOwILiWjdeR5oOTwis8pSLqlcSxtVxlQ1HqdVYeFGWWlYpbwBL2vPb3ha/Hp1xZOZuOjYx1u0xIhfVn/m2xlOjrM6Gy6JaeflkqwT34tpN9zve4jA0KQAHGR14vBFOgg3o0tcMe/5IZ3ziSz2eHUSa13CnRtpMrhHLePKtczEmm9l+TvP3hRfVKhh9+m0aLir1pfXDTIDZlM4K90nNJ02YdxPKnT24bxE63R/1OSFXONgyGVF0JzLrTAvJqco8Jsl8YR3p1V+tCRLkB+NfHkcMQWyWvDFbuBqLS+idCuGczvJqg8WklT4XOwl7mIX5upEjytMoXfCLy8Rktl1xwifhXTIrX43bpba/tYw1D/YQ2AU3DxzmC3rJRG8ZOCFH3EqSFVZy4jPnBvJa0j+vjcwKeW9XY4NWwZCoiRZp5Ovmd2IfSkrBj4feKCrS69cw06TlBfQfOydqlJ3eB7xRkAy8B8sD/xJn/uCizUfd5FnzSYf4afafXRJnQiB1uvrN4CAiZMkJYeKX1oRBERDfarZUH2KzAP1PhkL5Tk8TljLQcAH7mqFt74vfLJAMS6YGNXA3/FLrii8kpBlwb1+uWXL1Zp3vwWFI8pi/BiTHIOzrSFfAqpTDTkkjCZjYfTEvsLSmqDm6dbh6qk3yOLrJkRbTXeKBGeexOIWME4GLulq9JUy+mnM9n3v4x2tyZSHS0MBYtWgxeOCzInskC7jcs9lV8NSQ1dvy2/XXQxVV1z+FZ5YyvtMN 6dWS80vN 
tufMcV9X3zeRTA3TNDaF9H8GVshlTa8X6JjdR8DzlVs7YpQJZaKi+FeTAUiOFlyGxgGl93a2QGjMAY6yaTS5HJMYeYzgVKTDXja5qhF7zllArljdB0tkSlIQw32Q4VkclinBKDGJtna2Hi3vRndOEQ7fWNgY5fieBGYuZRXa4IDpPX5HFvrGrE3mGiqNF3BOTl3AsMOGeu4v5ma0fWn8btN7c3A5aw+jfK9xiEFfMKVn6tScCLfxcCHLjyuw9vdg4zgAZxuxWQldTrvA/8Jkl7npvyhVHwgQAOVETxTB0dh9ikyWaiuC1lEQT28WtXbxz0+J5j62N6+KuOVvB0ByooUGXDURTwGU1ETre1YN23XzHk0RN7JWfpdSdUDD9zka0hcJ6+7t+QAMnf0jM6JCau/+CamX4W8wRNtpdY0qqcX8c/rFpGfAZpFJwHeNXTKRI4jSSlVJRqZ978WZWeyz32Ck4XCcXBrX1DGY9Q9dO6W2OYzbItM8yi5HAvrCuQtERJQk8qoud9Bxp4XxFYPa5TKxnmdRd1ybhbAxbIE0litblNw8= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Similarly to the root/context pgtables, the IOMMU driver also does phys_to_virt when walking the domain pgtables. To make this work properly the physical memory needs to be mapped in at the correct place in the direct map. Register a memory device to support this. The alternative would be to wrap all of the phys_to_virt functions in something which is pkernfs aware. 
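To make "the correct place" concrete, here is a rough userspace sketch of the address arithmetic used for the registered range. The names sketch_range, sketch_block_range and SKETCH_PMD_SIZE are hypothetical, and a 2 MiB PMD_SIZE (x86-64 with 4 KiB pages) is assumed; the end address is inclusive, matching how the patch fills in pgmap.range.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: PMD_SIZE is 2 MiB, as on x86-64 with 4 KiB pages. */
#define SKETCH_PMD_SIZE (UINT64_C(2) << 20)

/* Hypothetical stand-in for the kernel's struct range (end is inclusive). */
struct sketch_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Physical range covered by one pkernfs block: blocks are PMD_SIZE each and
 * laid out back to back starting at the filesystem's base address.
 */
static struct sketch_range sketch_block_range(uint64_t base, uint64_t block)
{
	struct sketch_range r;

	r.start = base + block * SKETCH_PMD_SIZE;
	r.end = r.start + SKETCH_PMD_SIZE - 1;
	return r;
}
```

For example, with a base of 0x100000000 and block index 3, the range would be 0x100600000 to 0x1007fffff inclusive, which is exactly one 2 MiB block.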
--- fs/pkernfs/iommu.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/pkernfs/iommu.c b/fs/pkernfs/iommu.c index 5d0b256e7dd8..073b9dd48237 100644 --- a/fs/pkernfs/iommu.c +++ b/fs/pkernfs/iommu.c @@ -9,6 +9,7 @@ void pkernfs_get_region_for_ppts(struct file *ppts, struct pkernfs_region *pkern struct pkernfs_inode *pkernfs_inode; unsigned long *mappings_block_vaddr; unsigned long inode_idx; + int rc; /* * For a pkernfs region block, the "mappings_block" field is still @@ -22,7 +23,20 @@ void pkernfs_get_region_for_ppts(struct file *ppts, struct pkernfs_region *pkern mappings_block_vaddr = (unsigned long *)pkernfs_addr_for_block(NULL, pkernfs_inode->mappings_block); set_bit(0, mappings_block_vaddr); - pkernfs_region->vaddr = mappings_block_vaddr; + + dev_set_name(&pkernfs_region->dev, "vfio-ppt-%s", pkernfs_inode->filename); + rc = device_register(&pkernfs_region->dev); + if (rc) + pr_err("device_register failed: %i\n", rc); + + pkernfs_region->pgmap.range.start = pkernfs_base + + (pkernfs_inode->mappings_block * PMD_SIZE); + pkernfs_region->pgmap.range.end = + pkernfs_region->pgmap.range.start + PMD_SIZE - 1; + pkernfs_region->pgmap.nr_range = 1; + pkernfs_region->pgmap.type = MEMORY_DEVICE_GENERIC; + pkernfs_region->vaddr = + devm_memremap_pages(&pkernfs_region->dev, &pkernfs_region->pgmap); pkernfs_region->paddr = pkernfs_base + (pkernfs_inode->mappings_block * (2 << 20)); } void pkernfs_alloc_iommu_root_pgtables(struct pkernfs_region *pkernfs_region) From patchwork Mon Feb 5 12:02:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 13545361 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7C416C48291 for ; Mon, 5 Feb 2024 12:06:01 +0000 (UTC) 
Received: by kanga.kvack.org (Postfix) id 11C5A6B007D; Mon, 5 Feb 2024 07:06:01 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0A5536B00A5; Mon, 5 Feb 2024 07:06:01 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EAEC36B00A6; Mon, 5 Feb 2024 07:06:00 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id DC1396B00A4 for ; Mon, 5 Feb 2024 07:06:00 -0500 (EST) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id BF42340A33 for ; Mon, 5 Feb 2024 12:06:00 +0000 (UTC) X-FDA: 81757621680.04.156E627 Received: from smtp-fw-80008.amazon.com (smtp-fw-80008.amazon.com [99.78.197.219]) by imf27.hostedemail.com (Postfix) with ESMTP id 9DC7440009 for ; Mon, 5 Feb 2024 12:05:58 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=amazon.com header.s=amazon201209 header.b="jXlC/27v"; dmarc=pass (policy=quarantine) header.from=amazon.com; spf=pass (imf27.hostedemail.com: domain of "prvs=75897cb1d=jgowans@amazon.com" designates 99.78.197.219 as permitted sender) smtp.mailfrom="prvs=75897cb1d=jgowans@amazon.com" ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1707134758; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=2AUEdbvoftHMg2AkT8RS6u9G56Zx/z0Bmygk8dweA74=; b=scMqYu6yx8UwqCxl5Mxj7RPxGPlTot7GtW1GnsUUjIOxXcbIRtZvCSA3aD3gavYC42DJKS F8tTGdWkJBzUs6wlv+j7iafDX5vsMS4Ty5Elu+jMyx74xcctnyLRjdwENfjM0vJwsdygTW TuilfBKXdi90le0gJE8qJaC+fZX0Byo= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass 
header.d=amazon.com header.s=amazon201209 header.b="jXlC/27v"; dmarc=pass (policy=quarantine) header.from=amazon.com; spf=pass (imf27.hostedemail.com: domain of "prvs=75897cb1d=jgowans@amazon.com" designates 99.78.197.219 as permitted sender) smtp.mailfrom="prvs=75897cb1d=jgowans@amazon.com" ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1707134758; a=rsa-sha256; cv=none; b=hOauNB5poW6Kmr3nDibQiHqmuW0ZMbJjxkaE2A1ZqvTtKBd22POChjc+c2jvUlwXOKk8CV XE6a3He8kocZPaHQXGz9fDu0WvYnjp/fWkQpZJulGp/RA1oKKIm7nfRzV6mc25bnk+djfu n/eITryQeMF9Fuj3Z6l6DiMk7uxg52M= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1707134758; x=1738670758; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2AUEdbvoftHMg2AkT8RS6u9G56Zx/z0Bmygk8dweA74=; b=jXlC/27vV4NXEiaotGKEkf/n78ZOBFYDligP/bum6PA2zDvcpN1B6Gk7 KupFHQz0+vHvkLuJ+muCE4+RDJ4x1MWyUOWN6uIEOHn5apMhCA7tWruU5 5ZB1698j5/qYUokPgobd9+X645SEe8iO2P0kvK1o0UDh1sOYhf53WCarE U=; X-IronPort-AV: E=Sophos;i="6.05,245,1701129600"; d="scan'208";a="63755948" Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.25.36.214]) by smtp-border-fw-80008.pdx80.corp.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2024 12:05:57 +0000 Received: from EX19MTAEUB001.ant.amazon.com [10.0.17.79:13316] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.45.85:2525] with esmtp (Farcaster) id 596cc4dd-1066-4e32-b40f-00d6ab21896d; Mon, 5 Feb 2024 12:05:56 +0000 (UTC) X-Farcaster-Flow-ID: 596cc4dd-1066-4e32-b40f-00d6ab21896d Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUB001.ant.amazon.com (10.252.51.26) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:05:56 +0000 Received: from dev-dsk-jgowans-1a-a3faec1f.eu-west-1.amazon.com (172.19.112.191) by 
EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:05:49 +0000 From: James Gowans To: CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , , Subject: [RFC 16/18] vfio: support not mapping IOMMU pgtables on live-update Date: Mon, 5 Feb 2024 12:02:01 +0000 Message-ID: <20240205120203.60312-17-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> MIME-Version: 1.0 X-Originating-IP: [172.19.112.191] X-ClientProxiedBy: EX19D042UWB002.ant.amazon.com (10.13.139.175) To EX19D014EUC004.ant.amazon.com (10.252.51.182) X-Rspamd-Queue-Id: 9DC7440009 X-Rspam-User: X-Rspamd-Server: rspam02 X-Stat-Signature: 4pu8cp1f8c8nw814wonpedixscqipwi6 X-HE-Tag: 1707134758-776707 X-HE-Meta: 
U2FsdGVkX1/HzIGwvVuWu8shUn4eMNQpgXdY9Ddcs4VujI5yGOwBbgDhje/hKe9j08dhJk6EDbKJ/NaD4UhancI4Kf7+oC0CXNaJhD6oHjh+eNCszejsf6cl4PoBFz1QFQPHkreWPleC5TJPdEjwJnyNuJ+in/7Kn5T90MR3FraGMlaSHV4xyNAOIdSi3mvH9iZ/V80M0YENlE96u7xO9fHMk2Hos+bUiMNIa+zYt5piq+X1qlu0hz3L2AuuktjGcH1KdT6hnwm3ztsV2Bg6uIvHSw/YuAaQbRqj4u5DOP5MQNc/hhvSQFNwOp9lDTXImCBNEx+DUAIgYufz45o6PE/d7sPPcxf9yWjZzSUNiwtj5TRFCZjwzkoWud1z7Rpi7ePZEQu/srOsxO/lbCY+GKy/cH/wInBTPq25bBOfBYBoZNfm8jbgkcLGZ3tFC3q+EYcbrKAYO030AaMXARqd9CTvGO8yEXvQ8Op4F+1D7/xbkAuRxvSpky7z0tlpitcPe/Q+ftd8EIgATUtkIck0/i/BBByDzp0/TrW17QOFmNFfcCgMk2/CbCUJbu79ig/RTOgnrsCzsJtcKG8z5KJI3W/qTZpTaH8s9repEbBLRQED3F5wK6wOpYvk7zD27LJNOeNNKX0nv8xVcOXh2nfaQJh8hY4N9Gc/P5WRAdTTBTBAwp9k8ocgwPFnXb2APDlHDfXlqh0vtl4RS3JVqeAxwKgc1PtyOoiGS92diL10Ue4TKPGz1Dn00/uXwpdQ5WNUNnwhYTR8W7+6gzAsLwdUZ0inwf1N5FaRaq6OReKK7bJZ/ayKfxu2NZsFZri5zcIqODTx49dSeKvHyRnvIm4nhPnkNCBsi02w80XTVBHKoEOz0nqoT3odOB8uZg2zvLYOjINd/L3NaaN/h86JOMwh5QWsw8uJCsXeLVoRgsUBilK9QTbvqyG7rCSlHj9D/JXGhR7SjAg5rtbAna5xjwQ ueM1PD0L EY6HzFZUcvu4jgo7gK7wqW0onF682QXG7kS7jYthuUTiCE7H01uPQooE3HmZTZ5PaESkZAzI2RNIeOcauuR/ZE8kxRDS51k8STZAzm0fHn7J7QKyH6cr+cE9zV0fMFMq4smeOsm7UgZofkZHlZf7t8xKsIHW+hhFshfKpZ88nzpMZetfszmHGdK7yqYzAccMauehp4+Koi3zYIHmaI5Qacq58SvEWIko36dK8HVOZf6WiLNoucg+BUS4W3AYLmfKlw5W0/IjO2tPnqpijYiSsx691PhtyUafaRe3lgoQRot3xpyxZJZ2CB4Vc7HqoQFpRLca0KXOOX2YqDml9E9uuXkDNp+HJ4SdRNr7If0Lu8geTnV+Q8rpTeX69Lyeroz9jgCgf0ldzg4/jpAv4L+5tyKMZo1GWmxDt6cInNtsm4iaCq765eP8mK5YuZIsAvwMnmFP2KOJcTLhikMF0JLSBzgMZGHnJYdgqAWm6QR64ZjgTZIvL1kdYeB37raE7HBnwz5kg16tFBNQgE2RvEXb8TmCegbiMGEsueBemYKahEu9mjlg= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: When restoring VMs after live update kexec, the IOVAs for the guest VM are already present in the persisted page tables. 
It is unnecessary to clobber the existing pgtable entries, and it may
introduce races if pgtable modifications happen concurrently with DMA.

Provide a new VFIO MAP_DMA flag, VFIO_DMA_MAP_FLAG_LIVE_UPDATE, which
userspace can supply to inform VFIO that the IOVAs are already mapped.
In this case VFIO skips the call into the IOMMU driver that would do the
mapping. VFIO still needs the MAP_DMA ioctl to set up its internal data
structures for the mapping.

It would probably be better to move the persistence one layer up and
persist the VFIO container in pkernfs. That way the whole container
could be picked up and re-used without needing to do any MAP_DMA ioctls
after kexec.
---
 drivers/vfio/vfio_iommu_type1.c | 24 +++++++++++++-----------
 include/uapi/linux/vfio.h       |  1 +
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index b36edfc5c9ef..dc2682fbda2e 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1456,7 +1456,7 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova,
 }
 
 static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
-			    size_t map_size)
+			    size_t map_size, unsigned int flags)
 {
 	dma_addr_t iova = dma->iova;
 	unsigned long vaddr = dma->vaddr;
@@ -1479,14 +1479,16 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
 			break;
 		}
 
-		/* Map it! */
-		ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage,
-				     dma->prot);
-		if (ret) {
-			vfio_unpin_pages_remote(dma, iova + dma->size, pfn,
-						npage, true);
-			vfio_batch_unpin(&batch, dma);
-			break;
+		if (!(flags & VFIO_DMA_MAP_FLAG_LIVE_UPDATE)) {
+			/* Map it! */
+			ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage,
+					     dma->prot);
+			if (ret) {
+				vfio_unpin_pages_remote(dma, iova + dma->size, pfn,
+							npage, true);
+				vfio_batch_unpin(&batch, dma);
+				break;
+			}
 		}
 
 		size -= npage << PAGE_SHIFT;
@@ -1662,7 +1664,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	if (list_empty(&iommu->domain_list))
 		dma->size = size;
 	else
-		ret = vfio_pin_map_dma(iommu, dma, size);
+		ret = vfio_pin_map_dma(iommu, dma, size, map->flags);
 
 	if (!ret && iommu->dirty_page_tracking) {
 		ret = vfio_dma_bitmap_alloc(dma, pgsize);
@@ -2836,7 +2838,7 @@ static int vfio_iommu_type1_map_dma(struct vfio_iommu *iommu,
 	struct vfio_iommu_type1_dma_map map;
 	unsigned long minsz;
 	uint32_t mask = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE |
-			VFIO_DMA_MAP_FLAG_VADDR;
+			VFIO_DMA_MAP_FLAG_VADDR | VFIO_DMA_MAP_FLAG_LIVE_UPDATE;
 
 	minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index fa9676bb4b26..d04d28e52110 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1536,6 +1536,7 @@ struct vfio_iommu_type1_dma_map {
 #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
 #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
 #define VFIO_DMA_MAP_FLAG_VADDR (1 << 2)
+#define VFIO_DMA_MAP_FLAG_LIVE_UPDATE (1 << 3) /* IOVAs already mapped in IOMMU before LU */
 	__u64	vaddr;				/* Process virtual address */
 	__u64	iova;				/* IO virtual address */
 	__u64	size;				/* Size of mapping (bytes) */

From patchwork Mon Feb 5 12:02:02 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 13545362
From: James Gowans
CC: Eric Biederman, Joerg Roedel, Will Deacon, Alexander Viro,
 Christian Brauner, Paolo Bonzini, Sean Christopherson, Andrew Morton,
 Alexander Graf, David Woodhouse, Jan H. Schoenherr, Usama Arif,
 Anthony Yznaga, Stanislav Kinsburskii
Subject: [RFC 17/18] pci: Don't clear bus master is persistence enabled
Date: Mon, 5 Feb 2024 12:02:02 +0000
Message-ID: <20240205120203.60312-18-jgowans@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>

For persistent devices to continue to DMA during kexec, the bus
mastering capability needs to remain on. Do not disable bus mastering
if pkernfs is enabled, indicating that persistent devices are enabled.

Only persistent devices should have bus mastering left on during kexec,
but this serves as a rough approximation of the functionality needed
for this pkernfs RFC.
---
 drivers/pci/pci-driver.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 51ec9e7e784f..131127967811 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -519,7 +520,8 @@ static void pci_device_shutdown(struct device *dev)
 	 * If it is not a kexec reboot, firmware will hit the PCI
 	 * devices with big hammer and stop their DMA any way.
 	 */
-	if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
+	if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot)
+	    && !pkernfs_enabled())
 		pci_clear_master(pci_dev);
 }
 

From patchwork Mon Feb 5 12:02:03 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 13545364
From: James Gowans
CC: Eric Biederman, Joerg Roedel, Will Deacon, Alexander Viro,
 Christian Brauner, Paolo Bonzini, Sean Christopherson, Andrew Morton,
 Alexander Graf, David Woodhouse, Jan H. Schoenherr, Usama Arif,
 Anthony Yznaga, Stanislav Kinsburskii
Subject: [RFC 18/18] vfio-pci: Assume device working after liveupdate
Date: Mon, 5 Feb 2024 12:02:03 +0000
Message-ID: <20240205120203.60312-19-jgowans@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>

When re-creating a VFIO device after liveupdate, no destructive actions
should be taken on it, to avoid interrupting any ongoing DMA.
Specifically, bus mastering should not be cleared and the device should
not be reset. Assume that reset works properly and skip both the bus
master clear and the function reset.

Ideally this would only be done for persistent devices, but in this
rough RFC there is currently no mechanism to easily tell whether a
device is persisted.
---
 drivers/vfio/pci/vfio_pci_core.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1929103ee59a..a7f56d43e0a4 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -480,19 +480,25 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
 			return ret;
 	}
 
-	/* Don't allow our initial saved state to include busmaster */
-	pci_clear_master(pdev);
+	if (!liveupdate) {
+		/* Don't allow our initial saved state to include busmaster */
+		pci_clear_master(pdev);
+	}
 
 	ret = pci_enable_device(pdev);
 	if (ret)
 		goto out_power;
 
-	/* If reset fails because of the device lock, fail this path entirely */
-	ret = pci_try_reset_function(pdev);
-	if (ret == -EAGAIN)
-		goto out_disable_device;
+	if (!liveupdate) {
+		/* If reset fails because of the device lock, fail this path entirely */
+		ret = pci_try_reset_function(pdev);
+		if (ret == -EAGAIN)
+			goto out_disable_device;
 
-	vdev->reset_works = !ret;
+		vdev->reset_works = !ret;
+	} else {
+		vdev->reset_works = 1;
+	}
 	pci_save_state(pdev);
 	vdev->pci_saved_state = pci_store_saved_state(pdev);
 	if (!vdev->pci_saved_state)
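
The control flow that the vfio_pci_core_enable() change introduces can
be modelled in userspace roughly as below. This is only a sketch, not
kernel code: struct mock_dev, mock_clear_master(), mock_try_reset() and
mock_enable() are hypothetical stand-ins, and "liveupdate" is passed as
a plain parameter because the hunk does not show where the real symbol
comes from.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device state the enable path touches. */
struct mock_dev {
	bool bus_master;	/* models the bus master enable bit */
	int resets;		/* function resets issued so far */
	int reset_result;	/* value the mocked reset returns */
	bool reset_works;	/* models vdev->reset_works */
};

/* Models pci_clear_master(): stops the device from initiating DMA. */
static void mock_clear_master(struct mock_dev *d)
{
	d->bus_master = false;
}

/* Models pci_try_reset_function(): 0 on success, negative errno on failure. */
static int mock_try_reset(struct mock_dev *d)
{
	d->resets++;
	return d->reset_result;
}

/*
 * Models the gated part of vfio_pci_core_enable(). How the real code
 * obtains "liveupdate" is an assumption left open here.
 */
static int mock_enable(struct mock_dev *d, bool liveupdate)
{
	int ret;

	assert(d != NULL);

	if (!liveupdate)
		mock_clear_master(d);

	/* pci_enable_device() would run here in both cases. */

	if (!liveupdate) {
		ret = mock_try_reset(d);
		if (ret == -EAGAIN)
			return ret;
		d->reset_works = !ret;
	} else {
		/* Assumed rather than verified: no reset was attempted. */
		d->reset_works = true;
	}
	return 0;
}
```

With liveupdate false the model clears bus mastering and issues one
reset, as before the patch. With liveupdate true it leaves bus
mastering alone, issues no reset, and records reset_works as true
without having tested it.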