From patchwork Mon Apr 29 17:04:17 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Groves
X-Patchwork-Id: 13647397
From: John Groves
To: John Groves, Jonathan Corbet, Jonathan Cameron, Dan Williams,
 Vishal Verma, Dave Jiang, Alexander Viro, Christian Brauner, Jan Kara,
 Matthew Wilcox, linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 nvdimm@lists.linux.dev
Cc: John Groves, john@jagalactic.com, Dave Chinner, Christoph Hellwig,
 dave.hansen@linux.intel.com, gregory.price@memverge.com, Randy Dunlap,
 Jerome Glisse, Aravind Ramesh, Ajay Joshi, Eishan Mirakhur, Ravi Shankar,
 Srinivasulu Thanneeru, Luis Chamberlain, Amir Goldstein, Chandan Babu R,
 Bagas Sanjaya, "Darrick J . Wong", Kent Overstreet, Steve French,
 Nathan Lynch, Michael Ellerman, Thomas Zimmermann, Julien Panis,
 Stanislav Fomichev, Dongsheng Yang, John Groves
Subject: [RFC PATCH v2 01/12] famfs: Introduce famfs documentation
Date: Mon, 29 Apr 2024 12:04:17 -0500
Message-Id: <0270b3e2d4c6511990978479771598ad62cf2ddd.1714409084.git.john@groves.net>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)
X-Mailing-List: nvdimm@lists.linux.dev

* Introduce Documentation/filesystems/famfs.rst into the Documentation
  tree and filesystems index
* Add famfs.rst to the filesystems doc index
* Add famfs' ioctl opcodes to ioctl-number.rst
* Update MAINTAINERS file

Signed-off-by: John Groves
Reviewed-by: Bagas Sanjaya
---
 Documentation/filesystems/famfs.rst           | 135 ++++++++++++++++++
 Documentation/filesystems/index.rst           |   1 +
 .../userspace-api/ioctl/ioctl-number.rst      |   1 +
 MAINTAINERS                                   |   9 ++
 4 files changed, 146 insertions(+)
 create mode 100644 Documentation/filesystems/famfs.rst

diff --git a/Documentation/filesystems/famfs.rst b/Documentation/filesystems/famfs.rst
new file mode 100644
index 000000000000..792785598d6a
--- /dev/null
+++ b/Documentation/filesystems/famfs.rst
@@ -0,0 +1,135 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _famfs_index:
+
+==================================================================
+famfs: The kernel component of the famfs shared memory file system
+==================================================================
+
+- Copyright (C) 2024 Micron Technology, Inc.
+
+Introduction
+============
+Compute Express Link (CXL) provides a mechanism for disaggregated or
+fabric-attached memory (FAM). This creates opportunities for data sharing;
+clustered apps that would otherwise have to shard or replicate data can
+share one copy in disaggregated memory.
+
+Famfs, which is not CXL-specific in any way, provides a mechanism for
+multiple hosts to use data in shared memory, by giving it a file system
+interface. With famfs, any app that understands files (which is almost
+all apps) can access data sets in shared memory. Although famfs
+supports read and write, the real point is to support mmap, which
+provides direct (dax) access to the memory - either writable or read-only.
+
+Shared memory can pose complex coherency and synchronization issues, but
+there are also simple cases. Two simple and eminently useful patterns that
+occur frequently in data analytics and AI are:
+
+* Serial Sharing - Only one host or process at a time has access to a file
+* Read-only Sharing - Multiple hosts or processes share read-only access
+  to a file
+
+The famfs kernel file system is part of the famfs framework; user space
+components [1] handle metadata allocation and distribution, and direct the
+famfs kernel module to instantiate files that map to specific memory.
+
+The famfs framework manages coherency of its own metadata and structures,
+but does not attempt to manage coherency for applications.
+
+Famfs also provides data isolation between files. That is, even though
+the host has access to an entire memory "device" (as a dax device), apps
+cannot write to memory for which the file is read-only, and mapping one
+file provides isolation from the memory of all other files. This is pretty
+basic, but some experimental shared memory usage patterns provide no such
+isolation.
+
+Principles of Operation
+=======================
+
+Without its user space components, the famfs kernel module doesn't do
+anything useful. The user space components maintain superblocks and
+metadata logs, and use the famfs kernel component to provide a file system
+view of shared memory across multiple hosts.
+
+Each host has an independent instance of the famfs kernel module. After
+mount, files are not visible until the user space component instantiates
+them (normally by playing the famfs metadata log).
+
+Once instantiated, files on each host can point to the same shared memory,
+but in-memory metadata (inodes, etc.) is ephemeral on each host that has a
+famfs instance mounted. Like ramfs, the famfs in-kernel file system has no
+backing store for metadata modifications. If metadata mutations are ever
+persisted, that must be done by the user space components. However,
+mutations to file data are saved to the shared memory - subject to write
+permission and processor cache behavior.
+
+
+Famfs is Not a Conventional File System
+---------------------------------------
+
+Famfs files can be accessed by conventional means, but there are
+limitations. The kernel component of famfs is not involved in the
+allocation of backing memory for files at all; the famfs user space
+creates files and passes the allocation extent lists into the kernel via
+the per-file FAMFSIOC_MAP_CREATE ioctl. A file that lacks this metadata is
+treated as invalid by the famfs kernel module. As a practical matter files
+must be created via the famfs library or cli, but they can be consumed as
+if they were conventional files.
+
+Famfs differs in some important ways from conventional file systems:
+
+* Files must be pre-allocated by the famfs framework; allocation is never
+  performed on (or after) write.
+* Any operation that changes a file's size is considered to put the file
+  in an invalid state, disabling access to the data. It may be possible to
+  revisit this in the future. (Typically the famfs user space can restore
+  files to a valid state by replaying the famfs metadata log.)
+
+Famfs exists to apply the existing file system abstractions to shared
+memory so applications and workflows can more easily adapt to an
+environment with disaggregated shared memory.
+
+Memory Error Handling
+=====================
+
+Possible memory errors include timeouts, poison and unexpected
+reconfiguration of an underlying dax device. In all of these cases, famfs
+receives a call via its dax_holder_operations->notify_failure() function.
+If any memory errors have been detected, access to the affected famfs
+mount is disabled to avoid further errors or corruption. Testing indicates
+that a famfs instance that has encountered errors can be unmounted cleanly,
+but repairing memory errors or corruption is outside the scope of famfs.
+
+Key Requirements
+================
+
+The primary requirements for famfs are:
+
+1. Must support a file system abstraction backed by sharable dax memory
+2. Files must efficiently handle VMA faults
+3. Must support metadata distribution in a sharable way
+4. Must handle clients with a stale copy of metadata
+
+The famfs kernel component takes care of 1-2 above by caching each file's
+mapping metadata in the kernel.
+
+Requirements 3 and 4 are handled by the user space components, and are
+largely orthogonal to the functionality of the famfs kernel module.
+
+Requirements 3 and 4 cannot be met by conventional fs-dax file systems
+(e.g. xfs and ext4) because they use write-back metadata; it is not valid
+to mount such a file system on two hosts from the same in-memory image.
+
+
+Famfs Usage
+===========
+
+Famfs usage is documented at [1].
+
+
+References
+==========
+
+- [1] Famfs user space repository and documentation
+  https://github.com/cxl-micron-reskit/famfs
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 1f9b4c905a6a..0fe2c70a106f 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -87,6 +87,7 @@ Documentation for filesystem implementations.
    ext3
    ext4/index
    f2fs
+   famfs
    gfs2
    gfs2-uevents
    gfs2-glocks
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index c472423412bf..ac407802cf10 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -289,6 +289,7 @@ Code  Seq#   Include File                                             Comments
 'u'   00-1F  linux/smb_fs.h                                           gone
 'u'   20-3F  linux/uvcvideo.h                                         USB video class host driver
 'u'   40-4f  linux/udmabuf.h                                          userspace dma-buf misc device
+'u'   50-5F  linux/famfs_ioctl.h                                      famfs shared memory file system
 'v'   00-1F  linux/ext2_fs.h                                          conflict!
 'v'   00-1F  linux/fs.h                                               conflict!
 'v'   00-0F  linux/sonypi.h                                           conflict!
diff --git a/MAINTAINERS b/MAINTAINERS
index ebf03f5f0619..3f2d847dcf01 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8180,6 +8180,15 @@ F:	Documentation/networking/failover.rst
 F:	include/net/failover.h
 F:	net/core/failover.c
 
+FAMFS
+M:	John Groves
+M:	John Groves
+M:	John Groves
+L:	linux-cxl@vger.kernel.org
+L:	linux-fsdevel@vger.kernel.org
+S:	Supported
+F:	Documentation/filesystems/famfs.rst
+
 FANOTIFY
 M:	Jan Kara
 R:	Amir Goldstein

From patchwork Mon Apr 29 17:04:18 2024
X-Patchwork-Submitter: John Groves
X-Patchwork-Id: 13647398
From: John Groves
Subject: [RFC PATCH v2 02/12] dev_dax_iomap: Move dax_pgoff_to_phys() from device.c to bus.c
Date: Mon, 29 Apr 2024 12:04:18 -0500
Message-Id: <552c86dd6c3c4252994a94e23bad2cb95e3ed392.1714409084.git.john@groves.net>

No changes to the function - just moved it.

dev_dax_iomap needs to call this function from drivers/dax/bus.c.
drivers/dax/bus.c can't call functions in drivers/dax/device.c - that
creates a circular linkage dependency - but device.c can call functions
in bus.c. Also exports dax_pgoff_to_phys() since both bus.c and device.c
now call it.

Signed-off-by: John Groves
---
 drivers/dax/bus.c    | 24 ++++++++++++++++++++++++
 drivers/dax/device.c | 23 -----------------------
 2 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 797e1ebff299..f894272beab8 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1447,6 +1447,30 @@ static const struct device_type dev_dax_type = {
 	.groups = dax_attribute_groups,
 };
 
+/* see "strong" declaration in tools/testing/nvdimm/dax-dev.c */
+__weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
+		unsigned long size)
+{
+	int i;
+
+	for (i = 0; i < dev_dax->nr_range; i++) {
+		struct dev_dax_range *dax_range = &dev_dax->ranges[i];
+		struct range *range = &dax_range->range;
+		unsigned long long pgoff_end;
+		phys_addr_t phys;
+
+		pgoff_end = dax_range->pgoff + PHYS_PFN(range_len(range)) - 1;
+		if (pgoff < dax_range->pgoff || pgoff > pgoff_end)
+			continue;
+		phys = PFN_PHYS(pgoff - dax_range->pgoff) + range->start;
+		if (phys + size - 1 <= range->end)
+			return phys;
+		break;
+	}
+	return -1;
+}
+EXPORT_SYMBOL_GPL(dax_pgoff_to_phys);
+
 static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data)
 {
 	struct dax_region *dax_region = data->dax_region;
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 93ebedc5ec8c..40ba660013cf 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -50,29 +50,6 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma,
 	return 0;
 }
 
-/* see "strong" declaration in tools/testing/nvdimm/dax-dev.c */
-__weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
-		unsigned long size)
-{
-	int i;
-
-	for (i = 0; i < dev_dax->nr_range; i++) {
-		struct dev_dax_range *dax_range = &dev_dax->ranges[i];
-		struct range *range = &dax_range->range;
-		unsigned long long pgoff_end;
-		phys_addr_t phys;
-
-		pgoff_end = dax_range->pgoff + PHYS_PFN(range_len(range)) - 1;
-		if (pgoff < dax_range->pgoff || pgoff > pgoff_end)
-			continue;
-		phys = PFN_PHYS(pgoff - dax_range->pgoff) + range->start;
-		if (phys + size - 1 <= range->end)
-			return phys;
-		break;
-	}
-	return -1;
-}
-
 static void dax_set_mapping(struct vm_fault *vmf, pfn_t pfn,
 			      unsigned long fault_size)
 {

From patchwork Mon Apr 29 17:04:19 2024
X-Patchwork-Submitter: John Groves
X-Patchwork-Id: 13647399
From: John Groves
Subject: [RFC PATCH v2 03/12] dev_dax_iomap: Add fs_dax_get() func to prepare dax for fs-dax usage
Date: Mon, 29 Apr 2024 12:04:19 -0500

This function should be called by fs-dax file systems after opening the
devdax device. This adds holder_operations, which effects exclusivity
between callers of fs_dax_get().
This function serves the same role as fs_dax_get_by_bdev(), which dax
file systems call after opening the pmem block device.

This also adds the CONFIG_DEV_DAX_IOMAP Kconfig parameter.

Signed-off-by: John Groves
---
 drivers/dax/Kconfig |  6 ++++++
 drivers/dax/super.c | 30 ++++++++++++++++++++++++++++++
 include/linux/dax.h |  5 +++++
 3 files changed, 41 insertions(+)

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index a88744244149..b1ebcc77120b 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -78,4 +78,10 @@ config DEV_DAX_KMEM
 
 	  Say N if unsure.
 
+config DEV_DAX_IOMAP
+	depends on DEV_DAX && DAX
+	def_bool y
+	help
+	  Support iomap mapping of devdax devices (for FS-DAX file
+	  systems that reside on character /dev/dax devices)
 endif
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index aca71d7fccc1..4b55f79849b0 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -122,6 +122,36 @@ void fs_put_dax(struct dax_device *dax_dev, void *holder)
 EXPORT_SYMBOL_GPL(fs_put_dax);
 #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
 
+#if IS_ENABLED(CONFIG_DEV_DAX_IOMAP)
+/**
+ * fs_dax_get()
+ *
+ * fs-dax file systems call this function to prepare to use a devdax device for
+ * fsdax. This is like fs_dax_get_by_bdev(), but the caller already has struct
+ * dev_dax (and there is no bdev). The holder makes this exclusive.
+ * + * @dax_dev: dev to be prepared for fs-dax usage + * @holder: filesystem or mapped device inside the dax_device + * @hops: operations for the inner holder + * + * Returns: 0 on success, <0 on failure + */ +int fs_dax_get(struct dax_device *dax_dev, void *holder, + const struct dax_holder_operations *hops) +{ + if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode)) + return -ENODEV; + + if (cmpxchg(&dax_dev->holder_data, NULL, holder)) + return -EBUSY; + + dax_dev->holder_ops = hops; + + return 0; +} +EXPORT_SYMBOL_GPL(fs_dax_get); +#endif /* DEV_DAX_IOMAP */ + enum dax_device_flags { /* !alive + rcu grace period == no new operations / mappings */ DAXDEV_ALIVE, diff --git a/include/linux/dax.h b/include/linux/dax.h index 9d3e3327af4c..4a86716f932a 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -57,6 +57,11 @@ struct dax_holder_operations { #if IS_ENABLED(CONFIG_DAX) struct dax_device *alloc_dax(void *private, const struct dax_operations *ops); + +#if IS_ENABLED(CONFIG_DEV_DAX_IOMAP) +int fs_dax_get(struct dax_device *dax_dev, void *holder, const struct dax_holder_operations *hops); +struct dax_device *inode_dax(struct inode *inode); +#endif void *dax_holder(struct dax_device *dax_dev); void put_dax(struct dax_device *dax_dev); void kill_dax(struct dax_device *dax_dev); From patchwork Mon Apr 29 17:04:20 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Groves X-Patchwork-Id: 13647400 Received: from mail-oa1-f46.google.com (mail-oa1-f46.google.com [209.85.160.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E05888627A for ; Mon, 29 Apr 2024 17:05:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.46 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714410304; cv=none; 
b=ozVOUAJVsBhpE4ue7BqZhlJ7xkIEjk84mpJmrpRxUq1YeWBBMd2LQp7Hnp0aHLZc4IK2XINE3yidN718Vdr/HF6t3b0Ozqmv51rKzYds/lTV+LjdAthdO4DKC4ygxWJ8MQhKaFnt8sfWDLLqYJ85YZjNqVNr8wxkRe5VFiwQE5I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714410304; c=relaxed/simple; bh=MF/nGBkB6SU7LTC2q8o8OtAo1cFLHGL5yhdgeaFsrRw=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=IwKGU/CwuqsxGMNV8czlk+dBnQQPaFZSDoOS/WONjexFp0bi0UKRxr5TlUVkbPgczm4VhmGGLRPNfksEYmKQzniy4HqgKzsibPK09cEJ6dxuL63paE3nMJ7fFmAoZ6UQVEoPZtZf4jvUitUoCUxG+dFDr439zmRsMKTxYpUG+Eg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=Groves.net; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=drYrqjdq; arc=none smtp.client-ip=209.85.160.46 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=Groves.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="drYrqjdq" Received: by mail-oa1-f46.google.com with SMTP id 586e51a60fabf-23319017c4cso3049434fac.2 for ; Mon, 29 Apr 2024 10:05:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1714410302; x=1715015102; darn=lists.linux.dev; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:from:to:cc:subject:date :message-id:reply-to; bh=BpFg6AryW/38Pe75AIWbtaiD3q42kDFfLzhRRyJ/+BY=; b=drYrqjdqthgsN8A6efHz2b0gKkGt5vJB+ZNCvpqYGbdpL2WVEYpMfqTsASbEZlClCu Et6MO7+nHMObHhRWqzpg3l1sPn5IiVb1NATshjpNtJwdc6iXrtA3d9ZilQ0he3kCH68Z XrsPDHmzPZWDeMuphruTW0jYpFoMaviWt4dQ1SfqwRzruNSSguuvsmFflclZzPbJMFHc ik8zsKtV2eUJtg8VUHAnBV19E+lyWoyvfdBESNWwder/RkMnz/JbisEPnPUzCxzx90Pd U270NPHmJ/th3D2CgmyxHyBgWGp4bqDYZHzyGbQvqlwyRwiRKVcmdgafXAsVtMNgY7wD gb4Q== 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714410302; x=1715015102; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=BpFg6AryW/38Pe75AIWbtaiD3q42kDFfLzhRRyJ/+BY=; b=K1TiVe/ndMlIubn7Txs2ZDi1sHAcg/QlQgE1t56q6spAUm0kTsfJYfxyFwgof77xRY ONjPAUVCdaT0ZhWMsqQdl2vhwKhdirVGl4fb/5lzN8eRHCHXOuOCkBuC9CcS5OLbk60y zBve4n77yWMtNVZmi/4jQHgvWyrjnQiHCA2+hTWDdUW4gTCo+chBilRh8lJ2AVLyauo0 aLpHJaninxtccdYmGoCO1hLEUPJYj0JhbNLv2FmiyxlgrWLd3O2Y99XNGRtmIqQa3myW dE8oF8yaMyPD/RUSwbBYaZ/PaUX7wz2qprFrUgQFVR+A4SSmKrDt8l9++CzkWECrWd9Q ya/w== X-Forwarded-Encrypted: i=1; AJvYcCUf1SteTfh0Kq8NZKs+8ZcPLFbAYA5kCVINiUqJ76zt+2B4ppubgpF4xr+06cC9g+M3cxuRc3sIMphLwEy9lDIvx9f7plWF X-Gm-Message-State: AOJu0Yxl64b8/nkcG8EE915Ot3GmcADZIvIZ7uZdlMgqm5m4vN7wgX4I AcFs86DejsyQ0yiT5ZzBj5zylAJHqT8ANUEjIxksARcJkIqWxuAV X-Google-Smtp-Source: AGHT+IFVJywrgMfv62xcHp0KCFD6JQ/TzPkywYZ5Go7vHf3u7Bi5WE1VU9aozhj5LAVxYHRlvKWs8w== X-Received: by 2002:a05:6870:224f:b0:23c:ad86:9935 with SMTP id j15-20020a056870224f00b0023cad869935mr3417616oaf.45.1714410301811; Mon, 29 Apr 2024 10:05:01 -0700 (PDT) Received: from localhost.localdomain ([70.114.203.196]) by smtp.gmail.com with ESMTPSA id g1-20020a9d6201000000b006ea20712e66sm4074448otj.17.2024.04.29.10.04.59 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 29 Apr 2024 10:05:01 -0700 (PDT) Sender: John Groves From: John Groves X-Google-Original-From: John Groves To: John Groves , Jonathan Corbet , Jonathan Cameron , Dan Williams , Vishal Verma , Dave Jiang , Alexander Viro , Christian Brauner , Jan Kara , Matthew Wilcox , linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, nvdimm@lists.linux.dev Cc: John Groves , john@jagalactic.com, Dave Chinner , Christoph Hellwig , dave.hansen@linux.intel.com, gregory.price@memverge.com, Randy Dunlap , Jerome Glisse , Aravind Ramesh , Ajay Joshi , 
Eishan Mirakhur , Ravi Shankar , Srinivasulu Thanneeru , Luis Chamberlain , Amir Goldstein , Chandan Babu R , Bagas Sanjaya , "Darrick J . Wong" , Kent Overstreet , Steve French , Nathan Lynch , Michael Ellerman , Thomas Zimmermann , Julien Panis , Stanislav Fomichev , Dongsheng Yang , John Groves Subject: [RFC PATCH v2 04/12] dev_dax_iomap: Save the kva from memremap Date: Mon, 29 Apr 2024 12:04:20 -0500 Message-Id: X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: References: Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Save the kva from memremap because we need it for iomap rw support. Prior to famfs, there were no iomap users of /dev/dax - so the virtual address from memremap was not needed. Also: in some cases dev_dax_probe() is called with the first dev_dax->range offset past the start of pgmap[0].range. In those cases we need to add the difference to virt_addr in order to have the physaddr's in dev_dax->ranges match dev_dax->virt_addr. This happens with devdax devices that started as pmem and got converted to devdax. I'm not sure whether the offset is due to label storage, or page tables, but this works in all known cases. 
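The offset adjustment described above can be illustrated with a minimal userspace sketch. It is not the kernel code: `struct sketch_range`, `sketch_data_offset()` and `sketch_virt_addr()` are hypothetical names, standing in for the kernel's range type, the `data_offset` computation, and the value stored in `dev_dax->virt_addr`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct range; only start matters here. */
struct sketch_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Byte distance from the start of the pgmap range to the start of the
 * first device range. If the pgmap range starts later than the device
 * range the layout is unexpected; like the patch (which WARNs), fall
 * back to an offset of 0.
 */
static uint64_t sketch_data_offset(const struct sketch_range *pgmap0,
				   const struct sketch_range *range0)
{
	if (pgmap0->start > range0->start)
		return 0;
	return range0->start - pgmap0->start;
}

/* Address that would be saved as virt_addr: memremap base plus the offset. */
static void *sketch_virt_addr(void *memremap_base,
			      const struct sketch_range *pgmap0,
			      const struct sketch_range *range0)
{
	assert(memremap_base != NULL);
	return (char *)memremap_base + sketch_data_offset(pgmap0, range0);
}
```

For example, with a pgmap range starting at 0x100000000 and a first device range starting at 0x100200000, the sketch yields an offset of 0x200000 (2 MiB), so the saved address would point 2 MiB past the memremap base.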
Signed-off-by: John Groves --- drivers/dax/dax-private.h | 1 + drivers/dax/device.c | 15 +++++++++++++++ 2 files changed, 16 insertions(+) diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index 446617b73aea..df5b3d975df4 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -63,6 +63,7 @@ struct dax_mapping { struct dev_dax { struct dax_region *region; struct dax_device *dax_dev; + void *virt_addr; unsigned int align; int target_node; bool dyn_id; diff --git a/drivers/dax/device.c b/drivers/dax/device.c index 40ba660013cf..17323b5f6f57 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -372,6 +372,7 @@ static int dev_dax_probe(struct dev_dax *dev_dax) struct dax_device *dax_dev = dev_dax->dax_dev; struct device *dev = &dev_dax->dev; struct dev_pagemap *pgmap; + u64 data_offset = 0; struct inode *inode; struct cdev *cdev; void *addr; @@ -426,6 +427,20 @@ static int dev_dax_probe(struct dev_dax *dev_dax) if (IS_ERR(addr)) return PTR_ERR(addr); + /* Detect whether the data is at a non-zero offset into the memory */ + if (pgmap->range.start != dev_dax->ranges[0].range.start) { + u64 phys = dev_dax->ranges[0].range.start; + u64 pgmap_phys = dev_dax->pgmap[0].range.start; + u64 vmemmap_shift = dev_dax->pgmap[0].vmemmap_shift; + + if (!WARN_ON(pgmap_phys > phys)) + data_offset = phys - pgmap_phys; + + pr_debug("%s: offset detected phys=%llx pgmap_phys=%llx offset=%llx shift=%llx\n", + __func__, phys, pgmap_phys, data_offset, vmemmap_shift); + } + dev_dax->virt_addr = addr + data_offset; + inode = dax_inode(dax_dev); cdev = inode->i_cdev; cdev_init(cdev, &dax_fops); From patchwork Mon Apr 29 17:04:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Groves X-Patchwork-Id: 13647401 Received: from mail-ot1-f50.google.com (mail-ot1-f50.google.com [209.85.210.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client 
b=JKjbJwOh9MZ+1U5PJ+RutY1TvP/XbF6QnGzEsH3gdi8IcvUTxRcBdH1lw5mL92eVwz b1tiem+tZwYCJTzoAJbZEuEdaPnsFQ46cQ3OUOqPOHS5Xo4k1sOsUdEdiUUQyChoOx1U 1M5ppZ3YRQcWgS34pZOyCmgFm1eKGx5gj7R58DDlhgtk93XzUpZCP9RDqnH/r3HRjewE YiG4zPxZ9XsBk6eLjhwBkNC9rvNrv4dhVqBie5G1ZwBYMyFw/yU6ZEcjoyaZLNAHTYIX ayxhxXHZ6xGb5Lb5UhPLxzxhgX5bDgSiwu7wWj0pSpTCqZF946rro9aX15Z+ay+hYwp6 hoQQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714410307; x=1715015107; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=rbyzPWFSpme/6Xxw3JGZg5PIHNbasBNXPhuMHV0OHPQ=; b=BA8oZSMwCkzzf9Y8+qd/3icwa36t/vUVHyZYWTeFaNMkGc/ECYeLOzK1+aFaM71RdQ yXc4RC83d3yaXU2yPKQFvKMskZb36vHTosnWfG66llvo3bA7Pz1v9ApWWy8uC+XfA82r 3LtORkH6UKtVgasb1dWIx2NPYR4wWgtUPAGmL8hgnox6U1s9uVtZayHGAwPIwaPvQkYh syo/rJy+uVZW4NKaCXImIUN4r2QGOjgSSqj6XY4snemFBiD4RI/LXYEerT4hzYcBOgRD WdR5SEfzfiBxZ5oPvj2b8am9EzWBzbD+KV3FGGzlaWgcnccWXTRfl8/Wl6M05gZi+7mp 5viw== X-Forwarded-Encrypted: i=1; AJvYcCWCwhy0uzFZ9jZaK9XOB/KHaevEJhSGocTI/L8Bh/kTWi/O1jshe2Rc4leWH+tzvZ9SnfBZFUFHXREj3R8LNJQ5Ns0KL+oN X-Gm-Message-State: AOJu0YxtZkLo/8Z8Tbvg1H+Tbl+3fTpuRPSWh8VzFcRTKXv2NXkCx5/J fcvtkHeGyYVU5MsviMHvzMAhlgkrtjO3aoqdjbJh5alqKuuRfbO4 X-Google-Smtp-Source: AGHT+IGjO7mh1AyfnujisR4C9AKwUM+jghociccM9M53gOQe5r8ngqJlfGiDhTWRhS5N4AbdDyMemw== X-Received: by 2002:a05:6830:20cd:b0:6ee:32f0:ec4e with SMTP id z13-20020a05683020cd00b006ee32f0ec4emr4182582otq.31.1714410306753; Mon, 29 Apr 2024 10:05:06 -0700 (PDT) Received: from localhost.localdomain ([70.114.203.196]) by smtp.gmail.com with ESMTPSA id g1-20020a9d6201000000b006ea20712e66sm4074448otj.17.2024.04.29.10.05.04 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 29 Apr 2024 10:05:06 -0700 (PDT) Sender: John Groves From: John Groves X-Google-Original-From: John Groves To: John Groves , Jonathan Corbet , Jonathan Cameron , Dan Williams , Vishal Verma , 
Dave Jiang , Alexander Viro , Christian Brauner , Jan Kara , Matthew Wilcox , linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, nvdimm@lists.linux.dev Cc: John Groves , john@jagalactic.com, Dave Chinner , Christoph Hellwig , dave.hansen@linux.intel.com, gregory.price@memverge.com, Randy Dunlap , Jerome Glisse , Aravind Ramesh , Ajay Joshi , Eishan Mirakhur , Ravi Shankar , Srinivasulu Thanneeru , Luis Chamberlain , Amir Goldstein , Chandan Babu R , Bagas Sanjaya , "Darrick J . Wong" , Kent Overstreet , Steve French , Nathan Lynch , Michael Ellerman , Thomas Zimmermann , Julien Panis , Stanislav Fomichev , Dongsheng Yang , John Groves Subject: [RFC PATCH v2 05/12] dev_dax_iomap: Add dax_operations for use by fs-dax on devdax Date: Mon, 29 Apr 2024 12:04:21 -0500 Message-Id: <2a8b926ce25a9ef242c933fa451b29401e62bb37.1714409084.git.john@groves.net> X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: References: Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Notes about this commit: * These methods are based on pmem_dax_ops from drivers/nvdimm/pmem.c * dev_dax_direct_access() returns the hpa, pfn and kva. The kva was newly stored as dev_dax->virt_addr by dev_dax_probe(). * The hpa/pfn are used for mmap (dax_iomap_fault()), and the kva is used for read/write (dax_iomap_rw()) * dev_dax_recovery_write() and dev_dax_zero_page_range() have not been tested yet. I'm looking for suggestions as to how to test those.
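The page-offset arithmetic behind a direct_access-style lookup can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the patch's implementation: `struct sketch_dev`, `sketch_direct_access()` and `SKETCH_PAGE_SHIFT` are hypothetical, 4 KiB pages are assumed, and the device is modelled as one physically contiguous range (the kernel code resolves the physical address through dax_pgoff_to_phys() instead). The explicit bounds check is an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical model of a dax device backed by one contiguous range. */
struct sketch_dev {
	char *virt_addr;	/* kva of device offset 0 */
	uint64_t phys_base;	/* hpa of device offset 0 */
	size_t size;		/* device size in bytes */
};

/*
 * Translate a page offset into a kernel virtual address (used for
 * read/write) and a page frame number (used for mmap faults), and
 * return how many pages are valid at that offset, capped at nr_pages.
 * Returns -1 for a request outside the device; this check is part of
 * the sketch only.
 */
static long sketch_direct_access(const struct sketch_dev *dev, uint64_t pgoff,
				 long nr_pages, void **kaddr, uint64_t *pfn)
{
	size_t offset = (size_t)pgoff << SKETCH_PAGE_SHIFT;
	size_t want, avail;

	assert(dev != NULL && dev->virt_addr != NULL);

	if (nr_pages <= 0 || offset >= dev->size)
		return -1;

	want = (size_t)nr_pages << SKETCH_PAGE_SHIFT;
	avail = dev->size - offset;

	if (kaddr)
		*kaddr = dev->virt_addr + offset;
	if (pfn)
		*pfn = (dev->phys_base + offset) >> SKETCH_PAGE_SHIFT;

	return (long)((want < avail ? want : avail) >> SKETCH_PAGE_SHIFT);
}
```

A caller asking for 8 pages at page offset 1 of a 4-page device would get back 3, together with an address one page past the base and the matching pfn.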
Signed-off-by: John Groves --- drivers/dax/bus.c | 120 ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 115 insertions(+), 5 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index f894272beab8..9c57d4139b74 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -7,6 +7,10 @@ #include #include #include +#include +#include +#include +#include #include "dax-private.h" #include "bus.h" @@ -1471,6 +1475,105 @@ __weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff, } EXPORT_SYMBOL_GPL(dax_pgoff_to_phys); +#if IS_ENABLED(CONFIG_DEV_DAX_IOMAP) + +static void write_dax(void *pmem_addr, struct page *page, + unsigned int off, unsigned int len) +{ + unsigned int chunk; + void *mem; + + while (len) { + mem = kmap_local_page(page); + chunk = min_t(unsigned int, len, PAGE_SIZE - off); + memcpy_flushcache(pmem_addr, mem + off, chunk); + kunmap_local(mem); + len -= chunk; + off = 0; + page++; + pmem_addr += chunk; + } +} + +static long __dev_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, + long nr_pages, enum dax_access_mode mode, void **kaddr, + pfn_t *pfn) +{ + struct dev_dax *dev_dax = dax_get_private(dax_dev); + size_t size = nr_pages << PAGE_SHIFT; + size_t offset = pgoff << PAGE_SHIFT; + void *virt_addr = dev_dax->virt_addr + offset; + u64 flags = PFN_DEV|PFN_MAP; + phys_addr_t phys; + pfn_t local_pfn; + size_t dax_size; + + WARN_ON(!dev_dax->virt_addr); + + if (down_read_interruptible(&dax_dev_rwsem)) + return 0; /* no valid data since we were killed */ + dax_size = dev_dax_size(dev_dax); + up_read(&dax_dev_rwsem); + + phys = dax_pgoff_to_phys(dev_dax, pgoff, nr_pages << PAGE_SHIFT); + + if (kaddr) + *kaddr = virt_addr; + + local_pfn = phys_to_pfn_t(phys, flags); /* are flags correct? 
*/ + if (pfn) + *pfn = local_pfn; + + /* This is the valid size at the specified address */ + return PHYS_PFN(min_t(size_t, size, dax_size - offset)); +} + +static int dev_dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff, + size_t nr_pages) +{ + long resid = nr_pages << PAGE_SHIFT; + long offset = pgoff << PAGE_SHIFT; + + /* Break into one write per dax region */ + while (resid > 0) { + void *kaddr; + pgoff_t poff = offset >> PAGE_SHIFT; + long len = __dev_dax_direct_access(dax_dev, poff, + nr_pages, DAX_ACCESS, &kaddr, NULL); + len = min_t(long, len, PAGE_SIZE); + write_dax(kaddr, ZERO_PAGE(0), offset, len); + + offset += len; + resid -= len; + } + return 0; +} + +static long dev_dax_direct_access(struct dax_device *dax_dev, + pgoff_t pgoff, long nr_pages, enum dax_access_mode mode, + void **kaddr, pfn_t *pfn) +{ + return __dev_dax_direct_access(dax_dev, pgoff, nr_pages, mode, kaddr, pfn); +} + +static size_t dev_dax_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff, + void *addr, size_t bytes, struct iov_iter *i) +{ + size_t off; + + off = offset_in_page(addr); + + return _copy_from_iter_flushcache(addr, bytes, i); +} + +static const struct dax_operations dev_dax_ops = { + .direct_access = dev_dax_direct_access, + .zero_page_range = dev_dax_zero_page_range, + .recovery_write = dev_dax_recovery_write, +}; + +#endif /* IS_ENABLED(CONFIG_DEV_DAX_IOMAP) */ + static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data) { struct dax_region *dax_region = data->dax_region; @@ -1526,11 +1629,18 @@ static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data) } } - /* - * No dax_operations since there is no access to this device outside of - * mmap of the resulting character device.
- */ - dax_dev = alloc_dax(dev_dax, NULL); + if (IS_ENABLED(CONFIG_DEV_DAX_IOMAP)) + /* holder_ops currently populated separately in a slightly + * hacky way + */ + dax_dev = alloc_dax(dev_dax, &dev_dax_ops); + else + /* + * No dax_operations since there is no access to this device + * outside of mmap of the resulting character device. + */ + dax_dev = alloc_dax(dev_dax, NULL); + if (IS_ERR(dax_dev)) { rc = PTR_ERR(dax_dev); goto err_alloc_dax; From patchwork Mon Apr 29 17:04:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Groves X-Patchwork-Id: 13647402 Received: from mail-oi1-f173.google.com (mail-oi1-f173.google.com [209.85.167.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A3E3D126F21 for ; Mon, 29 Apr 2024 17:05:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.167.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714410313; cv=none; b=p/G3x8svH6Rax8ZpnyxjBA3zD8FieDsz+l9Y/Mm4SeF7u2kY4WQp2C+CzufITkJK2nSxDIDkqisDvh8O5S2j7ajmkFcN6R/skRCDWPAZ3/tNLf2Sx9TkSN1zDdAClPXNNNbBWvXgyfdu/r9a3UsVkkokOiAs7V49+zlSqhA3bq4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714410313; c=relaxed/simple; bh=Dq3IXVMAl3LwKTvbPabCyz/Wl0bJyojQgrN3Tsz/e+8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=UlWXzP3niVa8P0zYC8il+wwg0t9oaBZq4Cn+1RCEfiVVYYIo0pae/P7PnbhOQqEqTeEg/QB4/BQZDoKy1S4CMUIq6Jyo35XIAzX3iodv6vzRYcB3hYY1xCpMkexIKhMMD2n82dJQE7aSaU9gB5vNxclo7pkHlWVXp2tM0UuYXB4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=Groves.net; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=KorWQ7ys; arc=none smtp.client-ip=209.85.167.173 Authentication-Results: 
AOJu0Yx/H88vRWOWX0N3EWmmrEY7WsENj12HYZakx/hs4CeXpcvQ/fsg 4hGLnWktjyMU2qRVqkfYMwgTdMW62N7IA7SAQ/fux8oD6CATUiDt X-Google-Smtp-Source: AGHT+IHCqWTpNSXlvdUZw5ucdR3X7rIc7toNINCsl6MqALgYwnAAdl+NtVvIB095bO1OU6/08opuLg== X-Received: by 2002:a05:6871:a4ca:b0:229:faa9:3b35 with SMTP id wb10-20020a056871a4ca00b00229faa93b35mr12606534oab.21.1714410310612; Mon, 29 Apr 2024 10:05:10 -0700 (PDT) Received: from localhost.localdomain ([70.114.203.196]) by smtp.gmail.com with ESMTPSA id g1-20020a9d6201000000b006ea20712e66sm4074448otj.17.2024.04.29.10.05.07 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 29 Apr 2024 10:05:10 -0700 (PDT) Sender: John Groves From: John Groves X-Google-Original-From: John Groves To: John Groves , Jonathan Corbet , Jonathan Cameron , Dan Williams , Vishal Verma , Dave Jiang , Alexander Viro , Christian Brauner , Jan Kara , Matthew Wilcox , linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, nvdimm@lists.linux.dev Cc: John Groves , john@jagalactic.com, Dave Chinner , Christoph Hellwig , dave.hansen@linux.intel.com, gregory.price@memverge.com, Randy Dunlap , Jerome Glisse , Aravind Ramesh , Ajay Joshi , Eishan Mirakhur , Ravi Shankar , Srinivasulu Thanneeru , Luis Chamberlain , Amir Goldstein , Chandan Babu R , Bagas Sanjaya , "Darrick J . 
Wong" , Kent Overstreet , Steve French , Nathan Lynch , Michael Ellerman , Thomas Zimmermann , Julien Panis , Stanislav Fomichev , Dongsheng Yang , John Groves Subject: [RFC PATCH v2 06/12] dev_dax_iomap: export dax_dev_get() Date: Mon, 29 Apr 2024 12:04:22 -0500 Message-Id: X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: References: Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 famfs needs access to dax_dev_get(). Signed-off-by: John Groves --- drivers/dax/super.c | 3 ++- include/linux/dax.h | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/dax/super.c b/drivers/dax/super.c index 4b55f79849b0..8475093ba973 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -452,7 +452,7 @@ static int dax_set(struct inode *inode, void *data) return 0; } -static struct dax_device *dax_dev_get(dev_t devt) +struct dax_device *dax_dev_get(dev_t devt) { struct dax_device *dax_dev; struct inode *inode; @@ -475,6 +475,7 @@ static struct dax_device *dax_dev_get(dev_t devt) return dax_dev; } +EXPORT_SYMBOL_GPL(dax_dev_get); struct dax_device *alloc_dax(void *private, const struct dax_operations *ops) { diff --git a/include/linux/dax.h b/include/linux/dax.h index 4a86716f932a..29d3dd6452c3 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -61,6 +61,7 @@ struct dax_device *alloc_dax(void *private, const struct dax_operations *ops); #if IS_ENABLED(CONFIG_DEV_DAX_IOMAP) int fs_dax_get(struct dax_device *dax_dev, void *holder, const struct dax_holder_operations *hops); struct dax_device *inode_dax(struct inode *inode); +struct dax_device *dax_dev_get(dev_t devt); #endif void *dax_holder(struct dax_device *dax_dev); void put_dax(struct dax_device *dax_dev); From patchwork Mon Apr 29 17:04:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Groves X-Patchwork-Id: 13647403 Received: from
:message-id:date:subject:cc:to:from:sender:from:to:cc:subject:date :message-id:reply-to; bh=eRZMXpVDaupswF2EGFq0CRj0FomMbu4maf2unw0x+4s=; b=F3yAbXJYYtiEqgmc6prd65pf0ZG4kmXk/jALFPUQfO/Vci4o/iv8gKGrN8IsKKVo/5 gUZkPCw3SxcL6ezUKbB+iZtEBPiVktoquwQ9/A1OsFOsCU6Ad1ivrVomt79A9YHqKjs+ ek1lo1E01PdlWyEIGfgx7UlACjK98DVrPtJmmWyZKNfWsJ4omiYHoSztKzfUZ44EDm5I 1gLmeSoJWgtQdwxvjNP7SvQWCtKkQSlVy5M8uHIHIfcPW4qJ9jSdnDzCk6R87MUFNr6K 3EgC8vR7dFcS4uJaPN7uxnDdfumm4VZV6Wg+0uL2Yc15O0Pxk8yUojh2MxW1Tv1oBCgy uWvQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714410315; x=1715015115; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=eRZMXpVDaupswF2EGFq0CRj0FomMbu4maf2unw0x+4s=; b=EaZdkcDC3lNr01p+xivbTRk9qWILx42rGOY41cktfnAf97iylQgecXYW/npQiYLNoW SqDWfSWk8w4S2CTESgqPbHynoMhmr1LZ39/BYgzd+dUAPWgZO8fXHVHrQ3qvvwDsODjt aFC/WfxkeN9z2xlG1P541uWBLRFAEFRPUoke81WnpJkEj3o0Y0t8LPUqggjENGb989UK 0j6WhbAOmHoUIecKHsoS8tp9d4plwNGghCc7Fk1/6MQ8vP1ECtphdCKJuJdidbUObN6J c5rzNc5myocBXP5/1oB1SSnNpt/wgWz/XAN0/3M1cxzmYmad1IKTQbsyaVOExRSLxEli yXvA== X-Forwarded-Encrypted: i=1; AJvYcCUdNaANbMwEw4VhPnuxHsanwghhFhIhu0+mQdpPLHRRgNuSAikiI0sKG0BepvhYolxEKBriGi4FO890UIIgLRAOPgu7VfeC X-Gm-Message-State: AOJu0YxWvmMtkcj3GE/CTXwjo8Y8/QRBp6Upq0gpGdjnvAkAZtMaVRh6 8pDDlVAP5YX04jvrGOFPzAJWIi+zAmwApdMQ7IImeBLyUOu5Zyvt X-Google-Smtp-Source: AGHT+IFAZMCprMAfngbGfmfnfc0KD83EThJP15yBR4/UvGbdxcZfQLPaHwO+y+izxvLX28mjpzQ3QA== X-Received: by 2002:a05:6830:1516:b0:6ee:3710:231c with SMTP id k22-20020a056830151600b006ee3710231cmr3205587otp.2.1714410315062; Mon, 29 Apr 2024 10:05:15 -0700 (PDT) Received: from localhost.localdomain ([70.114.203.196]) by smtp.gmail.com with ESMTPSA id g1-20020a9d6201000000b006ea20712e66sm4074448otj.17.2024.04.29.10.05.12 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 29 Apr 2024 10:05:14 -0700 (PDT) Sender: John Groves 
From: John Groves X-Google-Original-From: John Groves To: John Groves , Jonathan Corbet , Jonathan Cameron , Dan Williams , Vishal Verma , Dave Jiang , Alexander Viro , Christian Brauner , Jan Kara , Matthew Wilcox , linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, nvdimm@lists.linux.dev Cc: John Groves , john@jagalactic.com, Dave Chinner , Christoph Hellwig , dave.hansen@linux.intel.com, gregory.price@memverge.com, Randy Dunlap , Jerome Glisse , Aravind Ramesh , Ajay Joshi , Eishan Mirakhur , Ravi Shankar , Srinivasulu Thanneeru , Luis Chamberlain , Amir Goldstein , Chandan Babu R , Bagas Sanjaya , "Darrick J . Wong" , Kent Overstreet , Steve French , Nathan Lynch , Michael Ellerman , Thomas Zimmermann , Julien Panis , Stanislav Fomichev , Dongsheng Yang , John Groves Subject: [RFC PATCH v2 07/12] famfs prep: Add fs/super.c:kill_char_super() Date: Mon, 29 Apr 2024 12:04:23 -0500 Message-Id: X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: References: Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Famfs needs a slightly different kill_super variant than already existed. Putting it local to famfs would require exporting d_genocide(); this seemed a bit cleaner. 
Signed-off-by: John Groves --- fs/super.c | 9 +++++++++ include/linux/fs.h | 1 + 2 files changed, 10 insertions(+) diff --git a/fs/super.c b/fs/super.c index 69ce6c600968..cd276d30b522 100644 --- a/fs/super.c +++ b/fs/super.c @@ -1236,6 +1236,15 @@ void kill_litter_super(struct super_block *sb) } EXPORT_SYMBOL(kill_litter_super); +void kill_char_super(struct super_block *sb) +{ + if (sb->s_root) + d_genocide(sb->s_root); + generic_shutdown_super(sb); + kill_super_notify(sb); +} +EXPORT_SYMBOL(kill_char_super); + int set_anon_super_fc(struct super_block *sb, struct fs_context *fc) { return set_anon_super(sb, NULL); diff --git a/include/linux/fs.h b/include/linux/fs.h index 8dfd53b52744..cc586f30397d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2511,6 +2511,7 @@ void generic_shutdown_super(struct super_block *sb); void kill_block_super(struct super_block *sb); void kill_anon_super(struct super_block *sb); void kill_litter_super(struct super_block *sb); +void kill_char_super(struct super_block *sb); void deactivate_super(struct super_block *sb); void deactivate_locked_super(struct super_block *sb); int set_anon_super(struct super_block *s, void *data); From patchwork Mon Apr 29 17:04:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Groves X-Patchwork-Id: 13647404 Received: from mail-ot1-f51.google.com (mail-ot1-f51.google.com [209.85.210.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3356C8627D for ; Mon, 29 Apr 2024 17:05:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714410322; cv=none; 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714410319; x=1715015119; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=MilasPvhzvPiGTInd59RFs02ReQyQfw/VRapBOjtlfM=; b=uFiJ4IAfb1vWmM1p3ktMcOAXRtp+/I7l1rJdD/rKKPk+IWt3PP9UB0GdcP3gBmcXq0 T4ScYITG7xIUOuPm9rPu2sREK6WC0jlK+dDwQJW11FjFijEOfbNx4+gsX4I1g2HGsBgY XIqh3rDnhA6SaXhuur9ZAT8BNQm9L+rL2vPSPlwqHU5woY4nLi6YJrw8hyeVZuohstAE AqvCKaRVc4oVvZpdoDH89Ji+FobW9OrcfhqTcV/woZNSKYEVvW7jJgSoKqUKnGxkrvRb iW2g2kqEBY5eAk6k0ytsIN2ONr2Zlz1sII2Ei9hF1wMQJjDaP8aSeJnHBh6fIIdh2AFM PZ+Q== X-Forwarded-Encrypted: i=1; AJvYcCVh4ZJwMVq+Lj8Ai0lap6+V7zA/6fbtFe9fAF583XsWJ168l3nIDKHEe0cduhwtUq9YBtWeUC35FEkgqlJdy8Qp1qvOkJ7G X-Gm-Message-State: AOJu0YybCm1iZNcsJlksy4pb1u2lCD22n9rfBilw0/Ex7wSytShslGpl tpKtF9dQ22UYNJv8EHmf/5f5u7oW2ugQ6J/Dk++cjXZ7cONEhtbD X-Google-Smtp-Source: AGHT+IGvLXcUVeIkg6vV7OMCWw5YqAe1KWWabhYTgi9tEZ/mgodSlwYBR1JbErJxPxFlPTBLShZEXQ== X-Received: by 2002:a05:6830:16d5:b0:6ee:2a3a:566 with SMTP id l21-20020a05683016d500b006ee2a3a0566mr4642555otr.14.1714410318984; Mon, 29 Apr 2024 10:05:18 -0700 (PDT) Received: from localhost.localdomain ([70.114.203.196]) by smtp.gmail.com with ESMTPSA id g1-20020a9d6201000000b006ea20712e66sm4074448otj.17.2024.04.29.10.05.16 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 29 Apr 2024 10:05:18 -0700 (PDT) Sender: John Groves From: John Groves X-Google-Original-From: John Groves To: John Groves , Jonathan Corbet , Jonathan Cameron , Dan Williams , Vishal Verma , Dave Jiang , Alexander Viro , Christian Brauner , Jan Kara , Matthew Wilcox , linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, nvdimm@lists.linux.dev Cc: John Groves , john@jagalactic.com, Dave Chinner , Christoph Hellwig , dave.hansen@linux.intel.com, gregory.price@memverge.com, Randy Dunlap , Jerome Glisse , Aravind Ramesh , Ajay Joshi , 
Eishan Mirakhur , Ravi Shankar , Srinivasulu Thanneeru , Luis Chamberlain , Amir Goldstein , Chandan Babu R , Bagas Sanjaya , "Darrick J . Wong" , Kent Overstreet , Steve French , Nathan Lynch , Michael Ellerman , Thomas Zimmermann , Julien Panis , Stanislav Fomichev , Dongsheng Yang , John Groves Subject: [RFC PATCH v2 08/12] famfs: module operations & fs_context Date: Mon, 29 Apr 2024 12:04:24 -0500 Message-Id: <86694a1a663ab0b6e8e35c7b187f5ad179103482.1714409084.git.john@groves.net> X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: References: Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Start building up from the famfs module operations. This commit includes the following: * Register as a file system * Parse mount parameters * Allocate or find (and initialize) a superblock via famfs_get_tree() * Lookup the host dax device, and bail if it's in use (or not dax) * Register as the holder of the dax device if it's available * Add Kconfig and Makefile misc to build famfs * Add FAMFS_SUPER_MAGIC to include/uapi/linux/magic.h * Add export of fs/namei.c:may_open_dev(), which famfs needs to call * Update MAINTAINERS file for the fs/famfs/ path The following exports had to happen to enable famfs: * This uses the new fs/super.c:kill_char_super() - the other kill*super helpers were not quite right. 
* This uses the dev_dax_iomap export of dax_dev_get() This commit builds but is otherwise too incomplete to run Signed-off-by: John Groves --- MAINTAINERS | 1 + fs/Kconfig | 2 + fs/Makefile | 1 + fs/famfs/Kconfig | 10 ++ fs/famfs/Makefile | 5 + fs/famfs/famfs_inode.c | 345 +++++++++++++++++++++++++++++++++++++ fs/famfs/famfs_internal.h | 36 ++++ fs/namei.c | 1 + include/uapi/linux/magic.h | 1 + 9 files changed, 402 insertions(+) create mode 100644 fs/famfs/Kconfig create mode 100644 fs/famfs/Makefile create mode 100644 fs/famfs/famfs_inode.c create mode 100644 fs/famfs/famfs_internal.h diff --git a/MAINTAINERS b/MAINTAINERS index 3f2d847dcf01..365d678e2f40 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8188,6 +8188,7 @@ L: linux-cxl@vger.kernel.org L: linux-fsdevel@vger.kernel.org S: Supported F: Documentation/filesystems/famfs.rst +F: fs/famfs FANOTIFY M: Jan Kara diff --git a/fs/Kconfig b/fs/Kconfig index a46b0cbc4d8f..53b4629e92a0 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -140,6 +140,8 @@ source "fs/autofs/Kconfig" source "fs/fuse/Kconfig" source "fs/overlayfs/Kconfig" +source "fs/famfs/Kconfig" + menu "Caches" source "fs/netfs/Kconfig" diff --git a/fs/Makefile b/fs/Makefile index 6ecc9b0a53f2..3393f399a9e9 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -129,3 +129,4 @@ obj-$(CONFIG_EFIVAR_FS) += efivarfs/ obj-$(CONFIG_EROFS_FS) += erofs/ obj-$(CONFIG_VBOXSF_FS) += vboxsf/ obj-$(CONFIG_ZONEFS_FS) += zonefs/ +obj-$(CONFIG_FAMFS) += famfs/ diff --git a/fs/famfs/Kconfig b/fs/famfs/Kconfig new file mode 100644 index 000000000000..edb8980820f7 --- /dev/null +++ b/fs/famfs/Kconfig @@ -0,0 +1,10 @@ + + +config FAMFS + tristate "famfs: shared memory file system" + depends on DEV_DAX && FS_DAX && DEV_DAX_IOMAP + help + Support for the famfs file system. Famfs is a dax file system that + can support scale-out shared access to fabric-attached memory + (e.g. CXL shared memory). Famfs is not a general purpose file system; + it is an enabler for data sets in shared memory. 
diff --git a/fs/famfs/Makefile b/fs/famfs/Makefile new file mode 100644 index 000000000000..62230bcd6793 --- /dev/null +++ b/fs/famfs/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 + +obj-$(CONFIG_FAMFS) += famfs.o + +famfs-y := famfs_inode.o diff --git a/fs/famfs/famfs_inode.c b/fs/famfs/famfs_inode.c new file mode 100644 index 000000000000..61306240fc0b --- /dev/null +++ b/fs/famfs/famfs_inode.c @@ -0,0 +1,345 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * famfs - dax file system for shared fabric-attached memory + * + * Copyright 2023-2024 Micron Technology, Inc. + * + * This file system, originally based on ramfs and the dax support from xfs, + * is intended to allow multiple host systems to mount a common file system + * view of dax files that map to shared memory. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "famfs_internal.h" + +#define FAMFS_DEFAULT_MODE 0755 + +static struct inode *famfs_get_inode(struct super_block *sb, + const struct inode *dir, + umode_t mode, dev_t dev) +{ + struct inode *inode = new_inode(sb); + struct timespec64 tv; + + if (!inode) + return NULL; + + inode->i_ino = get_next_ino(); + inode_init_owner(&nop_mnt_idmap, inode, dir, mode); + inode->i_mapping->a_ops = &ram_aops; + mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER); + mapping_set_unevictable(inode->i_mapping); + tv = inode_set_ctime_current(inode); + inode_set_mtime_to_ts(inode, tv); + inode_set_atime_to_ts(inode, tv); + + switch (mode & S_IFMT) { + default: + init_special_inode(inode, mode, dev); + break; + case S_IFREG: + inode->i_op = NULL /* famfs_file_inode_operations */; + inode->i_fop = NULL /* &famfs_file_operations */; + break; + case S_IFDIR: + inode->i_op = NULL /* famfs_dir_inode_operations */; + inode->i_fop = &simple_dir_operations; + + /* Directory inodes start off with i_nlink == 2 (for ".") */ + inc_nlink(inode); + break; + case 
S_IFLNK: + inode->i_op = &page_symlink_inode_operations; + inode_nohighmem(inode); + break; + } + return inode; +} + +/* + * famfs dax_operations (for char dax) + */ +static int +famfs_dax_notify_failure(struct dax_device *dax_dev, u64 offset, + u64 len, int mf_flags) +{ + struct super_block *sb = dax_holder(dax_dev); + struct famfs_fs_info *fsi = sb->s_fs_info; + + pr_err("%s: rootdev=%s offset=%lld len=%llu flags=%x\n", __func__, + fsi->rootdev, offset, len, mf_flags); + + return 0; +} + +static const struct dax_holder_operations famfs_dax_holder_ops = { + .notify_failure = famfs_dax_notify_failure, +}; + +/***************************************************************************** + * fs_context_operations + */ + +static int +famfs_fill_super(struct super_block *sb, struct fs_context *fc) +{ + int rc = 0; + + sb->s_maxbytes = MAX_LFS_FILESIZE; + sb->s_blocksize = PAGE_SIZE; + sb->s_blocksize_bits = PAGE_SHIFT; + sb->s_magic = FAMFS_SUPER_MAGIC; + sb->s_op = NULL /* famfs_super_ops */; + sb->s_time_gran = 1; + + return rc; +} + +static int +lookup_daxdev(const char *pathname, dev_t *devno) +{ + struct inode *inode; + struct path path; + int err; + + if (!pathname || !*pathname) + return -EINVAL; + + err = kern_path(pathname, LOOKUP_FOLLOW, &path); + if (err) + return err; + + inode = d_backing_inode(path.dentry); + if (!S_ISCHR(inode->i_mode)) { + err = -EINVAL; + goto out_path_put; + } + + if (!may_open_dev(&path)) { /* had to export this */ + err = -EACCES; + goto out_path_put; + } + + /* if it's dax, i_rdev is struct dax_device */ + *devno = inode->i_rdev; + +out_path_put: + path_put(&path); + return err; +} + +static int +famfs_get_tree(struct fs_context *fc) +{ + struct famfs_fs_info *fsi = fc->s_fs_info; + struct dax_device *dax_devp; + struct super_block *sb; + struct inode *inode; + dev_t daxdevno; + int err; + + /* TODO: clean up chatty messages */ + + err = lookup_daxdev(fc->source, &daxdevno); + if (err) + return err; + + fsi->daxdevno = daxdevno; + 
+ /* This will set sb->s_dev=daxdevno */ + sb = sget_dev(fc, daxdevno); + if (IS_ERR(sb)) { + pr_err("%s: sget_dev error\n", __func__); + return PTR_ERR(sb); + } + + if (sb->s_root) { + pr_info("%s: found a matching superblock for %s\n", + __func__, fc->source); + + /* We don't expect to find a match by dev_t; if we do, it must + * already be mounted, so we bail + */ + err = -EBUSY; + goto deactivate_out; + } else { + pr_info("%s: initializing new superblock for %s\n", + __func__, fc->source); + err = famfs_fill_super(sb, fc); + if (err) + goto deactivate_out; + } + + /* This will fail if it's not a dax device */ + dax_devp = dax_dev_get(daxdevno); + if (!dax_devp) { + pr_warn("%s: device %s not found or not dax\n", + __func__, fc->source); + err = -ENODEV; + goto deactivate_out; + } + + err = fs_dax_get(dax_devp, sb, &famfs_dax_holder_ops); + if (err) { + pr_err("%s: fs_dax_get(%lld) failed\n", __func__, (u64)daxdevno); + err = -EBUSY; + goto deactivate_out; + } + fsi->dax_devp = dax_devp; + + inode = famfs_get_inode(sb, NULL, S_IFDIR | fsi->mount_opts.mode, 0); + sb->s_root = d_make_root(inode); + if (!sb->s_root) { + pr_err("%s: d_make_root() failed\n", __func__); + err = -ENOMEM; + fs_put_dax(fsi->dax_devp, sb); + goto deactivate_out; + } + + sb->s_flags |= SB_ACTIVE; + + WARN_ON(fc->root); + fc->root = dget(sb->s_root); + return err; + +deactivate_out: + pr_debug("%s: deactivating sb=%llx\n", __func__, (u64)sb); + deactivate_locked_super(sb); + return err; +} + +/*****************************************************************************/ + +enum famfs_param { + Opt_mode, + Opt_dax, +}; + +const struct fs_parameter_spec famfs_fs_parameters[] = { + fsparam_u32oct("mode", Opt_mode), + fsparam_string("dax", Opt_dax), + {} +}; + +static int famfs_parse_param(struct fs_context *fc, struct fs_parameter *param) +{ + struct famfs_fs_info *fsi = fc->s_fs_info; + struct fs_parse_result result; + int opt; + + opt = fs_parse(fc, famfs_fs_parameters, param, &result); + if 
(opt == -ENOPARAM) { + opt = vfs_parse_fs_param_source(fc, param); + if (opt != -ENOPARAM) + return opt; + + return 0; + } + if (opt < 0) + return opt; + + switch (opt) { + case Opt_mode: + fsi->mount_opts.mode = result.uint_32 & S_IALLUGO; + break; + case Opt_dax: + if (strcmp(param->string, "always")) + pr_notice("%s: invalid dax mode %s\n", + __func__, param->string); + break; + } + + return 0; +} + +static void famfs_free_fc(struct fs_context *fc) +{ + struct famfs_fs_info *fsi = fc->s_fs_info; + + if (fsi && fsi->rootdev) + kfree(fsi->rootdev); + + kfree(fsi); +} + +static const struct fs_context_operations famfs_context_ops = { + .free = famfs_free_fc, + .parse_param = famfs_parse_param, + .get_tree = famfs_get_tree, +}; + +static int famfs_init_fs_context(struct fs_context *fc) +{ + struct famfs_fs_info *fsi; + + fsi = kzalloc(sizeof(*fsi), GFP_KERNEL); + if (!fsi) + return -ENOMEM; + + fsi->mount_opts.mode = FAMFS_DEFAULT_MODE; + fc->s_fs_info = fsi; + fc->ops = &famfs_context_ops; + return 0; +} + +static void famfs_kill_sb(struct super_block *sb) +{ + struct famfs_fs_info *fsi = sb->s_fs_info; + + if (fsi->dax_devp) + fs_put_dax(fsi->dax_devp, sb); + if (fsi && fsi->rootdev) + kfree(fsi->rootdev); + kfree(fsi); + sb->s_fs_info = NULL; + + kill_char_super(sb); /* new */ +} + +#define MODULE_NAME "famfs" +static struct file_system_type famfs_fs_type = { + .name = MODULE_NAME, + .init_fs_context = famfs_init_fs_context, + .parameters = famfs_fs_parameters, + .kill_sb = famfs_kill_sb, + .fs_flags = FS_USERNS_MOUNT, +}; + +/****************************************************************************** + * Module stuff + */ +static int __init init_famfs_fs(void) +{ + int rc; + + rc = register_filesystem(&famfs_fs_type); + + return rc; +} + +static void +__exit famfs_exit(void) +{ + unregister_filesystem(&famfs_fs_type); + pr_info("%s: unregistered\n", __func__); +} + +fs_initcall(init_famfs_fs); +module_exit(famfs_exit); + +MODULE_AUTHOR("John Groves, Micron 
Technology"); +MODULE_LICENSE("GPL"); diff --git a/fs/famfs/famfs_internal.h b/fs/famfs/famfs_internal.h new file mode 100644 index 000000000000..951b32ec4fbd --- /dev/null +++ b/fs/famfs/famfs_internal.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * famfs - dax file system for shared fabric-attached memory + * + * Copyright 2023-2024 Micron Technology, Inc. + * + * This file system, originally based on ramfs and the dax support from xfs, + * is intended to allow multiple host systems to mount a common file system + * view of dax files that map to shared memory. + */ +#ifndef FAMFS_INTERNAL_H +#define FAMFS_INTERNAL_H + +struct famfs_mount_opts { + umode_t mode; +}; + +/** + * struct famfs_fs_info - famfs per-mount state + * + * @mount_opts: the mount options + * @dax_devp: The underlying character devdax device + * @rootdev: Dax device path used in mount + * @daxdevno: Dax device dev_t + * @deverror: True if the dax device has called our notify_failure entry + * point, or if other "shutdown" conditions exist + */ +struct famfs_fs_info { + struct famfs_mount_opts mount_opts; + struct dax_device *dax_devp; + char *rootdev; + dev_t daxdevno; + bool deverror; +}; + +#endif /* FAMFS_INTERNAL_H */ diff --git a/fs/namei.c b/fs/namei.c index c5b2a25be7d0..f24b268473cd 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -3229,6 +3229,7 @@ bool may_open_dev(const struct path *path) return !(path->mnt->mnt_flags & MNT_NODEV) && !(path->mnt->mnt_sb->s_iflags & SB_I_NODEV); } +EXPORT_SYMBOL(may_open_dev); static int may_open(struct mnt_idmap *idmap, const struct path *path, int acc_mode, int flag) diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h index 1b40a968ba91..e9bdd6a415e2 100644 --- a/include/uapi/linux/magic.h +++ b/include/uapi/linux/magic.h @@ -37,6 +37,7 @@ #define HOSTFS_SUPER_MAGIC 0x00c0ffee #define OVERLAYFS_SUPER_MAGIC 0x794c7630 #define FUSE_SUPER_MAGIC 0x65735546 +#define FAMFS_SUPER_MAGIC 0x87b282ff #define MINIX_SUPER_MAGIC 0x137F /* minix v1 fs, 14 char names 
*/ #define MINIX_SUPER_MAGIC2 0x138F /* minix v1 fs, 30 char names */ From patchwork Mon Apr 29 17:04:25 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Groves X-Patchwork-Id: 13647405 Received: from mail-ot1-f44.google.com (mail-ot1-f44.google.com [209.85.210.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E67EB127B40 for ; Mon, 29 Apr 2024 17:05:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.44 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714410325; cv=none; b=DMeJZfx8b9Z3yXhSTCd7als0DA7Lac6XTfOVBJX/Wfktj6U8ECc1dBeKRO+LY3nq9AtTge5RgjqLm2kN+AntHJWIEFxUtQHOSuc4nX+AfyJGffcEJl7Q8XFXM590CTAjvofX5ZfUZhGGTwXxM2Q5Eh5Y3fT5SqttD+XsnfJbMwY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714410325; c=relaxed/simple; bh=3okyjXHCMhcAJ0EEcEcf6cQaC7PkHdUAYWpAVtSMGPY=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=SymaqU2fVb00j5pVWO/a0KhQ0oKxL74W97peOgGP8frTrY5jX7HG2lQrdVzys5vyEBrAZksYeG0a2JlHBkDtexGxJ5B2PIOu3pmqNmeeUQtyZ5KzWl5AFG/rjHe4I3vvF/3PjKBEliT2kG5XlNYaPmqhhhTGCONiGJlNC7KCQQE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=Groves.net; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=ETdrG8vo; arc=none smtp.client-ip=209.85.210.44 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=Groves.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="ETdrG8vo" Received: by mail-ot1-f44.google.com with SMTP id 46e09a7af769-6ee27cb096cso730613a34.2 
for ; Mon, 29 Apr 2024 10:05:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1714410323; x=1715015123; darn=lists.linux.dev; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:from:to:cc:subject:date :message-id:reply-to; bh=2UaofW1Ss7ysgCpPsl4E9jKgtjI2OUWLad4tGWrXZwU=; b=ETdrG8vofgnxoXr9Lyr4b1GSLzr5z3kwANzlwJbE1aoYu7VH6NCWyHMCKCvNB/8mTH 9FasGvePku6NGZDFm08rni4ws1uV9JbZIxPmq4e+hWJnz1lJfGB3k9Zig1la5a+MMP+2 UL3/hahfVbEGr9De+I+55VSs3HVldSmbeeUAZ8wfeLU1MmupLo2+pe5r3IU3KqQSdh0M cAMUzD59pcrsXSg8HmPpl1IO4CIVwOj6GCdLvIxBRGrczo2D/kVISWZGC/Th4MxyGDnk CXLwD1K4vginBkgPeA15pzy09DmcoP9Ahb3AZRunt081vyEjoQfrB1c+3JlQf6fZE9MI O26g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714410323; x=1715015123; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=2UaofW1Ss7ysgCpPsl4E9jKgtjI2OUWLad4tGWrXZwU=; b=kR0Qcu565++LxRMo8BXGKEr3+IZ1OAWO6hop9/vuIXgBFunvI0diln04MgWfIlWFst X8m8RKliTY7jApEF0yNGMripDdkmpUpI+DnYBJoJmVZSjzBVZrldGcgotKJCUNoDMNbD nz9wq5QeyttPodkd8qLTE9Si9710ImcSK83yR3E8CKyPx6szQ4SK4iC74hG6qAgoxmpb psb0kTHjdDD+8bPm8w5PVENRilmPKQsYHle2HXzLMTeJIxHZO+aW+KpPdblBI9VinfZ/ cqNAQwNyFjBUyY/2zANnykPqD2t6o7ixfiT7nY6IBevBnpyQgitoLRyi7Z4n4TxeQ05m Bxnw== X-Forwarded-Encrypted: i=1; AJvYcCVwJLJipZETu6jld5coYCkFfiG3H598eiK9T+7aPu+ut0/bx2BFHnjgSDkbYIK85G92LizgxORQfI6zGQF1ZUo8/4lbz1Ud X-Gm-Message-State: AOJu0YyzSWzg9G2jCYfMJcJRX9PKVHn2urYPNwbcxwjPrqchB0zQhqgN IZAMEGSDfQJozCGFecM85PLzCC1bbkhqkTk94Cd7mSZtT2dSheCP X-Google-Smtp-Source: AGHT+IFvD88Udyo+NvOBr4PNFG9gOR+wiZU+XL9Ie5iTR0+/Cv+NkgpNj5yEeSCkfCvrufai21KfAw== X-Received: by 2002:a05:6830:4513:b0:6eb:d349:8c3f with SMTP id i19-20020a056830451300b006ebd3498c3fmr13821572otv.28.1714410323111; Mon, 29 Apr 2024 10:05:23 -0700 (PDT) Received: from localhost.localdomain 
([70.114.203.196]) by smtp.gmail.com with ESMTPSA id g1-20020a9d6201000000b006ea20712e66sm4074448otj.17.2024.04.29.10.05.20 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 29 Apr 2024 10:05:22 -0700 (PDT) Sender: John Groves From: John Groves X-Google-Original-From: John Groves To: John Groves , Jonathan Corbet , Jonathan Cameron , Dan Williams , Vishal Verma , Dave Jiang , Alexander Viro , Christian Brauner , Jan Kara , Matthew Wilcox , linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, nvdimm@lists.linux.dev Cc: John Groves , john@jagalactic.com, Dave Chinner , Christoph Hellwig , dave.hansen@linux.intel.com, gregory.price@memverge.com, Randy Dunlap , Jerome Glisse , Aravind Ramesh , Ajay Joshi , Eishan Mirakhur , Ravi Shankar , Srinivasulu Thanneeru , Luis Chamberlain , Amir Goldstein , Chandan Babu R , Bagas Sanjaya , "Darrick J . Wong" , Kent Overstreet , Steve French , Nathan Lynch , Michael Ellerman , Thomas Zimmermann , Julien Panis , Stanislav Fomichev , Dongsheng Yang , John Groves Subject: [RFC PATCH v2 09/12] famfs: Introduce inode_operations and super_operations Date: Mon, 29 Apr 2024 12:04:25 -0500 Message-Id: X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: References: Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The famfs inode and super operations are pretty much generic. 
This commit builds but is still too incomplete to run Signed-off-by: John Groves --- fs/famfs/famfs_inode.c | 113 +++++++++++++++++++++++++++++++++++++++-- 1 file changed, 110 insertions(+), 3 deletions(-) diff --git a/fs/famfs/famfs_inode.c b/fs/famfs/famfs_inode.c index 61306240fc0b..e00e9cdecadf 100644 --- a/fs/famfs/famfs_inode.c +++ b/fs/famfs/famfs_inode.c @@ -28,6 +28,9 @@ #define FAMFS_DEFAULT_MODE 0755 +static const struct inode_operations famfs_file_inode_operations; +static const struct inode_operations famfs_dir_inode_operations; + static struct inode *famfs_get_inode(struct super_block *sb, const struct inode *dir, umode_t mode, dev_t dev) @@ -52,11 +55,11 @@ static struct inode *famfs_get_inode(struct super_block *sb, init_special_inode(inode, mode, dev); break; case S_IFREG: - inode->i_op = NULL /* famfs_file_inode_operations */; + inode->i_op = &famfs_file_inode_operations; inode->i_fop = NULL /* &famfs_file_operations */; break; case S_IFDIR: - inode->i_op = NULL /* famfs_dir_inode_operations */; + inode->i_op = &famfs_dir_inode_operations; inode->i_fop = &simple_dir_operations; /* Directory inodes start off with i_nlink == 2 (for ".") */ @@ -70,6 +73,110 @@ static struct inode *famfs_get_inode(struct super_block *sb, return inode; } +/*************************************************************************** + * famfs inode_operations: these are currently pretty much boilerplate + */ + +static const struct inode_operations famfs_file_inode_operations = { + /* All generic */ + .setattr = simple_setattr, + .getattr = simple_getattr, +}; + +/* + * File creation. Allocate an inode, and we're done.. 
+ */ +static int +famfs_mknod(struct mnt_idmap *idmap, struct inode *dir, struct dentry *dentry, + umode_t mode, dev_t dev) +{ + struct famfs_fs_info *fsi = dir->i_sb->s_fs_info; + struct timespec64 tv; + struct inode *inode; + + if (fsi->deverror) + return -ENODEV; + + inode = famfs_get_inode(dir->i_sb, dir, mode, dev); + if (!inode) + return -ENOSPC; + + d_instantiate(dentry, inode); + dget(dentry); /* Extra count - pin the dentry in core */ + tv = inode_set_ctime_current(inode); + inode_set_mtime_to_ts(inode, tv); + inode_set_atime_to_ts(inode, tv); + + return 0; +} + +static int famfs_mkdir(struct mnt_idmap *idmap, struct inode *dir, + struct dentry *dentry, umode_t mode) +{ + struct famfs_fs_info *fsi = dir->i_sb->s_fs_info; + int rc; + + if (fsi->deverror) + return -ENODEV; + + rc = famfs_mknod(&nop_mnt_idmap, dir, dentry, mode | S_IFDIR, 0); + if (rc) + return rc; + + inc_nlink(dir); + + return 0; +} + +static int famfs_create(struct mnt_idmap *idmap, struct inode *dir, + struct dentry *dentry, umode_t mode, bool excl) +{ + struct famfs_fs_info *fsi = dir->i_sb->s_fs_info; + + if (fsi->deverror) + return -ENODEV; + + return famfs_mknod(&nop_mnt_idmap, dir, dentry, mode | S_IFREG, 0); +} + +static const struct inode_operations famfs_dir_inode_operations = { + .create = famfs_create, + .lookup = simple_lookup, + .link = simple_link, + .unlink = simple_unlink, + .mkdir = famfs_mkdir, + .rmdir = simple_rmdir, + .rename = simple_rename, +}; + +/***************************************************************************** + * famfs super_operations + * + * TODO: implement a famfs_statfs() that shows size, free and available space, + * etc. + */ + +/* + * famfs_show_options() - Display the mount options in /proc/mounts. 
+ */ +static int famfs_show_options(struct seq_file *m, struct dentry *root) +{ + struct famfs_fs_info *fsi = root->d_sb->s_fs_info; + + if (fsi->mount_opts.mode != FAMFS_DEFAULT_MODE) + seq_printf(m, ",mode=%o", fsi->mount_opts.mode); + + return 0; +} + +static const struct super_operations famfs_super_ops = { + .statfs = simple_statfs, + .drop_inode = generic_delete_inode, + .show_options = famfs_show_options, +}; + +/*****************************************************************************/ + /* * famfs dax_operations (for char dax) */ @@ -103,7 +210,7 @@ famfs_fill_super(struct super_block *sb, struct fs_context *fc) sb->s_blocksize = PAGE_SIZE; sb->s_blocksize_bits = PAGE_SHIFT; sb->s_magic = FAMFS_SUPER_MAGIC; - sb->s_op = NULL /* famfs_super_ops */; + sb->s_op = &famfs_super_ops; sb->s_time_gran = 1; return rc; From patchwork Mon Apr 29 17:04:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Groves X-Patchwork-Id: 13647406 Received: from mail-ot1-f48.google.com (mail-ot1-f48.google.com [209.85.210.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A3378127B51 for ; Mon, 29 Apr 2024 17:05:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.48 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714410330; cv=none; b=dgv6HRVr3YqUYiR8AjMk+kvBs/zjAPVOxMylC3FW3/+UWN/uGZoUq5Vd0KwvDcLsOibGqBXPSph0TryW87ijcANy8uuRywJa9SqJzgqXpLc2dyrc8tLpHr4qudyv8WCnKC/sngsXfyDN2TzO9ewvmhx1uecH8nmUtnCLE0x0x7E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714410330; c=relaxed/simple; bh=f7nr9BHgMNY07SuBb7AMIuKL8EhAF6Lae/DCFaA0XMM=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
b=B6ZiHLulbMcyAZ/AxtYumUs9YWLbaSSfzZSKfH9/fU355WtJ2snui9YZ3aiFMZ41NlHhgL6isGaZPTQpwhP4AvUnYTQhDqIZOji1Va0FP0GyG65kL5b4oPM/CA0YCsyfIolQC233qHwfdMczlUup0C6tDj7RXh2niMRhd+ZnlTY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=Groves.net; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=D9jRzO2t; arc=none smtp.client-ip=209.85.210.48 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=Groves.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="D9jRzO2t" Received: by mail-ot1-f48.google.com with SMTP id 46e09a7af769-6ee3a49bdcfso515401a34.0 for ; Mon, 29 Apr 2024 10:05:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1714410328; x=1715015128; darn=lists.linux.dev; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:from:to:cc:subject:date :message-id:reply-to; bh=MvRMVdHLhDXhOx55ypqAZ1Z/pDm9XYtxQRRB/JnE99c=; b=D9jRzO2tdEvnfGvk3ElftrvWUpG+2ooWxiYNq0ZtN+XGgtwfuOzReody9uW8EYZeN7 ZfgCmTthL4ZxMlBUdq+GFFzPaAyEdTUJHtrMxm5voDgArwrnXTeiAk2CDZ8ERQArMYeT 9flnoG7sVGvpxvqCBetK7hSLm46KTXtTPtSgXeIuqNUL7BIxhtsULXSrc2rFogI0igwv goqz4H4eU1NS3ByZODkDfW1fW9iejiq2G+gaHvpXhQLybLH4Pgt3yhCce8UqQQsyPEOC PoI5fOEdovJrl1+nDhpFlndba09d4h4jJN9OEzum/o0Au9gwQdPMZlTEJauH5TgF9rB7 T7pg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714410328; x=1715015128; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=MvRMVdHLhDXhOx55ypqAZ1Z/pDm9XYtxQRRB/JnE99c=; b=nlI5H0fX7rpZ7mFgNjg57yL13PLkaihy2m8+VD+YMItDA5S/U1zFL4MXwGMfMPb1Ja 
beUhmsEiK0XI8R6yZQViYzWO3j1ZeTmJPBeHDPA6tTPJ5pYE/bt7jhgk5cgo6lyDlN9P rezmJxL6nvC5/lR2jOJbGviC6n3jkUcXHzXzEKFjy/kWvW0E/Vy/nQEwWUGxOgRPyU8H gEHnn4CCM8zZMH8Nuz6JQGEDYr0eVVoEp3GnkgdDIAUb0QrBkdgBF0WhsyUn0urvj4aA Rtp36PvVD/PvQhW0S2pvno4NusJgQc1fmgdE2l3TAPEhza7/47CQetNWNuvnKhlNQsht iXCg== X-Forwarded-Encrypted: i=1; AJvYcCWi1DwYK3ZkGOB24VoojK552c5j7v0kxeNoHPl/Gq0trzOW0TxxgbtG8/rRbxREkdDevjdiU3RwYlVOpIcxBv2X+0HniFe3 X-Gm-Message-State: AOJu0YzBF+po1XABKh+xkSV/F8QdqkmRVU9cDN31c/IacCS4sOHq2q1+ Kj6YdNyYHBSrPkvlekvxPaRPUDQDcorjSkQKr6qOY4KsCGifkadk X-Google-Smtp-Source: AGHT+IG/ZcwB0huKygPBvFdcgDmjShYrcDklwOFhU2UIhkvvjhkEis6/aiEjDF1au3eTMnKT4Qud0g== X-Received: by 2002:a05:6830:59:b0:6ee:3232:160a with SMTP id d25-20020a056830005900b006ee3232160amr328210otp.38.1714410326944; Mon, 29 Apr 2024 10:05:26 -0700 (PDT) Received: from localhost.localdomain ([70.114.203.196]) by smtp.gmail.com with ESMTPSA id g1-20020a9d6201000000b006ea20712e66sm4074448otj.17.2024.04.29.10.05.24 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 29 Apr 2024 10:05:26 -0700 (PDT) Sender: John Groves From: John Groves X-Google-Original-From: John Groves To: John Groves , Jonathan Corbet , Jonathan Cameron , Dan Williams , Vishal Verma , Dave Jiang , Alexander Viro , Christian Brauner , Jan Kara , Matthew Wilcox , linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, nvdimm@lists.linux.dev Cc: John Groves , john@jagalactic.com, Dave Chinner , Christoph Hellwig , dave.hansen@linux.intel.com, gregory.price@memverge.com, Randy Dunlap , Jerome Glisse , Aravind Ramesh , Ajay Joshi , Eishan Mirakhur , Ravi Shankar , Srinivasulu Thanneeru , Luis Chamberlain , Amir Goldstein , Chandan Babu R , Bagas Sanjaya , "Darrick J . 
Wong" , Kent Overstreet , Steve French , Nathan Lynch , Michael Ellerman , Thomas Zimmermann , Julien Panis , Stanislav Fomichev , Dongsheng Yang , John Groves Subject: [RFC PATCH v2 10/12] famfs: Introduce file_operations read/write Date: Mon, 29 Apr 2024 12:04:26 -0500 Message-Id: <4584f1e26802af540a60eadb70f42c6ac5fe4679.1714409084.git.john@groves.net> X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: References: Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 This commit introduces fs/famfs/famfs_file.c and the famfs file_operations for read/write. This is not usable yet because: * It calls dax_iomap_rw() with NULL iomap_ops (the real iomap_ops will be introduced in a subsequent commit). * famfs_ioctl(), which is needed to map a file to a memory allocation, is coming in a later commit. Signed-off-by: John Groves --- fs/famfs/Makefile | 2 +- fs/famfs/famfs_file.c | 122 ++++++++++++++++++++++++++++++++++++++ fs/famfs/famfs_inode.c | 2 +- fs/famfs/famfs_internal.h | 2 + 4 files changed, 126 insertions(+), 2 deletions(-) create mode 100644 fs/famfs/famfs_file.c diff --git a/fs/famfs/Makefile b/fs/famfs/Makefile index 62230bcd6793..8cac90c090a4 100644 --- a/fs/famfs/Makefile +++ b/fs/famfs/Makefile @@ -2,4 +2,4 @@ obj-$(CONFIG_FAMFS) += famfs.o -famfs-y := famfs_inode.o +famfs-y := famfs_inode.o famfs_file.o diff --git a/fs/famfs/famfs_file.c b/fs/famfs/famfs_file.c new file mode 100644 index 000000000000..48036c71d4ed --- /dev/null +++ b/fs/famfs/famfs_file.c @@ -0,0 +1,122 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * famfs - dax file system for shared fabric-attached memory + * + * Copyright 2023-2024 Micron Technology, Inc. + * + * This file system, originally based on ramfs and the dax support from xfs, + * is intended to allow multiple host systems to mount a common file system + * view of dax files that map to shared memory. 
+ */ + +#include +#include +#include +#include + +#include "famfs_internal.h" + +/********************************************************************* + * file_operations + */ + +/* Reject I/O to files that aren't in a valid state */ +static ssize_t +famfs_file_invalid(struct inode *inode) +{ + if (!IS_DAX(inode)) { + pr_debug("%s: inode %llx IS_DAX is false\n", __func__, (u64)inode); + return -ENXIO; + } + return 0; +} + +static ssize_t +famfs_rw_prep(struct kiocb *iocb, struct iov_iter *ubuf) +{ + struct inode *inode = iocb->ki_filp->f_mapping->host; + struct super_block *sb = inode->i_sb; + struct famfs_fs_info *fsi = sb->s_fs_info; + size_t i_size = i_size_read(inode); + size_t count = iov_iter_count(ubuf); + size_t max_count; + ssize_t rc; + + if (fsi->deverror) + return -ENODEV; + + rc = famfs_file_invalid(inode); + if (rc) + return rc; + + max_count = (iocb->ki_pos < i_size) ? i_size - iocb->ki_pos : 0; + + if (count > max_count) + iov_iter_truncate(ubuf, max_count); + + if (!iov_iter_count(ubuf)) + return 0; + + return rc; +} + +static ssize_t +famfs_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + ssize_t rc; + + rc = famfs_rw_prep(iocb, to); + if (rc) + return rc; + + if (!iov_iter_count(to)) + return 0; + + rc = dax_iomap_rw(iocb, to, NULL /*&famfs_iomap_ops */); + + file_accessed(iocb->ki_filp); + return rc; +} + +/** + * famfs_dax_write_iter() + * + * We need our own write-iter in order to prevent append + * + * @iocb: the kiocb for this write + * @from: iterator describing the user memory source for the write + */ +static ssize_t +famfs_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) +{ + ssize_t rc; + + rc = famfs_rw_prep(iocb, from); + if (rc) + return rc; + + if (!iov_iter_count(from)) + return 0; + + return dax_iomap_rw(iocb, from, NULL /*&famfs_iomap_ops*/); +} + +const struct file_operations famfs_file_operations = { + .owner = THIS_MODULE, + + /* Custom famfs operations */ + .write_iter = famfs_dax_write_iter, + .read_iter = famfs_dax_read_iter, + 
.unlocked_ioctl = NULL /*famfs_file_ioctl*/, + .mmap = NULL /* famfs_file_mmap */, + + /* Force PMD alignment for mmap */ + .get_unmapped_area = thp_get_unmapped_area, + + /* Generic Operations */ + .fsync = noop_fsync, + .splice_read = filemap_splice_read, + .splice_write = iter_file_splice_write, + .llseek = generic_file_llseek, +}; + diff --git a/fs/famfs/famfs_inode.c b/fs/famfs/famfs_inode.c index e00e9cdecadf..490a2c0fd326 100644 --- a/fs/famfs/famfs_inode.c +++ b/fs/famfs/famfs_inode.c @@ -56,7 +56,7 @@ static struct inode *famfs_get_inode(struct super_block *sb, break; case S_IFREG: inode->i_op = &famfs_file_inode_operations; - inode->i_fop = NULL /* &famfs_file_operations */; + inode->i_fop = &famfs_file_operations; break; case S_IFDIR: inode->i_op = &famfs_dir_inode_operations; diff --git a/fs/famfs/famfs_internal.h b/fs/famfs/famfs_internal.h index 951b32ec4fbd..36efaef425e7 100644 --- a/fs/famfs/famfs_internal.h +++ b/fs/famfs/famfs_internal.h @@ -11,6 +11,8 @@ #ifndef FAMFS_INTERNAL_H #define FAMFS_INTERNAL_H +extern const struct file_operations famfs_file_operations; + struct famfs_mount_opts { umode_t mode; }; From patchwork Mon Apr 29 17:04:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Groves X-Patchwork-Id: 13647407 Received: from mail-oa1-f52.google.com (mail-oa1-f52.google.com [209.85.160.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AE9B786AE3 for ; Mon, 29 Apr 2024 17:05:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.52 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714410336; cv=none; b=elu80608TaYf+O28r/yVjrQeUVtOxGdEjKonZyHbIBbTWPlmONZsUMWLCnf0LrHDsdRvwLYsKd80Gv5lG2Tsd622cCD6OyWIrF1wm0Q1mSGIKzpZSg7aE5JeaTLbtt1QQyZkyyLzrxkdjIt6BRUY4XMCpINNmoZFEfiRI14Tvq4= 
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714410336; c=relaxed/simple; bh=qYcyL0hIBedDSxOc4RRWxp74kl4dcZDEKqcuPynXMsA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=T8yWVlkUdqgf39Vign0UMW5iTW9FBD8aqA8i3sVe7NGsL6B4tKxUeoF4r3K8r8Z/WQMeHjshEYG+6n0unYVC9fyLpohOKeEVQRzqGXqKsbFa6XZMctQZE5ye8Fy+3qXZ03eRasfsGt7wwOhCHvrghKxgoctt6SSEMQn5pOaGH6I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=Groves.net; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=CKB80pJz; arc=none smtp.client-ip=209.85.160.52 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=Groves.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="CKB80pJz" Received: by mail-oa1-f52.google.com with SMTP id 586e51a60fabf-23a6a8e9978so2061587fac.3 for ; Mon, 29 Apr 2024 10:05:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1714410331; x=1715015131; darn=lists.linux.dev; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:from:to:cc:subject:date :message-id:reply-to; bh=yYsNSadbDL/XO8+Ce/eg3N1cTx7anq8xr5/2FvNISeg=; b=CKB80pJzIq4Kja1+ZWWHTT+ZmqGTEZkTZZJWS0TioPUOkzT6LrRdH91w3sImgw7euB oiF402N20et24biYj/8iaC6BaBLF6PUHBgEB3Q11yxq/osYVRur2kFEpQbOyMLdeVIlk zMGCBR3DG6fzvdnBmsUSYTGfGxYTDCZHu4OXWLi0CUAY0SaVLlVnxvMtqYlcQZONtzeI q3/g/aasjkzsX3yfYkV9gUiq2to57gfd2rUTkpwtO2IfQKqyqaIc57VGM2zx3rV9eMwV HmQFKqg5vupWO7sveqKbRb+Fp7r5tHiw0md8m/0Z546EppH944nZOw8Nnjvc2WwHPkEx G0Ow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714410331; x=1715015131; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:sender:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=yYsNSadbDL/XO8+Ce/eg3N1cTx7anq8xr5/2FvNISeg=; b=tBNBQpW9T0GDVBIs8ppxM78kPJ+WrUX+VYGluTfDIEDrsC/DMgoSAqOJzLH5MEYZit 3EJJkbha2DCOQc++damFZPkVqUXi6vyyJ/Z64DdU7kq/Gdei0YqR/Wh/zsd17ctCApzi ZtAj16IH7sck+9kYHv1/8RBQh7d4Ygkwga8gIaHXb3dxNLWby9GMlCCWqXFgf4fmoQAQ l5UAsL8B1hWDIFTt2hcaDsDAmXUv8ogCd9B5882ao04ApozMCLbuU/m7Ud/rgTAMJZ1K TAywgHK2PNNDr9Pw9uzOZQCIMTz9D1xZmaNxbNp8mtMAVnYjCDkSHWRwrRhDt4RZNdEe FsZg== X-Forwarded-Encrypted: i=1; AJvYcCUeeiU1MMM/NBvbEQyFVsxEG/a0+WcOagIc0IWOw0fBgULARsLuVfRUFFwGD2UweDgb+wL7kwPJL1d9krGDKuLp7aS2Cpsi X-Gm-Message-State: AOJu0YzSjywm4qNYUWNMrW7cXRqkfSBrW7dHZeiabyrbu2+K8j4gLQIf 7jL+5IIlc3waWikwvQV4m1J8ceNRMR7v5nsu+VOeInpFzRSwKnvn X-Google-Smtp-Source: AGHT+IHpy5rIS85sX04zYGzAy0Qk38IVj3GM56nKt2jdu2LLX/opQJ4URM6nJrxFSBqIHBiafXyrsQ== X-Received: by 2002:a05:6871:408a:b0:21f:2b1:cdea with SMTP id kz10-20020a056871408a00b0021f02b1cdeamr14600153oab.57.1714410331618; Mon, 29 Apr 2024 10:05:31 -0700 (PDT) Received: from localhost.localdomain ([70.114.203.196]) by smtp.gmail.com with ESMTPSA id g1-20020a9d6201000000b006ea20712e66sm4074448otj.17.2024.04.29.10.05.28 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 29 Apr 2024 10:05:31 -0700 (PDT) Sender: John Groves From: John Groves X-Google-Original-From: John Groves To: John Groves , Jonathan Corbet , Jonathan Cameron , Dan Williams , Vishal Verma , Dave Jiang , Alexander Viro , Christian Brauner , Jan Kara , Matthew Wilcox , linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, nvdimm@lists.linux.dev Cc: John Groves , john@jagalactic.com, Dave Chinner , Christoph Hellwig , dave.hansen@linux.intel.com, gregory.price@memverge.com, Randy Dunlap , Jerome Glisse , Aravind Ramesh , Ajay Joshi , Eishan Mirakhur , Ravi Shankar , Srinivasulu Thanneeru , Luis Chamberlain , Amir Goldstein , Chandan Babu R , Bagas Sanjaya , "Darrick J . 
Wong" , Kent Overstreet , Steve French , Nathan Lynch , Michael Ellerman , Thomas Zimmermann , Julien Panis , Stanislav Fomichev , Dongsheng Yang , John Groves Subject: [RFC PATCH v2 11/12] famfs: Introduce mmap and VM fault handling Date: Mon, 29 Apr 2024 12:04:27 -0500 Message-Id: <744981e208f94d5fc12549e48b775d10cee550e8.1714409084.git.john@groves.net> X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: References: Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 This commit adds vm_operations, plus famfs_mmap() and fault handlers. It is still missing iomap_ops, iomap mapping resolution, and famfs_ioctl() for setting up file-to-memory mappings. Signed-off-by: John Groves --- fs/famfs/famfs_file.c | 108 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 106 insertions(+), 2 deletions(-) diff --git a/fs/famfs/famfs_file.c b/fs/famfs/famfs_file.c index 48036c71d4ed..585b776dd73c 100644 --- a/fs/famfs/famfs_file.c +++ b/fs/famfs/famfs_file.c @@ -16,6 +16,88 @@ #include "famfs_internal.h" +/********************************************************************* + * vm_operations + */ +static vm_fault_t +__famfs_filemap_fault(struct vm_fault *vmf, unsigned int pe_size, + bool write_fault) +{ + struct inode *inode = file_inode(vmf->vma->vm_file); + struct super_block *sb = inode->i_sb; + struct famfs_fs_info *fsi = sb->s_fs_info; + vm_fault_t ret; + pfn_t pfn; + + if (fsi->deverror) + return VM_FAULT_SIGBUS; + + if (!IS_DAX(file_inode(vmf->vma->vm_file))) { + pr_err("%s: file not marked IS_DAX!!\n", __func__); + return VM_FAULT_SIGBUS; + } + + if (write_fault) { + sb_start_pagefault(inode->i_sb); + file_update_time(vmf->vma->vm_file); + } + + ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, NULL /*&famfs_iomap_ops */); + if (ret & VM_FAULT_NEEDDSYNC) + ret = dax_finish_sync_fault(vmf, pe_size, pfn); + + if (write_fault) + sb_end_pagefault(inode->i_sb); + + return ret; +} + +static inline 
bool +famfs_is_write_fault(struct vm_fault *vmf) +{ + return (vmf->flags & FAULT_FLAG_WRITE) && + (vmf->vma->vm_flags & VM_SHARED); +} + +static vm_fault_t +famfs_filemap_fault(struct vm_fault *vmf) +{ + return __famfs_filemap_fault(vmf, 0, famfs_is_write_fault(vmf)); +} + +static vm_fault_t +famfs_filemap_huge_fault(struct vm_fault *vmf, unsigned int pe_size) +{ + return __famfs_filemap_fault(vmf, pe_size, famfs_is_write_fault(vmf)); +} + +static vm_fault_t +famfs_filemap_page_mkwrite(struct vm_fault *vmf) +{ + return __famfs_filemap_fault(vmf, 0, true); +} + +static vm_fault_t +famfs_filemap_pfn_mkwrite(struct vm_fault *vmf) +{ + return __famfs_filemap_fault(vmf, 0, true); +} + +static vm_fault_t +famfs_filemap_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, + pgoff_t end_pgoff) +{ + return filemap_map_pages(vmf, start_pgoff, end_pgoff); +} + +const struct vm_operations_struct famfs_file_vm_ops = { + .fault = famfs_filemap_fault, + .huge_fault = famfs_filemap_huge_fault, + .map_pages = famfs_filemap_map_pages, + .page_mkwrite = famfs_filemap_page_mkwrite, + .pfn_mkwrite = famfs_filemap_pfn_mkwrite, +}; + /********************************************************************* * file_operations */ @@ -25,7 +107,8 @@ static ssize_t famfs_file_invalid(struct inode *inode) { if (!IS_DAX(inode)) { - pr_debug("%s: inode %llx IS_DAX is false\n", __func__, (u64)inode); + pr_debug("%s: inode %llx IS_DAX is false\n", + __func__, (u64)inode); return -ENXIO; } return 0; @@ -101,6 +184,27 @@ famfs_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) return dax_iomap_rw(iocb, from, NULL /*&famfs_iomap_ops*/); } +static int +famfs_file_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct inode *inode = file_inode(file); + struct super_block *sb = inode->i_sb; + struct famfs_fs_info *fsi = sb->s_fs_info; + ssize_t rc; + + if (fsi->deverror) + return -ENODEV; + + rc = famfs_file_invalid(inode); + if (rc) + return (int)rc; + + file_accessed(file); + vma->vm_ops 
= &famfs_file_vm_ops; + vm_flags_set(vma, VM_HUGEPAGE); + return 0; +} + const struct file_operations famfs_file_operations = { .owner = THIS_MODULE, @@ -108,7 +212,7 @@ const struct file_operations famfs_file_operations = { .write_iter = famfs_dax_write_iter, .read_iter = famfs_dax_read_iter, .unlocked_ioctl = NULL /*famfs_file_ioctl*/, - .mmap = NULL /* famfs_file_mmap */, + .mmap = famfs_file_mmap, /* Force PMD alignment for mmap */ .get_unmapped_area = thp_get_unmapped_area, From patchwork Mon Apr 29 17:04:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Groves X-Patchwork-Id: 13647408 Received: from mail-ot1-f42.google.com (mail-ot1-f42.google.com [209.85.210.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 73309127E1E for ; Mon, 29 Apr 2024 17:05:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714410343; cv=none; b=Yom+YK8rcEqlDP3jQwUGnIrMUDxNj0J7uS9RPb+0ji//OFwDLyI0HbfGRoRhg/Re8GEbhHcoyOlBJsLIKJJLsoJZdIzLQOL9eDHrQVqjfvWx4j1QgH2wZaHhRgP4894Uj/WC/oxrcpcsKNPbKMPOssQxVGHyTYlmXp/r/4VKj7I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714410343; c=relaxed/simple; bh=uraQR7x/sGs8bT6f+ne9e8Dyo7H0N2VLJU5n4krNVvw=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=QxEVcAUwh9kob5BKXG3rVIccvU+EINupez5LyBs07YklLvl2qu6eNc5qPl1anTtUqsjqY6IZA0m4dMC9XjPe8Jqr+Cb6wEvJDVid9wQqkklse9pK8QA+xSOewotXzkVeSHsBWdj8dRueV5ZVb30Xy39qqTmxJM+2hu/UNR0fFQo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=Groves.net; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=FfVz0Kr0; arc=none 
smtp.client-ip=209.85.210.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=Groves.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="FfVz0Kr0" Received: by mail-ot1-f42.google.com with SMTP id 46e09a7af769-6ee2d64423cso897415a34.2 for ; Mon, 29 Apr 2024 10:05:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1714410336; x=1715015136; darn=lists.linux.dev; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:from:to:cc:subject:date :message-id:reply-to; bh=8XGfzHhfU9BjKWtreEiEkkMsTodSRwxrpWg7XVozVVs=; b=FfVz0Kr0/2M23Bh5aeqKFrhT+nwIgHmRiuDjoMeecpQblU16uNBVFDnkQNXu/g1t47 Cc4cdZkDVuFK+vqe8vLnNUVQAGqHwuNEuI/iQhPG092Zvp4Fugq3c1zGKzDy5tp2QSK5 DWj1SRpJ0yFLjdykhN/SCG7vxX6HLCPHJw4kth3rGihRUzUeIEJk1olq/intp5gNDzPj JhsewOw2k56QLyHfv0nSN6/AQKXFYw/FhMNbMfpNilGRZeD1UojCbn0GLUMN6+z7gq9W xhth0tG9Thbc4BMBs11UftA+I438gME/8sPriLhKC3ELX9abNNrr5GnqLaKOAUj57kQF ao2A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714410336; x=1715015136; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=8XGfzHhfU9BjKWtreEiEkkMsTodSRwxrpWg7XVozVVs=; b=eoX5HWC6YCGREgWPfTC8fKVgI3VnC9zYS2HreicJQ2W1PQPmKr0k3Vcrqd5tudB0g9 6jTTBWasfs3lQju327IUAFcJvhwTNAijRLJefj+gPQ3oVgpD151QC2+2eyfMhiWL86rP VS+317W4abMQkY5xq6a/tP2uh3oKM2aizXVM5SdFgB2aXzoBYSqAJwcAvgEwcW7I/Zd4 h+3+/0BAi01ju2UtIpRlkIO9Qi2zZSXBUx3LHx/ptWjQaCulVizhCuuiPHCgbqOQq881 M+hvQbcoUUYG4iDyuTU8enDgOTqN5+Q7coEuf/ZA4lnjm8wECHAxY5Mj8EhCWEc3Iv/7 OG7w== X-Forwarded-Encrypted: i=1; AJvYcCXGxaDvKoE3CTiwBNu+0K6dH5U7OHcjwa6I1k7Bx7mzj0DkLZ2uP9smP4YQXnkSUFd/vJr2kHxDBRTodYBXbvuZV5vKM/EG 
X-Gm-Message-State: AOJu0YzgLnQ0CD3LPzQKmWVoAI8ClB3BT7oXYDKNvP+nAZW5VwKQnEY4 gahl0EERjXGDp98NGZxOWO+nADZ2kCN8KxLDqbm7MKjzBAMY/MTX X-Google-Smtp-Source: AGHT+IFa4ttTfbKQGFpDgkeFkiAf/nTCdj5A92xX4+XR8LBf8Q0ctWa5/uMWEr1Ye2nwSdUIywl4GQ== X-Received: by 2002:a9d:6a11:0:b0:6ee:2798:4b95 with SMTP id g17-20020a9d6a11000000b006ee27984b95mr4654535otn.10.1714410335740; Mon, 29 Apr 2024 10:05:35 -0700 (PDT) Received: from localhost.localdomain ([70.114.203.196]) by smtp.gmail.com with ESMTPSA id g1-20020a9d6201000000b006ea20712e66sm4074448otj.17.2024.04.29.10.05.33 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 29 Apr 2024 10:05:35 -0700 (PDT) Sender: John Groves From: John Groves X-Google-Original-From: John Groves To: John Groves , Jonathan Corbet , Jonathan Cameron , Dan Williams , Vishal Verma , Dave Jiang , Alexander Viro , Christian Brauner , Jan Kara , Matthew Wilcox , linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, nvdimm@lists.linux.dev Cc: John Groves , john@jagalactic.com, Dave Chinner , Christoph Hellwig , dave.hansen@linux.intel.com, gregory.price@memverge.com, Randy Dunlap , Jerome Glisse , Aravind Ramesh , Ajay Joshi , Eishan Mirakhur , Ravi Shankar , Srinivasulu Thanneeru , Luis Chamberlain , Amir Goldstein , Chandan Babu R , Bagas Sanjaya , "Darrick J . Wong" , Kent Overstreet , Steve French , Nathan Lynch , Michael Ellerman , Thomas Zimmermann , Julien Panis , Stanislav Fomichev , Dongsheng Yang , John Groves Subject: [RFC PATCH v2 12/12] famfs: famfs_ioctl and core file-to-memory mapping logic & iomap_ops Date: Mon, 29 Apr 2024 12:04:28 -0500 Message-Id: <5824030d31a853ff591b3e1fb4f206b2fd4d1f9f.1714409084.git.john@groves.net> X-Mailer: git-send-email 2.39.3 (Apple Git-146) In-Reply-To: References: Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 * Add uapi include file famfs_ioctl.h. 
  The famfs user space uses ioctl on individual files to pass in mapping
  information and file size. This would be hard to do via sysfs or other
  means, since it's file-specific.

* Add the per-file ioctl function famfs_file_ioctl() to
  struct file_operations, and introduce the famfs_file_init_dax()
  function (which is called by famfs_file_ioctl()).

* Add the famfs iomap_ops. When either dax_iomap_fault() or
  dax_iomap_rw() is called, we get a callback via our iomap_begin()
  handler. The question being asked is "please resolve (file, offset) to
  (daxdev, offset)". The function famfs_meta_to_dax_offset() does this.

* Expose the famfs ABI version as
  /sys/module/famfs/parameters/famfs_kabi_version

The current ioctls are:

FAMFSIOC_MAP_CREATE - famfs_file_init_dax() associates a dax extent list
                      with a file, making it into a proper famfs file.
                      Starting with an empty file (which is not useful),
                      this turns the file into a DAX file backed by the
                      specified extent list from devdax memory.
FAMFSIOC_NOP        - A convenient way for user space to verify it's a
                      famfs file
FAMFSIOC_MAP_GET    - Get the header of the metadata for a file
FAMFSIOC_MAP_GETEXT - Get the extents for a file

The last two, together, are comparable to xfs_bmap. Our user space tools
use them primarily in testing.
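For readers who want to see the "(file, offset) to (daxdev, offset)"
resolution in isolation, here is a minimal user-space sketch of the walk
over an extent list. It is not the kernel code: the names sketch_ext,
sketch_resolved and sketch_resolve() are hypothetical, and the result
struct stands in for the iomap fields (addr, length) that the real
handler fills in. As in the patch, an offset past the end of the extent
list resolves to a zero-length mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct famfs_extent: a range within the dax device */
struct sketch_ext {
	uint64_t offset;	/* offset within the dax device */
	uint64_t len;		/* length of the extent in bytes */
};

/* Hypothetical stand-in for the iomap->addr / iomap->length outputs */
struct sketch_resolved {
	uint64_t dax_addr;	/* device offset where the data starts */
	uint64_t length;	/* contiguous bytes valid from dax_addr; 0 = past EOF */
};

/*
 * Walk the extent list, subtracting the length of each skipped extent from
 * the file offset. When the remaining offset falls inside an extent, the
 * data starts there; the returned length is trimmed to that extent.
 */
static struct sketch_resolved
sketch_resolve(const struct sketch_ext *exts, size_t n,
	       uint64_t file_offset, uint64_t len)
{
	struct sketch_resolved r = { 0, 0 };
	uint64_t local_offset = file_offset;
	size_t i;

	assert(exts != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (local_offset < exts[i].len) {
			uint64_t remainder = exts[i].len - local_offset;

			r.dax_addr = exts[i].offset + local_offset;
			r.length = len < remainder ? len : remainder;
			return r;
		}
		local_offset -= exts[i].len;	/* move on to the next extent */
	}

	return r;	/* fell off the end of the list: zero-length result */
}
```

With two extents, 2MiB at device offset 0x200000 and 4MiB at 0x800000,
file offset 0x300000 lands 1MiB into the second extent, so the sketch
reports device offset 0x900000. A request that straddles the end of an
extent comes back trimmed, and the caller is expected to ask again for
the remainder.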
Signed-off-by: John Groves --- MAINTAINERS | 1 + fs/famfs/famfs_file.c | 391 ++++++++++++++++++++++++++++++- fs/famfs/famfs_internal.h | 14 ++ include/uapi/linux/famfs_ioctl.h | 61 +++++ 4 files changed, 461 insertions(+), 6 deletions(-) create mode 100644 include/uapi/linux/famfs_ioctl.h diff --git a/MAINTAINERS b/MAINTAINERS index 365d678e2f40..29d81be488bc 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8189,6 +8189,7 @@ L: linux-fsdevel@vger.kernel.org S: Supported F: Documentation/filesystems/famfs.rst F: fs/famfs +F: include/uapi/linux/famfs_ioctl.h FANOTIFY M: Jan Kara diff --git a/fs/famfs/famfs_file.c b/fs/famfs/famfs_file.c index 585b776dd73c..ac34e606ca1b 100644 --- a/fs/famfs/famfs_file.c +++ b/fs/famfs/famfs_file.c @@ -14,8 +14,371 @@ #include #include +#include #include "famfs_internal.h" +/* Expose famfs kernel abi version as a read-only module parameter */ +static int famfs_kabi_version = FAMFS_KABI_VERSION; +module_param(famfs_kabi_version, int, 0444); +MODULE_PARM_DESC(famfs_kabi_version, "famfs kernel abi version"); + +/** + * famfs_meta_alloc() - Allocate famfs file metadata + * @metap: Pointer to an mcache_map_meta pointer + * @ext_count: The number of extents needed + */ +static int +famfs_meta_alloc(struct famfs_file_meta **metap, size_t ext_count) +{ + struct famfs_file_meta *meta; + + meta = kzalloc(struct_size(meta, tfs_extents, ext_count), GFP_KERNEL); + if (!meta) + return -ENOMEM; + + meta->tfs_extent_ct = ext_count; + meta->error = false; + *metap = meta; + + return 0; +} + +static void +famfs_meta_free(struct famfs_file_meta *map) +{ + kfree(map); +} + +/** + * famfs_file_init_dax() - FAMFSIOC_MAP_CREATE ioctl handler + * @file: the un-initialized file + * @arg: ptr to struct mcioc_map in user space + * + * Setup the dax mapping for a file. Files are created empty, and then function + * is called by famfs_file_ioctl() to setup the mapping and set the file size. 
+ */ +static int +famfs_file_init_dax(struct file *file, void __user *arg) +{ + struct famfs_file_meta *meta = NULL; + struct famfs_ioc_map imap; + struct famfs_fs_info *fsi; + size_t extent_total = 0; + int alignment_errs = 0; + struct super_block *sb; + struct inode *inode; + size_t ext_count; + int rc; + int i; + + inode = file_inode(file); + if (!inode) { + rc = -EBADF; + goto errout; + } + + sb = inode->i_sb; + fsi = sb->s_fs_info; + if (fsi->deverror) + return -ENODEV; + + rc = copy_from_user(&imap, arg, sizeof(imap)); + if (rc) + return -EFAULT; + + ext_count = imap.ext_list_count; + if (ext_count < 1) { + rc = -ENOSPC; + goto errout; + } + + if (ext_count > FAMFS_MAX_EXTENTS) { + rc = -E2BIG; + goto errout; + } + + rc = famfs_meta_alloc(&meta, ext_count); + if (rc) + goto errout; + + meta->file_type = imap.file_type; + meta->file_size = imap.file_size; + + /* Fill in the internal file metadata structure */ + for (i = 0; i < imap.ext_list_count; i++) { + size_t len; + off_t offset; + + offset = imap.ext_list[i].offset; + len = imap.ext_list[i].len; + + extent_total += len; + + if (WARN_ON(offset == 0 && meta->file_type != FAMFS_SUPERBLOCK)) { + rc = -EINVAL; + goto errout; + } + + meta->tfs_extents[i].offset = offset; + meta->tfs_extents[i].len = len; + + /* All extent addresses/offsets must be 2MiB aligned, + * and all but the last length must be a 2MiB multiple. 
+ */ + if (!IS_ALIGNED(offset, PMD_SIZE)) { + pr_err("%s: error ext %d hpa %lx not aligned\n", + __func__, i, offset); + alignment_errs++; + } + if (i < (imap.ext_list_count - 1) && !IS_ALIGNED(len, PMD_SIZE)) { + pr_err("%s: error ext %d length %ld not aligned\n", + __func__, i, len); + alignment_errs++; + } + } + + /* + * File size can be <= ext list size, since extent sizes are constrained + * to PMD multiples + */ + if (imap.file_size > extent_total) { + pr_err("%s: file size %lld larger than ext list size %lld\n", + __func__, (u64)imap.file_size, (u64)extent_total); + rc = -EINVAL; + goto errout; + } + + if (alignment_errs > 0) { + pr_err("%s: there were %d alignment errors in the extent list\n", + __func__, alignment_errs); + rc = -EINVAL; + goto errout; + } + + /* Publish the famfs metadata on inode->i_private */ + inode_lock(inode); + if (inode->i_private) { + rc = -EEXIST; /* file already has famfs metadata */ + } else { + inode->i_private = meta; + i_size_write(inode, imap.file_size); + inode->i_flags |= S_DAX; + } + inode_unlock(inode); + + errout: + if (rc) + famfs_meta_free(meta); + + return rc; +} + +/** + * famfs_file_ioctl() - Top-level famfs file ioctl handler + * @file: the file + * @cmd: ioctl opcode + * @arg: ioctl opcode argument (if any) + */ +static long +famfs_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + struct inode *inode = file_inode(file); + struct famfs_fs_info *fsi = inode->i_sb->s_fs_info; + long rc; + + if (fsi->deverror && (cmd != FAMFSIOC_NOP)) + return -ENODEV; + + switch (cmd) { + case FAMFSIOC_NOP: + rc = 0; + break; + + case FAMFSIOC_MAP_CREATE: + rc = famfs_file_init_dax(file, (void *)arg); + break; + + case FAMFSIOC_MAP_GET: { + struct inode *inode = file_inode(file); + struct famfs_file_meta *meta = inode->i_private; + struct famfs_ioc_map umeta; + + memset(&umeta, 0, sizeof(umeta)); + + if (meta) { + /* TODO: do more to harmonize these structures */ + umeta.extent_type = meta->tfs_extent_type; + 
umeta.file_size = i_size_read(inode); + umeta.ext_list_count = meta->tfs_extent_ct; + + rc = copy_to_user((void __user *)arg, &umeta, + sizeof(umeta)); + if (rc) + pr_err("%s: copy_to_user returned %ld\n", + __func__, rc); + + } else { + rc = -EINVAL; + } + break; + } + case FAMFSIOC_MAP_GETEXT: { + struct inode *inode = file_inode(file); + struct famfs_file_meta *meta = inode->i_private; + + if (meta) + rc = copy_to_user((void __user *)arg, meta->tfs_extents, + meta->tfs_extent_ct * sizeof(struct famfs_extent)); + else + rc = -EINVAL; + break; + } + default: + rc = -ENOTTY; + break; + } + + return rc; +} + +/********************************************************************* + * iomap_operations + * + * This stuff uses the iomap (dax-related) helpers to resolve file offsets to + * offsets within a dax device. + */ + +static ssize_t famfs_file_invalid(struct inode *inode); + +/** + * famfs_meta_to_dax_offset() - Resolve (file, offset, len) to (daxdev, offset, len) + * + * This function is called by famfs_iomap_begin() to resolve an offset in a + * file to an offset in a dax device. This is upcalled from dax from calls to + * both * dax_iomap_fault() and dax_iomap_rw(). Dax finishes the job resolving + * a fault to a specific physical page (the fault case) or doing a memcpy + * variant (the rw case) + * + * Pages can be PTE (4k), PMD (2MiB) or (theoretically) PuD (1GiB) + * (these sizes are for X86; may vary on other cpu architectures + * + * @inode: The file where the fault occurred + * @iomap: To be filled in to indicate where to find the right memory, + * relative to a dax device. + * @file_offset: Within the file where the fault occurred (will be page boundary) + * @len: The length of the faulted mapping (will be a page multiple) + * (will be trimmed in *iomap if it's disjoint in the extent list) + * @flags: + * + * Return values: 0. 
(info is returned in a modified @iomap struct) + */ +static int +famfs_meta_to_dax_offset(struct inode *inode, struct iomap *iomap, + loff_t file_offset, off_t len, unsigned int flags) +{ + struct famfs_file_meta *meta = inode->i_private; + int i; + loff_t local_offset = file_offset; + struct famfs_fs_info *fsi = inode->i_sb->s_fs_info; + + if (fsi->deverror || famfs_file_invalid(inode)) + goto err_out; + + iomap->offset = file_offset; + + for (i = 0; i < meta->tfs_extent_ct; i++) { + loff_t dax_ext_offset = meta->tfs_extents[i].offset; + loff_t dax_ext_len = meta->tfs_extents[i].len; + + if ((dax_ext_offset == 0) && + (meta->file_type != FAMFS_SUPERBLOCK)) + pr_warn("%s: zero offset on non-superblock file!!\n", + __func__); + + /* local_offset is the offset minus the size of extents skipped + * so far; If local_offset < dax_ext_len, the data of interest + * starts in this extent + */ + if (local_offset < dax_ext_len) { + loff_t ext_len_remainder = dax_ext_len - local_offset; + + /* + * OK, we found the file metadata extent where this + * data begins + * @local_offset - The offset within the current + * extent + * @ext_len_remainder - Remaining length of ext after + * skipping local_offset + * Outputs: + * iomap->addr: the offset within the dax device where + * the data starts + * iomap->offset: the file offset + * iomap->length: the valid length resolved here + */ + iomap->addr = dax_ext_offset + local_offset; + iomap->offset = file_offset; + iomap->length = min_t(loff_t, len, ext_len_remainder); + iomap->dax_dev = fsi->dax_devp; + iomap->type = IOMAP_MAPPED; + iomap->flags = flags; + + return 0; + } + local_offset -= dax_ext_len; /* Get ready for the next extent */ + } + + err_out: + /* We fell out the end of the extent list. 
+ * Set iomap to zero length in this case, and return 0 + * This just means that the r/w is past EOF + */ + iomap->addr = 0; /* there is no valid dax device offset */ + iomap->offset = file_offset; /* file offset */ + iomap->length = 0; /* this had better result in no access to dax mem */ + iomap->dax_dev = fsi->dax_devp; + iomap->type = IOMAP_MAPPED; + iomap->flags = flags; + + return 0; +} + +/** + * famfs_iomap_begin() - Handler for iomap_begin upcall from dax + * + * This function is pretty simple because files are + * * never partially allocated + * * never have holes (never sparse) + * * never "allocate on write" + * + * @inode: inode for the file being accessed + * @offset: offset within the file + * @length: Length being accessed at offset + * @flags: + * @iomap: iomap struct to be filled in, resolving (offset, length) to + * (daxdev, offset, len) + * @srcmap: + */ +static int +famfs_iomap_begin(struct inode *inode, loff_t offset, loff_t length, + unsigned int flags, struct iomap *iomap, struct iomap *srcmap) +{ + struct famfs_file_meta *meta = inode->i_private; + size_t size; + + size = i_size_read(inode); + + WARN_ON(size != meta->file_size); + + return famfs_meta_to_dax_offset(inode, iomap, offset, length, flags); +} + +/* Note: We never need a special set of write_iomap_ops because famfs never + * performs allocation on write. 
+ */ +const struct iomap_ops famfs_iomap_ops = { + .iomap_begin = famfs_iomap_begin, +}; + /********************************************************************* * vm_operations */ @@ -42,7 +405,7 @@ __famfs_filemap_fault(struct vm_fault *vmf, unsigned int pe_size, file_update_time(vmf->vma->vm_file); } - ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, NULL /*&famfs_iomap_ops */); + ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &famfs_iomap_ops); if (ret & VM_FAULT_NEEDDSYNC) ret = dax_finish_sync_fault(vmf, pe_size, pfn); @@ -106,9 +469,25 @@ const struct vm_operations_struct famfs_file_vm_ops = { static ssize_t famfs_file_invalid(struct inode *inode) { + struct famfs_file_meta *meta = inode->i_private; + size_t i_size = i_size_read(inode); + + if (!meta) { + pr_debug("%s: un-initialized famfs file\n", __func__); + return -EIO; + } + if (meta->error) { + pr_debug("%s: previously detected metadata errors\n", __func__); + return -EIO; + } + if (i_size != meta->file_size) { + pr_warn("%s: i_size overwritten from %ld to %ld\n", + __func__, meta->file_size, i_size); + meta->error = true; + return -ENXIO; + } if (!IS_DAX(inode)) { - pr_debug("%s: inode %llx IS_DAX is false\n", - __func__, (u64)inode); + pr_debug("%s: inode %llx IS_DAX is false\n", __func__, (u64)inode); return -ENXIO; } return 0; @@ -155,7 +534,7 @@ famfs_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) if (!iov_iter_count(to)) return 0; - rc = dax_iomap_rw(iocb, to, NULL /*&famfs_iomap_ops */); + rc = dax_iomap_rw(iocb, to, &famfs_iomap_ops); file_accessed(iocb->ki_filp); return rc; @@ -181,7 +560,7 @@ famfs_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) if (!iov_iter_count(from)) return 0; - return dax_iomap_rw(iocb, from, NULL /*&famfs_iomap_ops*/); + return dax_iomap_rw(iocb, from, &famfs_iomap_ops); } static int @@ -211,7 +590,7 @@ const struct file_operations famfs_file_operations = { /* Custom famfs operations */ .write_iter = famfs_dax_write_iter, .read_iter = 
famfs_dax_read_iter, - .unlocked_ioctl = NULL /*famfs_file_ioctl*/, + .unlocked_ioctl = famfs_file_ioctl, .mmap = famfs_file_mmap, /* Force PMD alignment for mmap */ diff --git a/fs/famfs/famfs_internal.h b/fs/famfs/famfs_internal.h index 36efaef425e7..a45757d4cdea 100644 --- a/fs/famfs/famfs_internal.h +++ b/fs/famfs/famfs_internal.h @@ -11,8 +11,22 @@ #ifndef FAMFS_INTERNAL_H #define FAMFS_INTERNAL_H +#include + extern const struct file_operations famfs_file_operations; +/* + * Each famfs dax file has this hanging from its inode->i_private. + */ +struct famfs_file_meta { + bool error; + enum famfs_file_type file_type; + size_t file_size; + enum famfs_extent_type tfs_extent_type; + size_t tfs_extent_ct; + struct famfs_extent tfs_extents[]; +}; + struct famfs_mount_opts { umode_t mode; }; diff --git a/include/uapi/linux/famfs_ioctl.h b/include/uapi/linux/famfs_ioctl.h new file mode 100644 index 000000000000..97ff5a2a8d13 --- /dev/null +++ b/include/uapi/linux/famfs_ioctl.h @@ -0,0 +1,61 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * famfs - dax file system for shared fabric-attached memory + * + * Copyright 2023-2024 Micron Technology, Inc. + * + * This file system, originally based on ramfs the dax support from xfs, + * is intended to allow multiple host systems to mount a common file system + * view of dax files that map to shared memory. 
+ */
+#ifndef FAMFS_IOCTL_H
+#define FAMFS_IOCTL_H
+
+#include
+#include
+
+#define FAMFS_KABI_VERSION 42
+#define FAMFS_MAX_EXTENTS 2
+
+/* We anticipate the possibility of supporting additional types of extents */
+enum famfs_extent_type {
+	SIMPLE_DAX_EXTENT,
+	INVALID_EXTENT_TYPE,
+};
+
+struct famfs_extent {
+	__u64 offset;
+	__u64 len;
+};
+
+enum famfs_file_type {
+	FAMFS_REG,
+	FAMFS_SUPERBLOCK,
+	FAMFS_LOG,
+};
+
+/**
+ * struct famfs_ioc_map - the famfs per-file metadata structure
+ * @extent_type: what type of extents are in this ext_list
+ * @file_type: Mark the superblock and log as special files. Maybe more later.
+ * @file_size: Size of the file, which is <= the size of the ext_list
+ * @ext_list_count: Number of extents
+ * @ext_list: 1 or more extents
+ */
+struct famfs_ioc_map {
+	enum famfs_extent_type extent_type;
+	enum famfs_file_type file_type;
+	__u64 file_size;
+	__u64 ext_list_count;
+	struct famfs_extent ext_list[FAMFS_MAX_EXTENTS];
+};
+
+#define FAMFSIOC_MAGIC 'u'
+
+/* famfs file ioctl opcodes */
+#define FAMFSIOC_MAP_CREATE _IOW(FAMFSIOC_MAGIC, 0x50, struct famfs_ioc_map)
+#define FAMFSIOC_MAP_GET    _IOR(FAMFSIOC_MAGIC, 0x51, struct famfs_ioc_map)
+#define FAMFSIOC_MAP_GETEXT _IOR(FAMFSIOC_MAGIC, 0x52, struct famfs_extent)
+#define FAMFSIOC_NOP        _IO(FAMFSIOC_MAGIC, 0x53)
+
+#endif /* FAMFS_IOCTL_H */
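
A user-space caller may want to sanity-check a map before handing it to
FAMFSIOC_MAP_CREATE. The sketch below mirrors the checks that
famfs_file_init_dax() applies in this patch (extent count, 2MiB
alignment, file size versus extent total), assuming a 2MiB PMD as on
x86. Everything prefixed sketch_ or SKETCH_ is hypothetical and exists
only for this illustration; the structs are simplified local copies, not
the uapi definitions, and the order of the checks differs slightly from
the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_EXTENTS 2		/* mirrors FAMFS_MAX_EXTENTS */
#define SKETCH_ALIGN (2ULL << 20)	/* 2MiB; the kernel code uses PMD_SIZE */

/* Simplified local copy of struct famfs_extent (assumption, not the uapi type) */
struct sketch_extent {
	uint64_t offset;
	uint64_t len;
};

/* Simplified local copy of the fields of struct famfs_ioc_map that are checked */
struct sketch_map {
	int is_superblock;	/* stands in for file_type == FAMFS_SUPERBLOCK */
	uint64_t file_size;
	uint64_t ext_list_count;
	struct sketch_extent ext_list[SKETCH_MAX_EXTENTS];
};

/* Return 0 if the map would pass the checks, else a negative errno value */
static int sketch_map_check(const struct sketch_map *m)
{
	uint64_t total = 0;
	uint64_t i;

	assert(m != NULL);

	if (m->ext_list_count < 1)
		return -ENOSPC;
	if (m->ext_list_count > SKETCH_MAX_EXTENTS)
		return -E2BIG;

	for (i = 0; i < m->ext_list_count; i++) {
		const struct sketch_extent *e = &m->ext_list[i];

		/* Only the superblock file may start at device offset 0 */
		if (e->offset == 0 && !m->is_superblock)
			return -EINVAL;
		/* Every extent offset must be 2MiB aligned */
		if (e->offset % SKETCH_ALIGN)
			return -EINVAL;
		/* All but the last length must be a 2MiB multiple */
		if (i < m->ext_list_count - 1 && e->len % SKETCH_ALIGN)
			return -EINVAL;
		total += e->len;
	}

	/* The file may be smaller than the extent list, never larger */
	if (m->file_size > total)
		return -EINVAL;

	return 0;
}
```

A 3MiB file described by a 2MiB extent at offset 2MiB plus a 1MiB extent
at offset 8MiB passes these checks; the same map with a first-extent
offset of 4096 does not.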