From patchwork Mon Dec 11 22:52:17 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Verma, Vishal L"
X-Patchwork-Id: 13488192
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="WZk2sOlN"
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2C524E9;
	Mon, 11 Dec 2023 14:52:33 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1702335153; x=1733871153;
  h=from:date:subject:mime-version:content-transfer-encoding:
   message-id:references:in-reply-to:to:cc;
  bh=j9znkpM/52KvUGuun0RJffYF/wG7fdqoc+th2+hzW9E=;
  b=WZk2sOlN0LGY5AV7mCaTqGMnCc+HaF8Q4l+c/WFx4Ttoj7YW0SYsmVqW
   60T2/v4mzXxwfFkqmo0pD+pcO3Tsbr8LfuTI41dBJGHZtGKPchi4xT1bz
   CdNUKMuGfqpC0nq0g1RSdXBVVCkdHfdhP7H+BKlwIiyeOTTTAyPOQFlLr
   x2cRx1isJNSz3jQRxFwE0Ocz1JgcNsL7UbDs+GpJ2AuHzDhxQ/UBz+Elf
   gUpAx8wzODhMXydVF024eqifexzbaNu6AhomyndwaFJKsljbfSVoZ4xHG
   oJu3MlwxttIoUKYZNZJoty8O+wo2tXwOMFOhbUTdfMMT4hqXm+ypHC8HG
   g==;
X-IronPort-AV: E=McAfee;i="6600,9927,10921"; a="8083758"
X-IronPort-AV: E=Sophos;i="6.04,268,1695711600";
   d="scan'208";a="8083758"
Received: from orsmga005.jf.intel.com ([10.7.209.41])
  by fmvoesa102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
  11 Dec 2023 14:52:30 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=McAfee;i="6600,9927,10921"; a="946511331"
X-IronPort-AV: E=Sophos;i="6.04,268,1695711600";
   d="scan'208";a="946511331"
Received: from tlyon-mobl2.amr.corp.intel.com (HELO [192.168.1.200])
  ([10.212.89.19])
  by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
  11 Dec 2023 14:52:28 -0800
From: Vishal Verma
Date: Mon, 11 Dec 2023 15:52:17 -0700
Subject: [PATCH v3 1/2] Documentation/ABI: Add ABI documentation for
 sys-bus-dax
Precedence: bulk
X-Mailing-List: linux-cxl@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Message-Id: <20231211-vv-dax_abi-v3-1-acf6cc1bde9f@intel.com>
References: <20231211-vv-dax_abi-v3-0-acf6cc1bde9f@intel.com>
In-Reply-To: <20231211-vv-dax_abi-v3-0-acf6cc1bde9f@intel.com>
To: Dan Williams, Vishal Verma, Dave Jiang
Cc: linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
 linux-cxl@vger.kernel.org, David Hildenbrand, Dave Hansen, Huang Ying
X-Mailer: b4 0.13-dev-433a8
X-Developer-Signature: v=1; a=openpgp-sha256; l=6438;
 i=vishal.l.verma@intel.com; h=from:subject:message-id;
 bh=j9znkpM/52KvUGuun0RJffYF/wG7fdqoc+th2+hzW9E=;
 b=owGbwMvMwCXGf25diOft7jLG02pJDKnlk9bMuim0mXuWw0nuh/dfZ6inFYn11z7rcLO45yW29
 Gj4jA6FjlIWBjEuBlkxRZa/ez4yHpPbns8TmOAIM4eVCWQIAxenAEzEup7hr5Dfo5gkN8ZjR1mL
 X0wK3bnqUMtpH1YrhQef1H4UrTieqc/wP9zMK6S2T/2VenRtXr3n38mqCRcOOK+y2KT1jFvXKM2
 OBQA=
X-Developer-Key: i=vishal.l.verma@intel.com; a=openpgp;
 fpr=F8682BE134C67A12332A2ED07AFA61BEA3B84DFF

Add the missing sysfs ABI documentation for the device DAX subsystem.
Various ABI attributes under this have been present since v5.1, and more
have been added over time. In preparation for adding a new attribute,
add this file with the historical details.

Cc: Dan Williams
Signed-off-by: Vishal Verma
---
 Documentation/ABI/testing/sysfs-bus-dax | 151 ++++++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax
new file mode 100644
index 000000000000..a61a7b186017
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-dax
@@ -0,0 +1,151 @@
+What:		/sys/bus/dax/devices/daxX.Y/align
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RW) Provides a way to specify an alignment for a dax device.
+		Values allowed are constrained by the physical address ranges
+		that back the dax device, and also by arch requirements.
+
+What:		/sys/bus/dax/devices/daxX.Y/mapping
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(WO) Provides a way to allocate a mapping range under a dax
+		device. Specified in the format <start>-<end>.
+
+What:		/sys/bus/dax/devices/daxX.Y/mapping[0..N]/start
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) A dax device may have multiple constituent discontiguous
+		address ranges. These are represented by the different
+		'mappingX' subdirectories. The 'start' attribute indicates the
+		start physical address for the given range.
+
+What:		/sys/bus/dax/devices/daxX.Y/mapping[0..N]/end
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) A dax device may have multiple constituent discontiguous
+		address ranges. These are represented by the different
+		'mappingX' subdirectories. The 'end' attribute indicates the
+		end physical address for the given range.
+
+What:		/sys/bus/dax/devices/daxX.Y/mapping[0..N]/page_offset
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) A dax device may have multiple constituent discontiguous
+		address ranges. These are represented by the different
+		'mappingX' subdirectories. The 'page_offset' attribute indicates the
+		offset of the current range in the dax device.
+
+What:		/sys/bus/dax/devices/daxX.Y/resource
+Date:		June, 2019
+KernelVersion:	v5.3
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) The resource attribute indicates the starting physical
+		address of a dax device. In case of a device with multiple
+		constituent ranges, it indicates the starting address of the
+		first range.
+
+What:		/sys/bus/dax/devices/daxX.Y/size
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RW) The size attribute indicates the total size of a dax
+		device. For creating subdivided dax devices, or for resizing
+		an existing device, the new size can be written to this as
+		part of the reconfiguration process.
+
+What:		/sys/bus/dax/devices/daxX.Y/numa_node
+Date:		November, 2019
+KernelVersion:	v5.5
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) If NUMA is enabled and the platform has affinitized the
+		backing device for this dax device, emit the CPU node
+		affinity for this device.
+
+What:		/sys/bus/dax/devices/daxX.Y/target_node
+Date:		February, 2019
+KernelVersion:	v5.1
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) The target-node attribute is the Linux numa-node that a
+		device-dax instance may create when it is online. Prior to
+		being online the device's 'numa_node' property reflects the
+		closest online cpu node which is the typical expectation of a
+		device 'numa_node'. Once it is online it becomes its own
+		distinct numa node.
+
+What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/available_size
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) The available_size attribute tracks available dax region
+		capacity. This only applies to volatile hmem devices, not pmem
+		devices, since pmem devices are defined by nvdimm namespace
+		boundaries.
+
+What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/size
+Date:		July, 2017
+KernelVersion:	v5.1
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) The size attribute indicates the size of a given dax region
+		in bytes.
+
+What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/align
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) The align attribute indicates alignment of the dax region.
+		Changes on align may not always be valid, for example when
+		certain mappings were created with 2M and the alignment is then
+		switched to 1G. All ranges are validated against the new value
+		being attempted, post resizing.
+
+What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/seed
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) The seed device is a concept for dynamic dax regions to be
+		able to split the region amongst multiple sub-instances. The
+		seed device, similar to libnvdimm seed devices, is a device
+		that starts with zero capacity allocated and unbound to a
+		driver.
+
+What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/create
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RW) The create interface to the dax region provides a way to
+		create a new unconfigured dax device under the given region, which
+		can then be configured (with a size etc.) and then probed.
+
+What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/delete
+Date:		October, 2020
+KernelVersion:	v5.10
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(WO) The delete interface for a dax region provides for deletion
+		of any 0-sized and idle dax devices.
+
+What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/id
+Date:		July, 2017
+KernelVersion:	v5.1
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RO) The id attribute indicates the region id of a dax region.
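
Not part of the patch: a minimal userspace sketch of two points the ABI text above touches on, namely the range string written to 'mapping' and the check that a range is compatible with a candidate alignment. The helper names dax_format_mapping() and dax_range_aligned() are hypothetical. The sketch assumes addresses are written in hexadecimal without a 0x prefix and that 'end' is inclusive; neither is stated in the ABI text, so verify against drivers/dax/bus.c for the kernel in use.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a physical address range for the write-only 'mapping' attribute.
 * Assumption (not stated in the ABI text): hexadecimal addresses without a
 * 0x prefix, 'end' inclusive.
 * Returns 0 on success, -1 if the range is empty or the buffer is too small.
 */
int dax_format_mapping(char *buf, size_t len,
		       unsigned long long start, unsigned long long end)
{
	int n;

	assert(buf != NULL);
	if (end < start)
		return -1;
	n = snprintf(buf, len, "%llx-%llx", start, end);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Check one mapping range against a candidate alignment: both the start
 * address and the length must be multiples of 'align', which must be a
 * non-zero power of two. Returns 1 if compatible, 0 otherwise.
 */
int dax_range_aligned(unsigned long long start, unsigned long long end,
		      unsigned long long align)
{
	unsigned long long size;

	if (align == 0 || (align & (align - 1)) != 0 || end < start)
		return 0;
	size = end - start + 1;
	return (start & (align - 1)) == 0 && (size & (align - 1)) == 0;
}
```

A range created with 2M alignment passes dax_range_aligned() for 2M but can fail it for 1G, which is the situation the dax_region/align description warns about.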
From patchwork Mon Dec 11 22:52:18 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Verma, Vishal L"
X-Patchwork-Id: 13488193
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="mnl9ktrZ"
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7941BEA;
	Mon, 11 Dec 2023 14:52:33 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1702335154; x=1733871154;
  h=from:date:subject:mime-version:content-transfer-encoding:
   message-id:references:in-reply-to:to:cc;
  bh=ehImgNx/k9AKKVzaSKbOLo1Mb4OGFvOlrdKezvX51Qs=;
  b=mnl9ktrZGwft0/ztMbnaEwHkUOfGIrXRU4xt86tHV4frmXtPYovdRLoZ
   IMSC4a8pWqUwRubFfhtYBz0bb/MXhtcvfNmUObtHHoRHk8m643AYvw52S
   d6Fkkcywz7i3afFoqumBy/h2G8/61m2Nva+0KJ3OsUfeBkC8ZFy0p21HR
   6CZx4H08XPXSAtBBnVGHc45SRLlA5KKKVNo56k85wB/HslnQCsOz/SPW1
   UO3HX0gAqDxNREqo9mDzWSeqldoIrpzcwoJb795ZFfaBdHUPT//TiSDlH
   +HIEZnqaJE+m5z9fa7ra2d4q7F/uuGz0D9G2xf0mK6DXa3xCXhZVPzMrh
   Q==;
X-IronPort-AV: E=McAfee;i="6600,9927,10921"; a="8083760"
X-IronPort-AV: E=Sophos;i="6.04,268,1695711600";
   d="scan'208";a="8083760"
Received: from orsmga005.jf.intel.com ([10.7.209.41])
  by fmvoesa102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
  11 Dec 2023 14:52:31 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=McAfee;i="6600,9927,10921"; a="946511338"
X-IronPort-AV: E=Sophos;i="6.04,268,1695711600";
   d="scan'208";a="946511338"
Received: from tlyon-mobl2.amr.corp.intel.com (HELO [192.168.1.200])
  ([10.212.89.19])
  by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
  11 Dec 2023 14:52:29 -0800
From: Vishal Verma
Date: Mon, 11 Dec 2023 15:52:18 -0700
Subject: [PATCH v3 2/2] dax: add a sysfs knob to control memmap_on_memory
 behavior
Precedence: bulk
X-Mailing-List: linux-cxl@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Message-Id: <20231211-vv-dax_abi-v3-2-acf6cc1bde9f@intel.com>
References: <20231211-vv-dax_abi-v3-0-acf6cc1bde9f@intel.com>
In-Reply-To: <20231211-vv-dax_abi-v3-0-acf6cc1bde9f@intel.com>
To: Dan Williams, Vishal Verma, Dave Jiang
Cc: linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
 linux-cxl@vger.kernel.org, David Hildenbrand, Dave Hansen, Huang Ying,
 Li Zhijian, Jonathan Cameron
X-Mailer: b4 0.13-dev-433a8
X-Developer-Signature: v=1; a=openpgp-sha256; l=4067;
 i=vishal.l.verma@intel.com; h=from:subject:message-id;
 bh=ehImgNx/k9AKKVzaSKbOLo1Mb4OGFvOlrdKezvX51Qs=;
 b=owGbwMvMwCXGf25diOft7jLG02pJDKnlk9bs0Q08u36yeE9D5odPpy9eCq97f3DXZimjGV/ut
 wuXmyXv6ChlYRDjYpAVU2T5u+cj4zG57fk8gQmOMHNYmUCGMHBxCsBEWsUY/sp/05f8vH7V+2LJ
 dUduyLd3/SlOK1JcbvvFQTuv8drym+kMPxlNnmheye0rqIxzYlwU819lzccdx/O5BCPvnUnwMTD
 zZAMA
X-Developer-Key: i=vishal.l.verma@intel.com; a=openpgp;
 fpr=F8682BE134C67A12332A2ED07AFA61BEA3B84DFF

Add a sysfs knob for dax devices to control the memmap_on_memory setting
if the dax device were to be hotplugged as system memory.

The default memmap_on_memory setting for dax devices originating via
pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to
preserve legacy behavior. For dax devices via CXL, the default is on.

The sysfs control allows the administrator to override the above
defaults if needed.

Cc: David Hildenbrand
Cc: Dan Williams
Cc: Dave Jiang
Cc: Dave Hansen
Cc: Huang Ying
Tested-by: Li Zhijian
Reviewed-by: Jonathan Cameron
Reviewed-by: David Hildenbrand
Signed-off-by: Vishal Verma
---
 drivers/dax/bus.c                       | 47 +++++++++++++++++++++++++++++++++
 Documentation/ABI/testing/sysfs-bus-dax | 17 ++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 1ff1ab5fa105..2871e5188f0d 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1270,6 +1270,52 @@ static ssize_t numa_node_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(numa_node);
 
+static ssize_t memmap_on_memory_show(struct device *dev,
+				     struct device_attribute *attr, char *buf)
+{
+	struct dev_dax *dev_dax = to_dev_dax(dev);
+
+	return sysfs_emit(buf, "%d\n", dev_dax->memmap_on_memory);
+}
+
+static ssize_t memmap_on_memory_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t len)
+{
+	struct device_driver *drv = dev->driver;
+	struct dev_dax *dev_dax = to_dev_dax(dev);
+	struct dax_region *dax_region = dev_dax->region;
+	struct dax_device_driver *dax_drv = to_dax_drv(drv);
+	ssize_t rc;
+	bool val;
+
+	rc = kstrtobool(buf, &val);
+	if (rc)
+		return rc;
+
+	if (dev_dax->memmap_on_memory == val)
+		return len;
+
+	device_lock(dax_region->dev);
+	if (!dax_region->dev->driver) {
+		device_unlock(dax_region->dev);
+		return -ENXIO;
+	}
+
+	if (drv && dax_drv->type == DAXDRV_KMEM_TYPE) {
+		device_unlock(dax_region->dev);
+		return -EBUSY;
+	}
+
+	device_lock(dev);
+	dev_dax->memmap_on_memory = val;
+	device_unlock(dev);
+
+	device_unlock(dax_region->dev);
+	return len;
+}
+static DEVICE_ATTR_RW(memmap_on_memory);
+
 static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n)
 {
 	struct device *dev = container_of(kobj, struct device, kobj);
@@ -1296,6 +1342,7 @@ static struct attribute *dev_dax_attributes[] = {
 	&dev_attr_align.attr,
 	&dev_attr_resource.attr,
 	&dev_attr_numa_node.attr,
+	&dev_attr_memmap_on_memory.attr,
 	NULL,
 };
 
diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax
index a61a7b186017..b1fd8bf8a7de 100644
--- a/Documentation/ABI/testing/sysfs-bus-dax
+++ b/Documentation/ABI/testing/sysfs-bus-dax
@@ -149,3 +149,20 @@ KernelVersion:	v5.1
 Contact:	nvdimm@lists.linux.dev
 Description:
 		(RO) The id attribute indicates the region id of a dax region.
+
+What:		/sys/bus/dax/devices/daxX.Y/memmap_on_memory
+Date:		October, 2023
+KernelVersion:	v6.8
+Contact:	nvdimm@lists.linux.dev
+Description:
+		(RW) Control the memmap_on_memory setting if the dax device
+		were to be hotplugged as system memory. This determines whether
+		the 'altmap' for the hotplugged memory will be placed on the
+		device being hotplugged (memmap_on_memory=1) or if it will be
+		placed on regular memory (memmap_on_memory=0). This attribute
+		must be set before the device is handed over to the 'kmem'
+		driver (i.e. hotplugged into system-ram). Additionally, this
+		depends on CONFIG_MHP_MEMMAP_ON_MEMORY, and a globally enabled
+		memmap_on_memory parameter for memory_hotplug. This is
+		typically set on the kernel command line -
+		memory_hotplug.memmap_on_memory set to 'true' or 'force'.
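
Not part of the patch: a rough userspace sketch of how an administrator tool might drive the new attribute. It builds the attribute path, parses the text the show() callback emits, and writes a new value before the device is bound to 'kmem'. The helper names are hypothetical, and the error mapping in the comments is only what the store() callback in this patch appears to return; treat it as an assumption for any other kernel version.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/bus/dax/devices/<dev>/<attr>"; returns 0 or -ENAMETOOLONG. */
int dax_attr_path(char *buf, size_t len, const char *dev, const char *attr)
{
	int n;

	assert(buf && dev && attr);
	n = snprintf(buf, len, "/sys/bus/dax/devices/%s/%s", dev, attr);
	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/* Parse the "0\n" / "1\n" text that the attribute's show() callback emits. */
int dax_parse_memmap_on_memory(const char *text)
{
	if (strcmp(text, "1\n") == 0 || strcmp(text, "1") == 0)
		return 1;
	if (strcmp(text, "0\n") == 0 || strcmp(text, "0") == 0)
		return 0;
	return -EINVAL;
}

/*
 * Write "1" or "0" to the attribute at 'path'.
 * Returns 0 on success or a negative errno. Per the store() callback in
 * this patch, the kernel may report EBUSY (device already bound to kmem)
 * or ENXIO (the dax region has no driver).
 */
int dax_set_memmap_on_memory(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	int err = 0;

	if (!f)
		return -errno;
	if (fputs(enable ? "1" : "0", f) == EOF)
		err = -errno;
	/* With stdio buffering, a store() error usually surfaces on flush. */
	if (fclose(f) == EOF && !err)
		err = -errno;
	return err;
}
```

A caller would typically build the path for a device such as dax0.0 with dax_attr_path(), call dax_set_memmap_on_memory() on it, and only then bind the device to the kmem driver, since the attribute cannot be changed once the device is hotplugged.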