From patchwork Thu Aug 1 00:29:19 2019
X-Patchwork-Submitter: "Verma, Vishal L"
X-Patchwork-Id: 11069807
From: Vishal Verma
Subject: [ndctl PATCH v9 00/13] daxctl: add a new reconfigure-device command
Date: Wed, 31 Jul 2019 18:29:19 -0600
Message-Id: <20190801002932.26430-1-vishal.l.verma@intel.com>
List-Id: "Linux-nvdimm developer list."
Cc: Dave Hansen, Pavel Tatashin

Changes in v9:
- Move the device model checking into the library. This way, daxctl-list
  can correctly determine 'state' which only applies to the dax-bus model.

Changes in v8:
- rename the --attempt-offline option to --force (Dan)
- clarify the messages when device is already in the requested state (Dan)
- s/unable/failed/ in device.c error messages (Dan)
- daxctl_memory_{on,off}line() instead of
  daxctl_memory_set_{on,off}line (Dan)
- Add an interface to get a count of the memory sections associated with
  a device (Dan)
- As a result, refactor the readdir loop into a common memory_op function
  that can set the state, get the online state, and get a count of all
  blocks.
- Update the onlining/offlining routines used in both the
  reconfigure-device and {on,off}line-memory commands to use the new
  daxctl_memory_num_sections() interface to validate the number of
  sections for which we changed the state.
- Add some small clarifications in the daxctl-reconfigure-device man page
  (Dan)
- In device.c add a verify_dax_bus_model() helper to check for the dax-bus
  subsystem (Dan).

Changes in v7:
- Fix a couple of checkpatch type errors in the new lines added in v6
  (Dan).
- Get rid of daxctl_dev_get_mode. daxctl_dev_get_memory is sufficient to
  both check the mode and allocate the memory related structures on its
  first call. (Dan)
- Due to the above, daxctl_dev_mode is now private to libdaxctl, and not
  part of the API exported through libdaxctl.h
- Add a large enough buffer at init time to construct dynamic paths, and
  avoid asprintf() type allocations for memory blocks at runtime (Dan).

Changes in v6:
- For memory block online/offline operations, the kernel responds with
  an EINVAL both for 'real' errors and when the memory was already in
  the requested state. Since there is a TOCTOU hole between checking the
  state and storing it, just perform a second check if the store results
  in an error. If the check shows the state to be the same as the one
  we're attempting, it means that another agent (usually udev) won the
  race, but we don't care so long as the state change happened, so don't
  report an error. (Fan Du)

Changes in v5:
- device.c: correctly set loglevel for daxctl_ctx for --verbose
- drop the subsys caching, its complexity started to exceed its benefit.
  dax-class device models will simply error out during reconfigure. (Dan)
- Add a note to the man page for the above.
- Clarify the onlining policy (online_movable) in the man page
- rename "numa_node" to "target_node" in device listings (Dan)
- When printing a device 'mode', assume devdax if !system-ram, avoiding a
  "mode: unknown" situation which can be confusing. (Dan)
- Add a "state: disabled" attribute to the device listing if a driver is
  not bound. This is more apt than the previous "mode: unknown" listing.
- add an api to get 'dev->resource', parsing /proc/iomem as a fallback
  for when the kernel doesn't provide the attribute (Dan)
- convert node_* apis to memory_* apis that act on a new daxctl_memory
  object (Dan)
- online only memory sections belonging to the device in question by
  cross referencing block indices with the dax device resource (Dan)
- Refuse to reconfigure a device that is already in the target mode.
  Until now, reconfiguring a system-ram device back to system-ram would
  result in a 'online memory may not be hot-removed' kernel warning.
- If the device was already in the system-ram mode, skip
  disabling/enabling, but still try to online the memory unless the
  --no-online option is in effect.
- In daxctl_unbind, also 'remove_id' to prevent devices automatically
  binding to the kmem driver on a disable + re-enable, which can be
  surprising (Dan).
- Rewrite the top half of daxctl/device.c to borrow elements from
  ndctl/namespace.c so that it can support growing additional commands
  that operate on devices (online-memory and offline-memory)
- Refactor the bottom half of daxctl/device.c so we only do the
  disabling/offlining steps if the device was enabled.
- Add new commands to online and offline memory sections associated
  with a given dax device (Dan)
- Add a new test - daxctl-devices.sh - to test daxctl reconfigure-device,
  online-memory, and offline-memory commands.
- Add an example in documentation demonstrating how to use numactl to
  bind a process to a node surfaced from a dax device (Andy Rudoff)

Changes in v4:
- Don't fail add_dax_dev for kmod failures. Instead fail only when the
  kmod list is actually used, i.e. during daxctl-reconfigure-device

Changes in v3:
- In daxctl_dev_get_mode(), remove the subsystem warning, detect
  dax-class and simply make it return devdax

Changes in v2:
- Add examples to the documentation page (Dave Hansen)
- Clarify documentation regarding the conversion from system-ram to
  devdax
- Remove any references to a persistent config from the documentation -
  those can be added when the feature is added.
- device.c: validate option compatibility
- daxctl-list: display numa_node for device listings
- daxctl-list: display mode for device listings
- make the options more consistent by adding a '-O' short option for
  --attempt-offline

Add a new daxctl-reconfigure-device command that lets us reconfigure DAX
devices back and forth between 'system-ram' and 'device-dax' modes. It
also includes facilities to online any newly hot-plugged memory
(default), and attempt to offline memory before converting away from the
system-ram mode (not default, requires the --force option, which was
named --attempt-offline before v8).

Currently missing from this series is a way to persistently store which
devices have been 'marked' for use as system-ram. This depends on a
config system overhaul in ndctl, and patches for those will follow
separately and are independent of this work.

Example invocations:

1. Reconfigure dax0.0 to system-ram mode, don’t online the memory

    # daxctl reconfigure-device --mode=system-ram --no-online dax0.0
    [
      {
        "chardev":"dax0.0",
        "size":16777216000,
        "target_node":2,
        "mode":"system-ram"
      }
    ]

2. Reconfigure dax0.0 to devdax mode, attempt to offline the memory

    # daxctl reconfigure-device --human --mode=devdax --force dax0.0
    {
      "chardev":"dax0.0",
      "size":"15.63 GiB (16.78 GB)",
      "target_node":2,
      "mode":"devdax"
    }

3. Reconfigure all dax devices on region0 to system-ram mode

    # daxctl reconfigure-device --mode=system-ram --region=0 all
    [
      {
        "chardev":"dax0.0",
        "size":16777216000,
        "target_node":2,
        "mode":"system-ram"
      },
      {
        "chardev":"dax0.1",
        "size":16777216000,
        "target_node":3,
        "mode":"system-ram"
      }
    ]

These patches can also be found in the 'kmem-pending' branch on github:
https://github.com/pmem/ndctl/tree/kmem-pending

Cc: Dan Williams
Cc: Dave Hansen
Cc: Pavel Tatashin

Vishal Verma (13):
  libdaxctl: add interfaces to get ctx and check device state
  libdaxctl: add interfaces to enable/disable devices
  libdaxctl: add an interface to retrieve the device resource
  libdaxctl: add a 'daxctl_memory' object for memory based operations
  daxctl/list: add target_node for device listings
  daxctl/list: display the mode for a dax device
  daxctl: add a new reconfigure-device command
  Documentation/daxctl: add a man page for daxctl-reconfigure-device
  daxctl: add commands to online and offline memory
  Documentation: Add man pages for daxctl-{on,off}line-memory
  contrib/ndctl: fix region-id completions for daxctl
  contrib/ndctl: add bash-completion for the new daxctl commands
  test: Add a unit test for daxctl-reconfigure-device and friends

 Documentation/daxctl/Makefile.am              |   5 +-
 .../daxctl/daxctl-offline-memory.txt          |  72 ++
 Documentation/daxctl/daxctl-online-memory.txt |  80 ++
 .../daxctl/daxctl-reconfigure-device.txt      | 157 ++++
 Makefile.am                                   |   3 +-
 contrib/ndctl                                 |  38 +-
 daxctl/Makefile.am                            |   2 +
 daxctl/builtin.h                              |   3 +
 daxctl/daxctl.c                               |   3 +
 daxctl/device.c                               | 543 +++++++++++++
 daxctl/lib/Makefile.am                        |   5 +-
 daxctl/lib/libdaxctl-private.h                |  40 +
 daxctl/lib/libdaxctl.c                        | 712 ++++++++++++++++++
 daxctl/lib/libdaxctl.sym                      |  19 +
 daxctl/libdaxctl.h                            |  17 +
 test/Makefile.am                              |   3 +-
 test/common                                   |  19 +-
 test/daxctl-devices.sh                        |  81 ++
 util/iomem.c                                  |  37 +
 util/iomem.h                                  |  12 +
 util/json.c                                   |  22 +
 21 files changed, 1859 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/daxctl/daxctl-offline-memory.txt
 create mode 100644 Documentation/daxctl/daxctl-online-memory.txt
 create mode 100644 Documentation/daxctl/daxctl-reconfigure-device.txt
 create mode 100644 daxctl/device.c
 create mode 100755 test/daxctl-devices.sh
 create mode 100644 util/iomem.c
 create mode 100644 util/iomem.h
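
For readers following the v6 changelog entry, the "store, then re-check
on error" handling of the TOCTOU race can be sketched roughly as below.
This is an illustration only, written in Python rather than the C used
by libdaxctl, and it is not the code from the series. The names
set_memory_state() and sysfs_block_ops() are hypothetical; the only
thing taken from the kernel interface is the per-block 'state' attribute
under /sys/devices/system/memory/memoryN/, and the sketch assumes the
value read back matches the value written ("online" or "offline").

```python
import errno
import os


def sysfs_block_ops(block_dir):
    """Build (store, read_state) callables for one memory block directory.

    Hypothetical helper: block_dir is assumed to look like
    /sys/devices/system/memory/memoryN and to contain a 'state' file.
    """
    path = os.path.join(block_dir, "state")

    def store(value):
        with open(path, "w") as f:
            f.write(value)

    def read_state():
        with open(path) as f:
            return f.read().strip()

    return store, read_state


def set_memory_state(store, read_state, target):
    """Try to move a memory block to `target` ("online" or "offline").

    Returns True if our store succeeded, False if the store failed with
    EINVAL but the block is already in the requested state (someone else,
    e.g. udev, won the race). Any other failure is re-raised.
    """
    try:
        store(target)
        return True
    except OSError as err:
        if err.errno != errno.EINVAL:
            raise
        # EINVAL is ambiguous: it is returned for real errors and for
        # "already in that state". A second read tells the two apart.
        if read_state() == target:
            return False
        raise
```

A caller would use it as
`store, read_state = sysfs_block_ops("/sys/devices/system/memory/memory42")`
followed by `set_memory_state(store, read_state, "online")`, where
memory42 is just a placeholder block name. Passing the two callables in
keeps the race handling separate from the sysfs access, so the logic can
be exercised without real hot-pluggable memory.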