From patchwork Thu Apr 25 17:54:38 2019
X-Patchwork-Submitter: Pasha Tatashin
X-Patchwork-Id: 10917563
From: Pavel Tatashin
To: pasha.tatashin@soleen.com, jmorris@namei.org, sashal@kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-nvdimm@lists.01.org, akpm@linux-foundation.org, mhocko@suse.com,
    dave.hansen@linux.intel.com, dan.j.williams@intel.com,
    keith.busch@intel.com, vishal.l.verma@intel.com, dave.jiang@intel.com,
    zwisler@kernel.org, thomas.lendacky@amd.com, ying.huang@intel.com,
    fengguang.wu@intel.com, bp@suse.de, bhelgaas@google.com,
    baiyaowei@cmss.chinamobile.com, tiwai@suse.de, jglisse@redhat.com,
    david@redhat.com
Subject: [v3 0/2] "Hotremove" persistent memory
Date: Thu, 25 Apr 2019 13:54:38 -0400
Message-Id: <20190425175440.9354-1-pasha.tatashin@soleen.com>

Changelog:
v3
- Addressed comments from David Hildenbrand. Don't release
  lock_device_hotplug after checking memory status, and rename
  memblock_offlined_cb() to check_memblock_offlined_cb().
v2
- Dan Williams mentioned that the return value of drv->remove() is
  ignored by unbind: unbind always succeeds. Because we cannot guarantee
  that memory can be offlined from the driver, don't even attempt to do
  so. Simply check that every section is offlined beforehand, and only
  then proceed with removing dax memory.

---
Recently, support for using persistent memory like regular RAM was
added to Linux. This work extends that functionality to also allow hot
removing persistent memory.

We (Microsoft) have an important use case for this functionality.
The requirement is for physical machines with a small amount of RAM
(~8G) to be able to reboot in a very short period of time (<1s). Yet
there is userland state that is expensive to recreate (~2G).

The solution is to boot machines with 2G reserved for persistent memory,
copy the state, and hotadd the persistent memory so that the machine
still has all 8G available at runtime. Before reboot, offline and
hotremove the 2G of device-dax memory, copy the memory that needs to be
preserved to the pmem0 device, and reboot.

The series of operations looks like this:

1. After boot, restore /dev/pmem0 to a ramdisk to be consumed by apps,
   and free the ramdisk.

2. Convert raw pmem0 to devdax:
   ndctl create-namespace --mode devdax --map mem -e namespace0.0 -f

3. Hotadd to System RAM:
   echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
   echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
   echo online_movable > /sys/devices/system/memory/memoryXXX/state

4. Before reboot, hotremove device-dax memory from System RAM:
   echo offline > /sys/devices/system/memory/memoryXXX/state
   echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind

5. Create a raw pmem0 device:
   ndctl create-namespace --mode raw -e namespace0.0 -f

6. Copy the state that apps stored in the ramdisk to the pmem device.

7. Do a kexec reboot, or reboot through firmware if the firmware does
   not zero memory in the pmem0 region.

(These machines have only regular volatile memory, so to get a pmem0
device either the memmap kernel parameter is used or device nodes are
specified in the DTB.)

Pavel Tatashin (2):
  device-dax: fix memory and resource leak if hotplug fails
  device-dax: "Hotremove" persistent memory that is used like normal RAM

 drivers/dax/dax-private.h |  2 +
 drivers/dax/kmem.c        | 99 +++++++++++++++++++++++++++++++++++++--
 2 files changed, 96 insertions(+), 5 deletions(-)
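Step 4 of the sequence above can be wrapped in a small userland helper that mirrors the v2 idea of checking state before removal: offline every memory block backing the dax device, confirm that each one actually reads "offline", and only then unbind the device from kmem. This is a minimal sketch, not part of the series. It assumes the caller already knows which memoryXXX blocks back dax0.0. SYSFS_ROOT, all_offline and hotremove_dax are hypothetical names, and the root is overridable only so the sketch can be exercised against a fake directory tree.

```shell
#!/bin/sh
# Hypothetical sketch of step 4: offline the memory blocks that back a
# dax device, verify they are all offline, then unbind it from kmem.
# SYSFS_ROOT, all_offline and hotremove_dax are illustrative names only.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Succeed only if every named memory block (e.g. memory32) reads "offline".
all_offline() {
    for blk in "$@"; do
        state=$(cat "$SYSFS_ROOT/devices/system/memory/$blk/state") || return 1
        [ "$state" = "offline" ] || return 1
    done
    return 0
}

# Usage: hotremove_dax DEVICE BLOCK...
hotremove_dax() {
    if [ $# -lt 2 ]; then
        echo "usage: hotremove_dax DEVICE BLOCK..." >&2
        return 2
    fi
    dev=$1
    shift
    for blk in "$@"; do
        # On a real system the kernel fails this write if the block cannot
        # be offlined; report it and let all_offline catch the failure.
        echo offline > "$SYSFS_ROOT/devices/system/memory/$blk/state" ||
            echo "failed to offline $blk" >&2
    done
    if ! all_offline "$@"; then
        echo "$dev: not every memory block is offline; not unbinding" >&2
        return 1
    fi
    echo "$dev" > "$SYSFS_ROOT/bus/dax/drivers/kmem/unbind"
}
```

For example, `hotremove_dax dax0.0 memory32 memory33` (block names illustrative) leaves the device bound if any block stays online. Checking from userland first matters because, as the v2 changelog notes, unbind itself always succeeds and cannot report failure back.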