From patchwork Thu May 7 00:41:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532085 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 017C6139F for ; Thu, 7 May 2020 00:43:38 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A3E7D2082E for ; Thu, 7 May 2020 00:43:37 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="F12Ge6+g" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A3E7D2082E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4EAF9900009; Wed, 6 May 2020 20:43:26 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 46F83900008; Wed, 6 May 2020 20:43:26 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 27498900009; Wed, 6 May 2020 20:43:26 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0230.hostedemail.com [216.40.44.230]) by kanga.kvack.org (Postfix) with ESMTP id ED26B900008 for ; Wed, 6 May 2020 20:43:25 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id ACAC8D214 for ; Thu, 7 May 2020 00:43:25 +0000 (UTC) X-FDA: 76788074370.26.hope65_1dde07fa0eb5e X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30005:30029:30034:30045:30054:30064:30079:30090,0,RBL:141.146.126.78:@oracle.com:.lbl8.mailshell.net-64.10.201.10 62.18.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: hope65_1dde07fa0eb5e X-Filterd-Recvd-Size: 15101 Received: from aserp2120.oracle.com (aserp2120.oracle.com [141.146.126.78]) by imf23.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:43:25 +0000 (UTC) Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470dCV0097456; Thu, 7 May 2020 00:42:30 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=qVxl2ErnbXf/1UqsteUDXYkeq/Y2P3I+THoVh2C3QXE=; b=F12Ge6+gJxng7kc3BrCPC3D0kmyef1tgkCH3+F5CcoYuRGu24wTmZa3/TcjafzoeZOS9 ikSoU4FJAyDyV/qJn5ar/FsIy1d004q/PoeCv6azJeFDITMxR8h9dp7SVmxnTOZKb9AI 4qlCQbRXjXxaAyw9j1AR/VPEtZx+IQmiKdEjDEX3IZpPldm+BeB1pZsq3vyVcNlfEWLt CO7x1jYRty6WsTkyXiEVohVZmSW+u3B9HhqBXcZuznb3HXtk7IVr3fe0ngyfQIwHDz4i sZZMbqmMg9srVVHz6ravN0p7iGmTbg8vz/JatdOHZHuw5lM6a7Dsq73U+0iLMzH4xok5 qA== Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70]) by aserp2120.oracle.com with ESMTP id 30usgq4gvj-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:42:30 +0000 Received: from pps.filterd (aserp3020.oracle.com [127.0.0.1]) by aserp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bU4j098676; Thu, 7 May 2020 00:42:29 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by aserp3020.oracle.com with ESMTP id 30sjnma0kq-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:42:29 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 0470gNXr025322; Thu, 7 May 2020 00:42:23 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:42:23 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 01/43] mm: add PKRAM API stubs and Kconfig Date: Wed, 6 May 2020 17:41:27 -0700 Message-Id: <1588812129-8596-2-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 adultscore=0 phishscore=0 mlxlogscore=999 bulkscore=0 malwarescore=0 spamscore=0 suspectscore=2 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 impostorscore=0 mlxscore=0 priorityscore=1501 lowpriorityscore=0 malwarescore=0 clxscore=1015 mlxlogscore=999 spamscore=0 adultscore=0 bulkscore=0 phishscore=0 suspectscore=2 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Preserved-across-kexec memory or PKRAM is a method for saving memory pages of the currently executing kernel and restoring them after kexec boot into a new one. This can be utilized for preserving guest VM state, large in-memory databases, process memory, etc. across reboot. While DRAM-as-PMEM or actual persistent memory could be used to accomplish these things, PKRAM provides the latency of DRAM with the flexibility of dynamically determining the amount of memory to preserve. The proposed API: * Preserved memory is divided into nodes which can be saved or loaded independently of each other. The nodes are identified by unique name strings. A PKRAM node is created when save is initiated by calling pkram_prepare_save(). A PKRAM node is removed when load is initiated by calling pkram_prepare_load(). See below * A node is further divided into objects. An object represents a grouping of associated pages and any relevant metadata preserved with them. For example, the pages and attributes of a file. * For saving/loading data from a PKRAM node/object an instance of the pkram_stream struct is used. The struct is initialized by calling pkram_prepare_save() for saving data or pkram_prepare_load() for loading data. After save (load) is complete, pkram_finish_save() (pkram_finish_load()) must be called. If an error occurred during save, the saved data and the PKRAM node may be freed by calling pkram_discard_save() instead of pkram_finish_save(). * Both page data and byte data can separately be streamed to a PKRAM object. pkram_save_page() and pkram_load_page() are used to stream page data while pkram_write() and pkram_read() are used to stream byte data. A sequence of operations for saving/loading data from PKRAM would look like: * For saving data to PKRAM: /* create a PKRAM node and do initial stream setup */ pkram_prepare_save() /* create a PKRAM object associated with the PKRAM node and complete stream initialization */ pkram_prepare_save_obj() /* save data to the node/object */ pkram_save_page()[,...] /* for page stream, or pkram_write()[,...] * ... for byte stream */ pkram_finish_save_obj() /* commit the save or discard and delete the node */ pkram_finish_save() /* on success, or pkram_discard_save() * ... in case of error */ * For loading data from PKRAM: /* remove a PKRAM node from the list and do initial stream setup */ pkram_prepare_load() /* Remove a PKRAM object from the node and complete stream initializtion for loading data from it. */ pkram_prepare_load_obj() /* load data from the node/object */ pkram_load_page()[,...] /* for page stream, or pkram_read()[,...] * ... for byte stream */ /* free the object */ pkram_finish_load_obj() /* free the node */ pkram_finish_load() Originally-by: Vladimir Davydov Signed-off-by: Anthony Yznaga --- include/linux/pkram.h | 32 ++++++++++ mm/Kconfig | 9 +++ mm/Makefile | 1 + mm/pkram.c | 169 ++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 211 insertions(+) create mode 100644 include/linux/pkram.h create mode 100644 mm/pkram.c diff --git a/include/linux/pkram.h b/include/linux/pkram.h new file mode 100644 index 000000000000..4c4e13311ec8 --- /dev/null +++ b/include/linux/pkram.h @@ -0,0 +1,32 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_PKRAM_H +#define _LINUX_PKRAM_H + +#include +#include +#include + +struct pkram_stream; + +#define PKRAM_NAME_MAX 256 /* including nul */ + +int pkram_prepare_save(struct pkram_stream *ps, const char *name, + gfp_t gfp_mask); +int pkram_prepare_save_obj(struct pkram_stream *ps); +void pkram_finish_save(struct pkram_stream *ps); +void pkram_finish_save_obj(struct pkram_stream *ps); +void pkram_discard_save(struct pkram_stream *ps); + +int pkram_prepare_load(struct pkram_stream *ps, const char *name); +int pkram_prepare_load_obj(struct pkram_stream *ps); +void pkram_finish_load(struct pkram_stream *ps); +void pkram_finish_load_obj(struct pkram_stream *ps); + +int pkram_save_page(struct pkram_stream *ps, struct page *page, short flags); +struct page *pkram_load_page(struct pkram_stream *ps, unsigned long *index, + short *flags); + +ssize_t pkram_write(struct pkram_stream *ps, const void *buf, size_t count); +size_t pkram_read(struct pkram_stream *ps, void *buf, size_t count); + +#endif /* _LINUX_PKRAM_H */ diff --git a/mm/Kconfig b/mm/Kconfig index c1acc34c1c35..bddf20ecf6e1 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -867,4 +867,13 @@ config ARCH_HAS_HUGEPD config MAPPING_DIRTY_HELPERS bool +config PKRAM + bool "Preserved-over-kexec memory storage" + default n + help + This option adds the kernel API that enables saving memory pages of + the currently executing kernel and restoring them after a kexec in + the newly booted one. This can be utilized for speeding up reboot by + leaving process memory and/or FS caches in-place. + endmenu diff --git a/mm/Makefile b/mm/Makefile index fccd3756b25f..59cd381194af 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -112,3 +112,4 @@ obj-$(CONFIG_MEMFD_CREATE) += memfd.o obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o obj-$(CONFIG_PTDUMP_CORE) += ptdump.o obj-$(CONFIG_PAGE_REPORTING) += page_reporting.o +obj-$(CONFIG_PKRAM) += pkram.o diff --git a/mm/pkram.c b/mm/pkram.c new file mode 100644 index 000000000000..d6f2f79d4852 --- /dev/null +++ b/mm/pkram.c @@ -0,0 +1,169 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include + +/** + * Create a preserved memory node with name @name and initialize stream @ps + * for saving data to it. + * + * @gfp_mask specifies the memory allocation mask to be used when saving data. + * + * Returns 0 on success, -errno on failure. + * + * After the save has finished, pkram_finish_save() (or pkram_discard_save() in + * case of failure) is to be called. + */ +int pkram_prepare_save(struct pkram_stream *ps, const char *name, gfp_t gfp_mask) +{ + return -ENOSYS; +} + +/** + * Create a preserved memory object and initialize stream @ps for saving data + * to it. + * + * Returns 0 on success, -errno on failure. + * + * After the save has finished, pkram_finish_save_obj() (or pkram_discard_save() + * in case of failure) is to be called. + */ +int pkram_prepare_save_obj(struct pkram_stream *ps) +{ + return -ENOSYS; +} + +/** + * Commit the object started with pkram_prepare_save_obj() to preserved memory. + */ +void pkram_finish_save_obj(struct pkram_stream *ps) +{ + BUG(); +} + +/** + * Commit the save to preserved memory started with pkram_prepare_save(). + * After the call, the stream may not be used any more. + */ +void pkram_finish_save(struct pkram_stream *ps) +{ + BUG(); +} + +/** + * Cancel the save to preserved memory started with pkram_prepare_save() and + * destroy the corresponding preserved memory node freeing any data already + * saved to it. + */ +void pkram_discard_save(struct pkram_stream *ps) +{ + BUG(); +} + +/** + * Remove the preserved memory node with name @name and initialize stream @ps + * for loading data from it. + * + * Returns 0 on success, -errno on failure. + * + * After the load has finished, pkram_finish_load() is to be called. + */ +int pkram_prepare_load(struct pkram_stream *ps, const char *name) +{ + return -ENOSYS; +} + +/** + * Remove the next preserved memory object from the stream @ps and + * initialize stream @ps for loading data from it. + * + * Returns 0 on success, -errno on failure. + * + * After the load has finished, pkram_finish_load_obj() is to be called. + */ +int pkram_prepare_load_obj(struct pkram_stream *ps) +{ + return -ENOSYS; +} + +/** + * Finish the load of a preserved memory object started with + * pkram_prepare_load_obj() freeing the object and any data that has not + * been loaded from it. + */ +void pkram_finish_load_obj(struct pkram_stream *ps) +{ + BUG(); +} + +/** + * Finish the load from preserved memory started with pkram_prepare_load() + * freeing the corresponding preserved memory node and any data that has + * not been loaded from it. + */ +void pkram_finish_load(struct pkram_stream *ps) +{ + BUG(); +} + +/** + * Save page @page to the preserved memory node and object associated with + * stream @ps. The stream must have been initialized with pkram_prepare_save() + * and pkram_prepare_save_obj(). + * + * @flags specifies supplemental page state to be preserved. + * + * Returns 0 on success, -errno on failure. + */ +int pkram_save_page(struct pkram_stream *ps, struct page *page, short flags) +{ + return -ENOSYS; +} + +/** + * Load the next page from the preserved memory node and object associated + * with stream @ps. The stream must have been initialized with + * pkram_prepare_load() and pkram_prepare_load_obj(). + * + * If not NULL, @index is initialized with the preserved mapping offset of the + * page loaded. + * If not NULL, @flags is initialized with preserved supplemental state of the + * page loaded. + * + * Returns the page loaded or NULL if the node is empty. + * + * The page loaded has its refcount incremented. + */ +struct page *pkram_load_page(struct pkram_stream *ps, unsigned long *index, short *flags) +{ + return NULL; +} + +/** + * Copy @count bytes from @buf to the preserved memory node and object + * associated with stream @ps. The stream must have been initialized with + * pkram_prepare_save() and pkram_prepare_save_obj(). + * + * On success, returns the number of bytes written, which is always equal to + * @count. On failure, -errno is returned. + */ +ssize_t pkram_write(struct pkram_stream *ps, const void *buf, size_t count) +{ + return -ENOSYS; +} + +/** + * Copy up to @count bytes from the preserved memory node and object + * associated with stream @ps to @buf. The stream must have been initialized + * with pkram_prepare_load() and pkram_prepare_load_obj(). + * + * Returns the number of bytes read, which may be less than @count if the node + * has fewer bytes available. + */ +size_t pkram_read(struct pkram_stream *ps, void *buf, size_t count) +{ + return 0; +}