From patchwork Thu May 7 00:41:27 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 11532085
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com,
 ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org,
 ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com,
 nivedita@alum.mit.edu, rafael.j.wysocki@intel.com,
 dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de,
 bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com,
 andriy.shevchenko@linux.intel.com, keescook@chromium.org,
 hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org,
 ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com,
 ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com,
 kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com,
 paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org,
 linux-doc@vger.kernel.org, kexec@lists.infradead.org
Subject: [RFC 01/43] mm: add PKRAM API stubs and Kconfig
Date: Wed, 6 May 2020 17:41:27 -0700
Message-Id: <1588812129-8596-2-git-send-email-anthony.yznaga@oracle.com>
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

Preserved-across-kexec memory or PKRAM is a method for saving memory
pages of the currently executing kernel and restoring them after kexec
boot into a new one. This can be utilized for preserving guest VM state,
large in-memory databases, process memory, etc. across reboot. While
DRAM-as-PMEM or actual persistent memory could be used to accomplish
these things, PKRAM provides the latency of DRAM with the flexibility
of dynamically determining the amount of memory to preserve.

The proposed API:

 * Preserved memory is divided into nodes which can be saved or loaded
   independently of each other. The nodes are identified by unique name
   strings. A PKRAM node is created when save is initiated by calling
   pkram_prepare_save(). A PKRAM node is removed when load is initiated by
   calling pkram_prepare_load(). See below

 * A node is further divided into objects. An object represents a grouping
   of associated pages and any relevant metadata preserved with them.
   For example, the pages and attributes of a file.

 * For saving/loading data from a PKRAM node/object an instance of the
   pkram_stream struct is used. The struct is initialized by calling
   pkram_prepare_save() for saving data or pkram_prepare_load() for loading
   data. After save (load) is complete, pkram_finish_save()
   (pkram_finish_load()) must be called.
   If an error occurred during save, the saved data and the PKRAM node
   may be freed by calling pkram_discard_save() instead of
   pkram_finish_save().

 * Both page data and byte data can separately be streamed to a PKRAM
   object. pkram_save_page() and pkram_load_page() are used to stream
   page data while pkram_write() and pkram_read() are used to stream byte
   data.

A sequence of operations for saving/loading data from PKRAM would
look like:

 * For saving data to PKRAM:

   /* create a PKRAM node and do initial stream setup */
   pkram_prepare_save()

   /* create a PKRAM object associated with the PKRAM node and complete stream initialization */
   pkram_prepare_save_obj()

   /* save data to the node/object */
   pkram_save_page()[,...]  /* for page stream, or
   pkram_write()[,...]       * ... for byte stream */

   pkram_finish_save_obj()

   /* commit the save or discard and delete the node */
   pkram_finish_save()      /* on success, or
   pkram_discard_save()      * ... in case of error */

 * For loading data from PKRAM:

   /* remove a PKRAM node from the list and do initial stream setup */
   pkram_prepare_load()

   /* Remove a PKRAM object from the node and complete stream initialization for loading data from it. */
   pkram_prepare_load_obj()

   /* load data from the node/object */
   pkram_load_page()[,...]  /* for page stream, or
   pkram_read()[,...]        * ... for byte stream */

   /* free the object */
   pkram_finish_load_obj()

   /* free the node */
   pkram_finish_load()

Originally-by: Vladimir Davydov
Signed-off-by: Anthony Yznaga
---
 include/linux/pkram.h |  32 ++++++++++
 mm/Kconfig            |   9 +++
 mm/Makefile           |   1 +
 mm/pkram.c            | 169 ++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 211 insertions(+)
 create mode 100644 include/linux/pkram.h
 create mode 100644 mm/pkram.c

diff --git a/include/linux/pkram.h b/include/linux/pkram.h
new file mode 100644
index 000000000000..4c4e13311ec8
--- /dev/null
+++ b/include/linux/pkram.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PKRAM_H
+#define _LINUX_PKRAM_H
+
+#include
+#include
+#include
+
+struct pkram_stream;
+
+#define PKRAM_NAME_MAX	256	/* including nul */
+
+int pkram_prepare_save(struct pkram_stream *ps, const char *name,
+		       gfp_t gfp_mask);
+int pkram_prepare_save_obj(struct pkram_stream *ps);
+void pkram_finish_save(struct pkram_stream *ps);
+void pkram_finish_save_obj(struct pkram_stream *ps);
+void pkram_discard_save(struct pkram_stream *ps);
+
+int pkram_prepare_load(struct pkram_stream *ps, const char *name);
+int pkram_prepare_load_obj(struct pkram_stream *ps);
+void pkram_finish_load(struct pkram_stream *ps);
+void pkram_finish_load_obj(struct pkram_stream *ps);
+
+int pkram_save_page(struct pkram_stream *ps, struct page *page, short flags);
+struct page *pkram_load_page(struct pkram_stream *ps, unsigned long *index,
+			     short *flags);
+
+ssize_t pkram_write(struct pkram_stream *ps, const void *buf, size_t count);
+size_t pkram_read(struct pkram_stream *ps, void *buf, size_t count);
+
+#endif /* _LINUX_PKRAM_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index c1acc34c1c35..bddf20ecf6e1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -867,4 +867,13 @@ config ARCH_HAS_HUGEPD
 config MAPPING_DIRTY_HELPERS
 	bool
 
+config PKRAM
+	bool "Preserved-over-kexec memory storage"
+	default n
+	help
+	  This option adds the kernel API that enables saving memory pages of
+	  the currently executing kernel and restoring them after a kexec in
+	  the newly booted one. This can be utilized for speeding up reboot by
+	  leaving process memory and/or FS caches in-place.
+
 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index fccd3756b25f..59cd381194af 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -112,3 +112,4 @@ obj-$(CONFIG_MEMFD_CREATE) += memfd.o
 obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o
 obj-$(CONFIG_PTDUMP_CORE) += ptdump.o
 obj-$(CONFIG_PAGE_REPORTING) += page_reporting.o
+obj-$(CONFIG_PKRAM) += pkram.o
diff --git a/mm/pkram.c b/mm/pkram.c
new file mode 100644
index 000000000000..d6f2f79d4852
--- /dev/null
+++ b/mm/pkram.c
@@ -0,0 +1,169 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+
+/**
+ * Create a preserved memory node with name @name and initialize stream @ps
+ * for saving data to it.
+ *
+ * @gfp_mask specifies the memory allocation mask to be used when saving data.
+ *
+ * Returns 0 on success, -errno on failure.
+ *
+ * After the save has finished, pkram_finish_save() (or pkram_discard_save() in
+ * case of failure) is to be called.
+ */
+int pkram_prepare_save(struct pkram_stream *ps, const char *name, gfp_t gfp_mask)
+{
+	return -ENOSYS;
+}
+
+/**
+ * Create a preserved memory object and initialize stream @ps for saving data
+ * to it.
+ *
+ * Returns 0 on success, -errno on failure.
+ *
+ * After the save has finished, pkram_finish_save_obj() (or pkram_discard_save()
+ * in case of failure) is to be called.
+ */
+int pkram_prepare_save_obj(struct pkram_stream *ps)
+{
+	return -ENOSYS;
+}
+
+/**
+ * Commit the object started with pkram_prepare_save_obj() to preserved memory.
+ */
+void pkram_finish_save_obj(struct pkram_stream *ps)
+{
+	BUG();
+}
+
+/**
+ * Commit the save to preserved memory started with pkram_prepare_save().
+ * After the call, the stream may not be used any more.
+ */
+void pkram_finish_save(struct pkram_stream *ps)
+{
+	BUG();
+}
+
+/**
+ * Cancel the save to preserved memory started with pkram_prepare_save() and
+ * destroy the corresponding preserved memory node freeing any data already
+ * saved to it.
+ */
+void pkram_discard_save(struct pkram_stream *ps)
+{
+	BUG();
+}
+
+/**
+ * Remove the preserved memory node with name @name and initialize stream @ps
+ * for loading data from it.
+ *
+ * Returns 0 on success, -errno on failure.
+ *
+ * After the load has finished, pkram_finish_load() is to be called.
+ */
+int pkram_prepare_load(struct pkram_stream *ps, const char *name)
+{
+	return -ENOSYS;
+}
+
+/**
+ * Remove the next preserved memory object from the stream @ps and
+ * initialize stream @ps for loading data from it.
+ *
+ * Returns 0 on success, -errno on failure.
+ *
+ * After the load has finished, pkram_finish_load_obj() is to be called.
+ */
+int pkram_prepare_load_obj(struct pkram_stream *ps)
+{
+	return -ENOSYS;
+}
+
+/**
+ * Finish the load of a preserved memory object started with
+ * pkram_prepare_load_obj() freeing the object and any data that has not
+ * been loaded from it.
+ */
+void pkram_finish_load_obj(struct pkram_stream *ps)
+{
+	BUG();
+}
+
+/**
+ * Finish the load from preserved memory started with pkram_prepare_load()
+ * freeing the corresponding preserved memory node and any data that has
+ * not been loaded from it.
+ */
+void pkram_finish_load(struct pkram_stream *ps)
+{
+	BUG();
+}
+
+/**
+ * Save page @page to the preserved memory node and object associated with
+ * stream @ps. The stream must have been initialized with pkram_prepare_save()
+ * and pkram_prepare_save_obj().
+ *
+ * @flags specifies supplemental page state to be preserved.
+ *
+ * Returns 0 on success, -errno on failure.
+ */
+int pkram_save_page(struct pkram_stream *ps, struct page *page, short flags)
+{
+	return -ENOSYS;
+}
+
+/**
+ * Load the next page from the preserved memory node and object associated
+ * with stream @ps. The stream must have been initialized with
+ * pkram_prepare_load() and pkram_prepare_load_obj().
+ *
+ * If not NULL, @index is initialized with the preserved mapping offset of the
+ * page loaded.
+ * If not NULL, @flags is initialized with preserved supplemental state of the
+ * page loaded.
+ *
+ * Returns the page loaded or NULL if the node is empty.
+ *
+ * The page loaded has its refcount incremented.
+ */
+struct page *pkram_load_page(struct pkram_stream *ps, unsigned long *index, short *flags)
+{
+	return NULL;
+}
+
+/**
+ * Copy @count bytes from @buf to the preserved memory node and object
+ * associated with stream @ps. The stream must have been initialized with
+ * pkram_prepare_save() and pkram_prepare_save_obj().
+ *
+ * On success, returns the number of bytes written, which is always equal to
+ * @count. On failure, -errno is returned.
+ */
+ssize_t pkram_write(struct pkram_stream *ps, const void *buf, size_t count)
+{
+	return -ENOSYS;
+}
+
+/**
+ * Copy up to @count bytes from the preserved memory node and object
+ * associated with stream @ps to @buf. The stream must have been initialized
+ * with pkram_prepare_load() and pkram_prepare_load_obj().
+ *
+ * Returns the number of bytes read, which may be less than @count if the node
+ * has fewer bytes available.
+ */
+size_t pkram_read(struct pkram_stream *ps, void *buf, size_t count)
+{
+	return 0;
+}

From patchwork Thu May 7 00:41:28 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 11532077
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com,
 ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org,
 ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com,
 nivedita@alum.mit.edu, rafael.j.wysocki@intel.com,
 dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de,
 bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com,
 andriy.shevchenko@linux.intel.com, keescook@chromium.org,
 hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org,
 ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com,
 ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com,
 kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com,
 paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org,
 linux-doc@vger.kernel.org, kexec@lists.infradead.org
Subject: [RFC 02/43] mm: PKRAM: implement node load and save functions
Date: Wed, 6 May 2020 17:41:28 -0700
Message-Id: <1588812129-8596-3-git-send-email-anthony.yznaga@oracle.com>
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

Preserved memory is divided into nodes which can be saved and loaded
independently of each other. PKRAM nodes are kept on a list and
identified by unique names. Whenever a save operation is initiated by
calling pkram_prepare_save(), a new node is created and linked to the
list. When the save operation has been committed by calling
pkram_finish_save(), the node becomes loadable. A load operation can
then be initiated by calling pkram_prepare_load(), which deletes the node
from the list and prepares the corresponding stream for loading data
from it. After the load has finished, pkram_finish_load() must be called
to free the node. Nodes are also deleted when a save operation is
discarded, i.e. when pkram_discard_save() is called instead of
pkram_finish_save().
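
The node lifecycle described above can be sketched as a small user-space
model. This is only an illustration, not the kernel code in this patch:
the model_* names are hypothetical, a singly linked list stands in for
the page::lru linkage, calloc()/free() stand in for page allocation,
assert() stands in for BUG_ON(), and locking is left out.

```c
/* User-space model of the PKRAM node states; not kernel code. */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define PKRAM_NAME_MAX		256	/* including nul, as in the patch */
#define PKRAM_SAVE		1
#define PKRAM_LOAD		2
#define PKRAM_ACCMODE_MASK	3

/* Hypothetical stand-in for struct pkram_node plus its list linkage. */
struct model_node {
	unsigned int flags;
	char name[PKRAM_NAME_MAX];
	struct model_node *next;
};

static struct model_node *model_nodes;	/* head of the node list */

static struct model_node *model_find(const char *name)
{
	struct model_node *n;

	for (n = model_nodes; n; n = n->next)
		if (strcmp(n->name, name) == 0)
			return n;
	return NULL;
}

static void model_unlink(struct model_node *node)
{
	struct model_node **pp;

	for (pp = &model_nodes; *pp; pp = &(*pp)->next) {
		if (*pp == node) {
			*pp = node->next;
			node->next = NULL;
			return;
		}
	}
}

/* Create a node in the "being saved" state and link it to the list. */
static int model_prepare_save(struct model_node **out, const char *name)
{
	struct model_node *node;

	if (strlen(name) >= PKRAM_NAME_MAX)
		return -ENAMETOOLONG;
	if (model_find(name))
		return -EEXIST;
	node = calloc(1, sizeof(*node));
	if (!node)
		return -ENOMEM;
	node->flags = PKRAM_SAVE;
	strcpy(node->name, name);	/* length was checked above */
	node->next = model_nodes;
	model_nodes = node;
	*out = node;
	return 0;
}

/* Commit: clearing the access mode makes the node loadable. */
static void model_finish_save(struct model_node *node)
{
	assert((node->flags & PKRAM_ACCMODE_MASK) == PKRAM_SAVE);
	node->flags &= ~PKRAM_ACCMODE_MASK;
}

/* Abort: unlink and free a node that is still being saved. */
static void model_discard_save(struct model_node *node)
{
	assert((node->flags & PKRAM_ACCMODE_MASK) == PKRAM_SAVE);
	model_unlink(node);
	free(node);
}

/* Take a committed node off the list and mark it as being loaded. */
static int model_prepare_load(struct model_node **out, const char *name)
{
	struct model_node *node = model_find(name);

	if (!node)
		return -ENOENT;
	if (node->flags & PKRAM_ACCMODE_MASK)
		return -EBUSY;	/* save has not been committed yet */
	model_unlink(node);
	node->flags |= PKRAM_LOAD;
	*out = node;
	return 0;
}

/* Free a node once loading from it is complete. */
static void model_finish_load(struct model_node *node)
{
	assert((node->flags & PKRAM_ACCMODE_MASK) == PKRAM_LOAD);
	free(node);
}
```

In this model a node can be loaded only once: model_prepare_load()
unlinks it, so a second load of the same name returns -ENOENT, and a
load attempted before model_finish_save() returns -EBUSY.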
Originally-by: Vladimir Davydov Signed-off-by: Anthony Yznaga --- include/linux/pkram.h | 7 ++- mm/pkram.c | 148 ++++++++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 149 insertions(+), 6 deletions(-) diff --git a/include/linux/pkram.h b/include/linux/pkram.h index 4c4e13311ec8..83a0579e4c1c 100644 --- a/include/linux/pkram.h +++ b/include/linux/pkram.h @@ -6,7 +6,12 @@ #include #include -struct pkram_stream; +struct pkram_node; + +struct pkram_stream { + gfp_t gfp_mask; + struct pkram_node *node; +}; #define PKRAM_NAME_MAX 256 /* including nul */ diff --git a/mm/pkram.c b/mm/pkram.c index d6f2f79d4852..5c57126353ff 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -2,16 +2,85 @@ #include #include #include +#include #include +#include #include +#include #include +/* + * Preserved memory is divided into nodes that can be saved or loaded + * independently of each other. The nodes are identified by unique name + * strings. + * + * The structure occupies a memory page. + */ +struct pkram_node { + __u32 flags; + + __u8 name[PKRAM_NAME_MAX]; +}; + +#define PKRAM_SAVE 1 +#define PKRAM_LOAD 2 +#define PKRAM_ACCMODE_MASK 3 + +static LIST_HEAD(pkram_nodes); /* linked through page::lru */ +static DEFINE_MUTEX(pkram_mutex); /* serializes open/close */ + +static inline struct page *pkram_alloc_page(gfp_t gfp_mask) +{ + return alloc_page(gfp_mask); +} + +static inline void pkram_free_page(void *addr) +{ + free_page((unsigned long)addr); +} + +static inline void pkram_insert_node(struct pkram_node *node) +{ + list_add(&virt_to_page(node)->lru, &pkram_nodes); +} + +static inline void pkram_delete_node(struct pkram_node *node) +{ + list_del(&virt_to_page(node)->lru); +} + +static struct pkram_node *pkram_find_node(const char *name) +{ + struct page *page; + struct pkram_node *node; + + list_for_each_entry(page, &pkram_nodes, lru) { + node = page_address(page); + if (strcmp(node->name, name) == 0) + return node; + } + return NULL; +} + +static void pkram_stream_init(struct 
pkram_stream *ps, + struct pkram_node *node, gfp_t gfp_mask) +{ + memset(ps, 0, sizeof(*ps)); + ps->gfp_mask = gfp_mask; + ps->node = node; +} + /** * Create a preserved memory node with name @name and initialize stream @ps * for saving data to it. * * @gfp_mask specifies the memory allocation mask to be used when saving data. * + * Error values: + * %ENAMETOOLONG: name len >= PKRAM_NAME_MAX + * %ENOMEM: insufficient memory available + * %EEXIST: node with specified name already exists + * * Returns 0 on success, -errno on failure. * * After the save has finished, pkram_finish_save() (or pkram_discard_save() in @@ -19,7 +88,34 @@ */ int pkram_prepare_save(struct pkram_stream *ps, const char *name, gfp_t gfp_mask) { - return -ENOSYS; + struct page *page; + struct pkram_node *node; + int err = 0; + + if (strlen(name) >= PKRAM_NAME_MAX) + return -ENAMETOOLONG; + + page = pkram_alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!page) + return -ENOMEM; + node = page_address(page); + + node->flags = PKRAM_SAVE; + strcpy(node->name, name); + + mutex_lock(&pkram_mutex); + if (!pkram_find_node(name)) + pkram_insert_node(node); + else + err = -EEXIST; + mutex_unlock(&pkram_mutex); + if (err) { + __free_page(page); + return err; + } + + pkram_stream_init(ps, node, gfp_mask); + return 0; } /** @@ -50,7 +146,12 @@ void pkram_finish_save_obj(struct pkram_stream *ps) */ void pkram_finish_save(struct pkram_stream *ps) { - BUG(); + struct pkram_node *node = ps->node; + + BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_SAVE); + + smp_wmb(); + node->flags &= ~PKRAM_ACCMODE_MASK; } /** @@ -60,7 +161,15 @@ void pkram_finish_save(struct pkram_stream *ps) */ void pkram_discard_save(struct pkram_stream *ps) { - BUG(); + struct pkram_node *node = ps->node; + + BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_SAVE); + + mutex_lock(&pkram_mutex); + pkram_delete_node(node); + mutex_unlock(&pkram_mutex); + + pkram_free_page(node); } /** @@ -69,11 +178,36 @@ void pkram_discard_save(struct 
pkram_stream *ps) * * Returns 0 on success, -errno on failure. * + * Error values: + * %ENOENT: node with specified name does not exist + * %EBUSY: save to required node has not finished yet + * * After the load has finished, pkram_finish_load() is to be called. */ int pkram_prepare_load(struct pkram_stream *ps, const char *name) { - return -ENOSYS; + struct pkram_node *node; + int err = 0; + + mutex_lock(&pkram_mutex); + node = pkram_find_node(name); + if (!node) { + err = -ENOENT; + goto out_unlock; + } + if (node->flags & PKRAM_ACCMODE_MASK) { + err = -EBUSY; + goto out_unlock; + } + pkram_delete_node(node); +out_unlock: + mutex_unlock(&pkram_mutex); + if (err) + return err; + + node->flags |= PKRAM_LOAD; + pkram_stream_init(ps, node, 0); + return 0; } /** @@ -106,7 +240,11 @@ void pkram_finish_load_obj(struct pkram_stream *ps) */ void pkram_finish_load(struct pkram_stream *ps) { - BUG(); + struct pkram_node *node = ps->node; + + BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_LOAD); + + pkram_free_page(node); } /** From patchwork Thu May 7 00:41:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532149 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E9D4C139A for ; Thu, 7 May 2020 00:45:03 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A54522082E for ; Thu, 7 May 2020 00:45:03 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="q0YZFZs5" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A54522082E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by 
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com,
 ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org,
 ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com,
 nivedita@alum.mit.edu, rafael.j.wysocki@intel.com,
 dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de,
 bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com,
 andriy.shevchenko@linux.intel.com, keescook@chromium.org,
 hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org,
 ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com,
 ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com,
 kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com,
 paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org,
 linux-doc@vger.kernel.org, kexec@lists.infradead.org
Subject: [RFC 03/43] mm: PKRAM: implement object load and save functions
Date: Wed, 6 May 2020 17:41:29 -0700
Message-Id: <1588812129-8596-4-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

PKRAM nodes are further divided into a list of objects. After a save
operation has been initiated for a node, a save operation for an object
associated with the node is initiated by calling pkram_prepare_save_obj().
A new object is created and linked to the node. The save operation for
the object is committed by calling pkram_finish_save_obj().
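
For illustration only (this is not part of the patch), the per-node object
list can be modelled in userspace as a singly linked list threaded through
page frame numbers, with pfn 0 meaning "no object". Every model_* name and
the small array standing in for physical memory are assumptions made for
this sketch. It mirrors only the linking logic visible in the diff: a save
pushes a new object at the head, a load unlinks the head, and a discard
frees whatever is left.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for physical memory: one object per "page". */
#define MODEL_NPAGES 8

struct model_obj  { uint64_t obj_pfn; };  /* pfn of the next object, 0 = end */
struct model_node { uint64_t obj_pfn; };  /* pfn of the first object, 0 = empty */

static struct model_obj model_mem[MODEL_NPAGES];
static int model_used[MODEL_NPAGES];      /* pfn 0 is reserved as "none" */

/* Return a zeroed free pfn, or 0 when the model runs out of pages. */
static uint64_t model_alloc_pfn(void)
{
	for (uint64_t pfn = 1; pfn < MODEL_NPAGES; pfn++) {
		if (!model_used[pfn]) {
			model_used[pfn] = 1;
			memset(&model_mem[pfn], 0, sizeof(model_mem[pfn]));
			return pfn;
		}
	}
	return 0;
}

static void model_free_pfn(uint64_t pfn)
{
	model_used[pfn] = 0;
}

/* Save side: allocate an object and push it at the head of the node's list. */
static uint64_t model_prepare_save_obj(struct model_node *node)
{
	uint64_t pfn = model_alloc_pfn();

	if (!pfn)
		return 0;	/* the patch returns -ENOMEM here */
	model_mem[pfn].obj_pfn = node->obj_pfn;
	node->obj_pfn = pfn;
	return pfn;
}

/* Load side: unlink the head object; the caller frees it when done. */
static uint64_t model_prepare_load_obj(struct model_node *node)
{
	uint64_t pfn = node->obj_pfn;

	if (!pfn)
		return 0;	/* the patch returns -ENODATA here */
	node->obj_pfn = model_mem[pfn].obj_pfn;
	return pfn;
}

/* Discard path: free every object still linked to the node. */
static void model_truncate_node(struct model_node *node)
{
	uint64_t pfn = node->obj_pfn;

	while (pfn) {
		uint64_t next = model_mem[pfn].obj_pfn;

		model_free_pfn(pfn);
		pfn = next;
	}
	node->obj_pfn = 0;
}
```

Because both operations work on the head of the list, objects come back
in the reverse of the order in which they were saved, at least in this
model.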

After a load operation has been initiated, pkram_prepare_load_obj() is
called to remove the next object from the node and prepare the
corresponding stream for loading data from it. After the load of the
object has finished, pkram_finish_load_obj() is called to free the
object. Objects are also deleted when a save operation is discarded.

Signed-off-by: Anthony Yznaga
---
 include/linux/pkram.h |  1 +
 mm/pkram.c            | 77 ++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/include/linux/pkram.h b/include/linux/pkram.h
index 83a0579e4c1c..fabde2cd8203 100644
--- a/include/linux/pkram.h
+++ b/include/linux/pkram.h
@@ -11,6 +11,7 @@ struct pkram_node;
 struct pkram_stream {
 	gfp_t gfp_mask;
 	struct pkram_node *node;
+	struct pkram_obj *obj;
 };
 
 #define PKRAM_NAME_MAX 256 /* including nul */
diff --git a/mm/pkram.c b/mm/pkram.c
index 5c57126353ff..4934ffd8b019 100644
--- a/mm/pkram.c
+++ b/mm/pkram.c
@@ -6,9 +6,14 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
+struct pkram_obj {
+	__u64 obj_pfn;	/* points to the next object in the list */
+};
+
 /*
  * Preserved memory is divided into nodes that can be saved or loaded
  * independently of each other. The nodes are identified by unique name
@@ -18,6 +23,7 @@
  */
 struct pkram_node {
 	__u32 flags;
+	__u64 obj_pfn;	/* points to the first obj of the node */
 
 	__u8 name[PKRAM_NAME_MAX];
 };
@@ -62,6 +68,21 @@ static struct pkram_node *pkram_find_node(const char *name)
 	return NULL;
 }
 
+static void pkram_truncate_node(struct pkram_node *node)
+{
+	unsigned long obj_pfn;
+	struct pkram_obj *obj;
+
+	obj_pfn = node->obj_pfn;
+	while (obj_pfn) {
+		obj = pfn_to_kaddr(obj_pfn);
+		obj_pfn = obj->obj_pfn;
+		pkram_free_page(obj);
+		cond_resched();
+	}
+	node->obj_pfn = 0;
+}
+
 static void pkram_stream_init(struct pkram_stream *ps,
 			      struct pkram_node *node, gfp_t gfp_mask)
 {
@@ -70,6 +91,11 @@ static void pkram_stream_init(struct pkram_stream *ps,
 	ps->node = node;
 }
 
+static void pkram_stream_init_obj(struct pkram_stream *ps, struct pkram_obj *obj)
+{
+	ps->obj = obj;
+}
+
 /**
  * Create a preserved memory node with name @name and initialize stream @ps
  * for saving data to it.
@@ -124,12 +150,31 @@ int pkram_prepare_save(struct pkram_stream *ps, const char *name, gfp_t gfp_mask
  *
  * Returns 0 on success, -errno on failure.
  *
+ * Error values:
+ *	%ENOMEM: insufficient memory available
+ *
  * After the save has finished, pkram_finish_save_obj() (or pkram_discard_save()
  * in case of failure) is to be called.
  */
 int pkram_prepare_save_obj(struct pkram_stream *ps)
 {
-	return -ENOSYS;
+	struct pkram_node *node = ps->node;
+	struct pkram_obj *obj;
+	struct page *page;
+
+	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_SAVE);
+
+	page = pkram_alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!page)
+		return -ENOMEM;
+	obj = page_address(page);
+
+	if (node->obj_pfn)
+		obj->obj_pfn = node->obj_pfn;
+	node->obj_pfn = page_to_pfn(page);
+
+	pkram_stream_init_obj(ps, obj);
+	return 0;
 }
 
 /**
@@ -137,7 +182,9 @@ int pkram_prepare_save_obj(struct pkram_stream *ps)
  */
 void pkram_finish_save_obj(struct pkram_stream *ps)
 {
-	BUG();
+	struct pkram_node *node = ps->node;
+
+	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_SAVE);
 }
 
 /**
@@ -169,6 +216,7 @@ void pkram_discard_save(struct pkram_stream *ps)
 	pkram_delete_node(node);
 	mutex_unlock(&pkram_mutex);
 
+	pkram_truncate_node(node);
 	pkram_free_page(node);
 }
 
@@ -216,11 +264,26 @@ int pkram_prepare_load(struct pkram_stream *ps, const char *name)
  *
  * Returns 0 on success, -errno on failure.
  *
+ * Error values:
+ *	%ENODATA: Stream @ps has no preserved memory objects
+ *
  * After the load has finished, pkram_finish_load_obj() is to be called.
  */
 int pkram_prepare_load_obj(struct pkram_stream *ps)
 {
-	return -ENOSYS;
+	struct pkram_node *node = ps->node;
+	struct pkram_obj *obj;
+
+	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_LOAD);
+
+	if (!node->obj_pfn)
+		return -ENODATA;
+
+	obj = pfn_to_kaddr(node->obj_pfn);
+	node->obj_pfn = obj->obj_pfn;
+
+	pkram_stream_init_obj(ps, obj);
+	return 0;
 }
 
 /**
@@ -230,7 +293,12 @@ int pkram_prepare_load_obj(struct pkram_stream *ps)
  */
 void pkram_finish_load_obj(struct pkram_stream *ps)
 {
-	BUG();
+	struct pkram_node *node = ps->node;
+	struct pkram_obj *obj = ps->obj;
+
+	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_LOAD);
+
+	pkram_free_page(obj);
 }
 
 /**
@@ -244,6 +312,7 @@ void pkram_finish_load(struct pkram_stream *ps)
 
 	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_LOAD);
 
+	pkram_truncate_node(node);
 	pkram_free_page(node);
 }
 

From patchwork Thu May 7 00:41:30 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 11532163
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com,
 ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org,
 ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com,
 nivedita@alum.mit.edu, rafael.j.wysocki@intel.com,
 dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de,
 bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com,
 andriy.shevchenko@linux.intel.com, keescook@chromium.org,
 hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org,
 ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com,
 ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com,
 kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com,
 paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org,
 linux-doc@vger.kernel.org, kexec@lists.infradead.org
Subject: [RFC 04/43] mm: PKRAM: implement page stream operations
Date: Wed, 6 May 2020 17:41:30 -0700
Message-Id: <1588812129-8596-5-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

Using the pkram_save_page() function, one can populate PKRAM objects with
memory pages which can later be loaded using the pkram_load_page()
function. Saving a memory page to PKRAM is accomplished by recording its
pfn and incrementing its refcount so that it will not be freed after the
last user puts it.
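
As a rough userspace illustration (not part of the patch), the entry
format used by the diff in this patch can be sketched as follows: the
page's physical address is stored with the supplemental flags packed into
low bits that are always zero for a page-aligned address. The shift and
mask values are copied from the diff. The 4 KiB page size, the model_*
names, the fixed-size link and the plain integer refcount are assumptions
made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define MODEL_ENTRY_FLAGS_SHIFT	0x5	/* value taken from the diff */
#define MODEL_ENTRY_FLAGS_MASK	0x7f	/* value taken from the diff */

typedef uint64_t model_entry_t;

/* Hypothetical stand-in for struct page: just a pfn and a refcount. */
struct model_page {
	uint64_t pfn;
	int refcount;
};

/* Hypothetical fixed-size link; the real one fills the rest of a page. */
#define MODEL_LINK_ENTRIES 4
struct model_link {
	model_entry_t entry[MODEL_LINK_ENTRIES];
	unsigned int used;
};

/* Pack the physical address and the flags into one 64-bit entry. */
static model_entry_t model_entry_encode(uint64_t pfn, unsigned int flags)
{
	model_entry_t p = pfn << MODEL_PAGE_SHIFT;

	p |= (model_entry_t)(flags & MODEL_ENTRY_FLAGS_MASK)
		<< MODEL_ENTRY_FLAGS_SHIFT;
	return p;
}

static uint64_t model_entry_pfn(model_entry_t p)
{
	return p >> MODEL_PAGE_SHIFT;
}

static unsigned int model_entry_flags(model_entry_t p)
{
	return (p >> MODEL_ENTRY_FLAGS_SHIFT) & MODEL_ENTRY_FLAGS_MASK;
}

/*
 * Save: take a reference so the page outlives its last regular user,
 * then record it. Returns -1 when the link is full; the patch allocates
 * and chains a new link in that case instead of failing.
 */
static int model_save_page(struct model_link *link, struct model_page *page,
			   unsigned int flags)
{
	if (link->used >= MODEL_LINK_ENTRIES)
		return -1;
	page->refcount++;
	link->entry[link->used++] = model_entry_encode(page->pfn, flags);
	return 0;
}
```

With 4 KiB pages the seven flag bits occupy bits 5 to 11, below the page
offset boundary, so they do not collide with the address bits in this
model.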

Originally-by: Vladimir Davydov
Signed-off-by: Anthony Yznaga
---
 include/linux/pkram.h |   5 ++
 mm/pkram.c            | 219 +++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 221 insertions(+), 3 deletions(-)

diff --git a/include/linux/pkram.h b/include/linux/pkram.h
index fabde2cd8203..f338d1c2aeb6 100644
--- a/include/linux/pkram.h
+++ b/include/linux/pkram.h
@@ -12,6 +12,11 @@ struct pkram_stream {
 	gfp_t gfp_mask;
 	struct pkram_node *node;
 	struct pkram_obj *obj;
+
+	struct pkram_link *link;	/* current link */
+	unsigned int entry_idx;		/* next entry in link */
+
+	unsigned long next_index;
 };
 
 #define PKRAM_NAME_MAX 256 /* including nul */
diff --git a/mm/pkram.c b/mm/pkram.c
index 4934ffd8b019..ab3053ca3539 100644
--- a/mm/pkram.c
+++ b/mm/pkram.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include
 #include
+#include
 #include
 #include
 #include
@@ -10,8 +11,38 @@
 #include
 #include
 
+#include "internal.h"
+
+
+/*
+ * Represents a reference to a data page saved to PKRAM.
+ */
+typedef __u64 pkram_entry_t;
+
+#define PKRAM_ENTRY_FLAGS_SHIFT 0x5
+#define PKRAM_ENTRY_FLAGS_MASK 0x7f
+
+/*
+ * Keeps references to data pages saved to PKRAM.
+ * The structure occupies a memory page.
+ */
+struct pkram_link {
+	__u64 link_pfn;	/* points to the next link of the object */
+	__u64 index;	/* mapping index of first pkram_entry_t */
+
+	/*
+	 * the array occupies the rest of the link page; if the link is not
+	 * full, the rest of the array must be filled with zeros
+	 */
+	pkram_entry_t entry[0];
+};
+
+#define PKRAM_LINK_ENTRIES_MAX \
+	((PAGE_SIZE-sizeof(struct pkram_link))/sizeof(pkram_entry_t))
+
 struct pkram_obj {
-	__u64 obj_pfn;	/* points to the next object in the list */
+	__u64 link_pfn;	/* points to the first link of the object */
+	__u64 obj_pfn;	/* points to the next object in the list */
 };
 
 /*
@@ -19,6 +50,10 @@ struct pkram_obj {
  * independently of each other. The nodes are identified by unique name
  * strings.
  *
+ * References to data pages saved to a preserved memory node are kept in a
+ * singly-linked list of PKRAM link structures (see above), the node has a
+ * pointer to the head of.
+ *
  * The structure occupies a memory page.
  */
 struct pkram_node {
@@ -68,6 +103,37 @@ static struct pkram_node *pkram_find_node(const char *name)
 	return NULL;
 }
 
+static void pkram_truncate_link(struct pkram_link *link)
+{
+	struct page *page;
+	pkram_entry_t p;
+	int i;
+
+	for (i = 0; i < PKRAM_LINK_ENTRIES_MAX; i++) {
+		p = link->entry[i];
+		if (!p)
+			continue;
+		page = pfn_to_page(PHYS_PFN(p));
+		put_page(page);
+	}
+}
+
+static void pkram_truncate_obj(struct pkram_obj *obj)
+{
+	unsigned long link_pfn;
+	struct pkram_link *link;
+
+	link_pfn = obj->link_pfn;
+	while (link_pfn) {
+		link = pfn_to_kaddr(link_pfn);
+		pkram_truncate_link(link);
+		link_pfn = link->link_pfn;
+		pkram_free_page(link);
+		cond_resched();
+	}
+	obj->link_pfn = 0;
+}
+
 static void pkram_truncate_node(struct pkram_node *node)
 {
 	unsigned long obj_pfn;
@@ -76,6 +142,7 @@ static void pkram_truncate_node(struct pkram_node *node)
 	obj_pfn = node->obj_pfn;
 	while (obj_pfn) {
 		obj = pfn_to_kaddr(obj_pfn);
+		pkram_truncate_obj(obj);
 		obj_pfn = obj->obj_pfn;
 		pkram_free_page(obj);
 		cond_resched();
@@ -83,6 +150,26 @@ static void pkram_truncate_node(struct pkram_node *node)
 	node->obj_pfn = 0;
 }
 
+static void pkram_add_link(struct pkram_link *link, struct pkram_obj *obj)
+{
+	link->link_pfn = obj->link_pfn;
+	obj->link_pfn = page_to_pfn(virt_to_page(link));
+}
+
+static struct pkram_link *pkram_remove_link(struct pkram_obj *obj)
+{
+	struct pkram_link *current_link;
+
+	if (!obj->link_pfn)
+		return NULL;
+
+	current_link = pfn_to_kaddr(obj->link_pfn);
+	obj->link_pfn = current_link->link_pfn;
+	current_link->link_pfn = 0;
+
+	return current_link;
+}
+
 static void pkram_stream_init(struct pkram_stream *ps,
 			      struct pkram_node *node, gfp_t gfp_mask)
 {
@@ -94,6 +181,9 @@ static void pkram_stream_init(struct pkram_stream *ps,
 static void pkram_stream_init_obj(struct pkram_stream *ps, struct pkram_obj *obj)
 {
 	ps->obj = obj;
+	ps->link = NULL;
+	ps->entry_idx = 0;
+	ps->next_index = 0;
 }
 
 /**
@@ -295,9 +385,28 @@ void pkram_finish_load_obj(struct pkram_stream *ps)
 {
 	struct pkram_node *node = ps->node;
 	struct pkram_obj *obj = ps->obj;
+	struct pkram_link *link = ps->link;
 
 	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_LOAD);
 
+	/*
+	 * If link is not null, then loading stopped within a pkram_link
+	 * unexpectedly.
+	 */
+	if (link) {
+		unsigned long link_pfn;
+
+		link_pfn = page_to_pfn(virt_to_page(link));
+		while (link_pfn) {
+			link = pfn_to_kaddr(link_pfn);
+			pkram_truncate_link(link);
+			link_pfn = link->link_pfn;
+			pkram_free_page(link);
+			cond_resched();
+		}
+	}
+
+	pkram_truncate_obj(obj);
 	pkram_free_page(obj);
 }
 
@@ -316,6 +425,44 @@ void pkram_finish_load(struct pkram_stream *ps)
 	pkram_free_page(node);
 }
 
+/*
+ * Insert page to PKRAM node allocating a new PKRAM link if necessary.
+ */
+static int __pkram_save_page(struct pkram_stream *ps,
+			     struct page *page, short flags, unsigned long index)
+{
+	struct pkram_link *link = ps->link;
+	struct pkram_obj *obj = ps->obj;
+	pkram_entry_t p;
+
+	if (!link || ps->entry_idx >= PKRAM_LINK_ENTRIES_MAX ||
+	    index != ps->next_index) {
+		struct page *link_page;
+
+		link_page = pkram_alloc_page((ps->gfp_mask & GFP_RECLAIM_MASK) |
+					     __GFP_ZERO);
+		if (!link_page)
+			return -ENOMEM;
+
+		ps->link = link = page_address(link_page);
+		pkram_add_link(link, obj);
+
+		ps->entry_idx = 0;
+
+		ps->next_index = link->index = index;
+	}
+
+	ps->next_index++;
+
+	get_page(page);
+	p = page_to_phys(page);
+	p |= ((flags & PKRAM_ENTRY_FLAGS_MASK) << PKRAM_ENTRY_FLAGS_SHIFT);
+	link->entry[ps->entry_idx] = p;
+	ps->entry_idx++;
+
+	return 0;
+}
+
 /**
  * Save page @page to the preserved memory node and object associated with
  * stream @ps. The stream must have been initialized with pkram_prepare_save()
@@ -324,10 +471,72 @@ void pkram_finish_load(struct pkram_stream *ps)
  * @flags specifies supplemental page state to be preserved.
  *
  * Returns 0 on success, -errno on failure.
+ *
+ * Error values:
+ *	%ENOMEM: insufficient amount of memory available
+ *
+ * Saving a page to preserved memory is simply incrementing its refcount so
+ * that it will not get freed after the last user puts it. That means it is
+ * safe to use the page as usual after it has been saved.
  */
 int pkram_save_page(struct pkram_stream *ps, struct page *page, short flags)
 {
-	return -ENOSYS;
+	struct pkram_node *node = ps->node;
+
+	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_SAVE);
+
+	BUG_ON(PageCompound(page));
+
+	return __pkram_save_page(ps, page, flags, page->index);
+}
+
+/*
+ * Extract the next page from preserved memory freeing a PKRAM link if it
+ * becomes empty.
+ */
+static struct page *__pkram_load_page(struct pkram_stream *ps, unsigned long *index, short *flags)
+{
+	struct pkram_link *link = ps->link;
+	struct page *page;
+	pkram_entry_t p;
+	short flgs;
+
+	if (!link) {
+		link = pkram_remove_link(ps->obj);
+		if (!link)
+			return NULL;
+
+		ps->link = link;
+		ps->entry_idx = 0;
+		ps->next_index = link->index;
+	}
+
+	BUG_ON(ps->entry_idx >= PKRAM_LINK_ENTRIES_MAX);
+
+	p = link->entry[ps->entry_idx];
+	BUG_ON(!p);
+
+	flgs = (p >> PKRAM_ENTRY_FLAGS_SHIFT) & PKRAM_ENTRY_FLAGS_MASK;
+	page = pfn_to_page(PHYS_PFN(p));
+
+	if (flags)
+		*flags = flgs;
+	if (index)
+		*index = ps->next_index;
+
+	ps->next_index++;
+
+	/* clear to avoid double free (see pkram_truncate_link()) */
+	link->entry[ps->entry_idx] = 0;
+
+	ps->entry_idx++;
+	if (ps->entry_idx >= PKRAM_LINK_ENTRIES_MAX ||
+	    !link->entry[ps->entry_idx]) {
+		ps->link = NULL;
+		pkram_free_page(link);
+	}
+
+	return page;
 }
 
 /**
@@ -346,7 +555,11 @@ int pkram_save_page(struct pkram_stream *ps, struct page *page, short flags)
  */
 struct page *pkram_load_page(struct pkram_stream *ps, unsigned long *index, short *flags)
 {
-	return NULL;
+	struct pkram_node *node = ps->node;
+
+	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_LOAD);
+
+	return __pkram_load_page(ps, index, flags);
 }
 
 /**

From patchwork Thu May 7 00:41:31 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 11532083
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com,
 ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org,
 ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com,
 nivedita@alum.mit.edu, rafael.j.wysocki@intel.com,
 dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de,
 bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com,
 andriy.shevchenko@linux.intel.com, keescook@chromium.org,
 hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org,
 ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com,
 ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com,
 kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com,
 paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org,
 linux-doc@vger.kernel.org, kexec@lists.infradead.org
Subject: [RFC 05/43] mm: PKRAM: support preserving transparent hugepages
Date: Wed, 6 May 2020 17:41:31 -0700
Message-Id: <1588812129-8596-6-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

Support preserving a transparent hugepage by recording the page order and
a flag indicating it is a THP. Use these values when the page is restored
to reconstruct the THP.

Signed-off-by: Anthony Yznaga
---
 include/linux/pkram.h |  2 ++
 mm/pkram.c            | 20 ++++++++++++++++----
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/linux/pkram.h b/include/linux/pkram.h
index f338d1c2aeb6..584cadb662b4 100644
--- a/include/linux/pkram.h
+++ b/include/linux/pkram.h
@@ -33,6 +33,8 @@ int pkram_prepare_load_obj(struct pkram_stream *ps);
 void pkram_finish_load(struct pkram_stream *ps);
 void pkram_finish_load_obj(struct pkram_stream *ps);
 
+#define PKRAM_PAGE_TRANS_HUGE 0x1 /* page is a transparent hugepage */
+
 int pkram_save_page(struct pkram_stream *ps, struct page *page, short flags);
 struct page *pkram_load_page(struct pkram_stream *ps, unsigned long *index, short *flags);
 
diff --git a/mm/pkram.c b/mm/pkram.c
index ab3053ca3539..9164060e36f5 100644
--- a/mm/pkram.c
+++ b/mm/pkram.c
@@ -21,6 +21,7 @@ typedef __u64 pkram_entry_t;
 
 #define PKRAM_ENTRY_FLAGS_SHIFT 0x5
 #define PKRAM_ENTRY_FLAGS_MASK 0x7f
+#define PKRAM_ENTRY_ORDER_MASK 0x1f
 
 /*
  * Keeps references to data pages saved to PKRAM.
@@ -434,6 +435,7 @@ static int __pkram_save_page(struct pkram_stream *ps,
 	struct pkram_link *link = ps->link;
 	struct pkram_obj *obj = ps->obj;
 	pkram_entry_t p;
+	int order;
 
 	if (!link || ps->entry_idx >= PKRAM_LINK_ENTRIES_MAX ||
 	    index != ps->next_index) {
@@ -452,10 +454,15 @@ static int __pkram_save_page(struct pkram_stream *ps,
 		ps->next_index = link->index = index;
 	}
 
-	ps->next_index++;
+	if (PageTransHuge(page))
+		flags |= PKRAM_PAGE_TRANS_HUGE;
+
+	order = compound_order(page);
+	ps->next_index += (1 << order);
 
 	get_page(page);
 	p = page_to_phys(page);
+	p |= order;
 	p |= ((flags & PKRAM_ENTRY_FLAGS_MASK) << PKRAM_ENTRY_FLAGS_SHIFT);
 	link->entry[ps->entry_idx] = p;
 	ps->entry_idx++;
@@ -485,8 +492,6 @@ int pkram_save_page(struct pkram_stream *ps, struct page *page, short flags)
 
 	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_SAVE);
 
-	BUG_ON(PageCompound(page));
-
 	return __pkram_save_page(ps, page, flags, page->index);
 }
 
@@ -499,6 +504,7 @@ static struct page *__pkram_load_page(struct pkram_stream *ps, unsigned long *in
 	struct pkram_link *link = ps->link;
 	struct page *page;
 	pkram_entry_t p;
+	int order;
 	short flgs;
 
 	if (!link) {
@@ -517,14 +523,20 @@ static struct page *__pkram_load_page(struct pkram_stream *ps, unsigned long *in
 	BUG_ON(!p);
 
 	flgs = (p >> PKRAM_ENTRY_FLAGS_SHIFT) & PKRAM_ENTRY_FLAGS_MASK;
+	order = p & PKRAM_ENTRY_ORDER_MASK;
 	page = pfn_to_page(PHYS_PFN(p));
 
+	if (flgs & PKRAM_PAGE_TRANS_HUGE) {
+		prep_compound_page(page, order);
+		prep_transhuge_page(page);
+	}
+
 	if (flags)
 		*flags = flgs;
 	if (index)
 		*index = ps->next_index;
 
-	ps->next_index++;
+	ps->next_index += (1 << order);
 
 	/* clear to avoid double free (see pkram_truncate_link()) */
 	link->entry[ps->entry_idx] = 0;

From patchwork Thu May 7 00:41:32 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 11532079
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com,
 ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org,
 ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com,
 nivedita@alum.mit.edu, rafael.j.wysocki@intel.com,
 dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de,
 bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com,
 andriy.shevchenko@linux.intel.com, keescook@chromium.org,
 hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org,
 ying.huang@intel.com, yang.shi@linux.alibaba.com,
 gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com,
 jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com,
 lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 kexec@lists.infradead.org
Subject: [RFC 06/43] mm: PKRAM: implement byte stream operations
Date: Wed, 6 May 2020 17:41:32 -0700
Message-Id: <1588812129-8596-7-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

This patch adds the ability to save arbitrary byte streams up to a
total length of one page to a PKRAM object using pkram_write() to be
restored later using pkram_read().

Originally-by: Vladimir Davydov
Signed-off-by: Anthony Yznaga
---
 include/linux/pkram.h |  4 +++
 mm/pkram.c            | 84 +++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/include/linux/pkram.h b/include/linux/pkram.h
index 584cadb662b4..a58dd2ea835a 100644
--- a/include/linux/pkram.h
+++ b/include/linux/pkram.h
@@ -17,6 +17,10 @@ struct pkram_stream {
 	unsigned int entry_idx;		/* next entry in link */
 
 	unsigned long next_index;
+
+	/* byte data */
+	struct page *data_page;
+	unsigned int data_offset;
 };
 
 #define PKRAM_NAME_MAX		256	/* including nul */
diff --git a/mm/pkram.c b/mm/pkram.c
index 9164060e36f5..06b471eea0b0 100644
--- a/mm/pkram.c
+++ b/mm/pkram.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include
 #include
+#include
 #include
 #include
 #include
@@ -42,6 +43,8 @@ struct pkram_link {
 	((PAGE_SIZE-sizeof(struct pkram_link))/sizeof(pkram_entry_t))
 
 struct pkram_obj {
+	__u64	data_pfn;	/* points to the byte data */
+	__u64	data_len;	/* byte data size */
 	__u64	link_pfn;	/* points to the first link of the object */
 	__u64	obj_pfn;	/* points to the next object in the list */
 };
@@ -407,6 +410,9 @@ void pkram_finish_load_obj(struct pkram_stream *ps)
 		}
 	}
 
+	if (ps->data_page)
+		pkram_free_page(page_address(ps->data_page));
+
 	pkram_truncate_obj(obj);
 	pkram_free_page(obj);
 }
@@ -422,6 +428,9 @@ void pkram_finish_load(struct pkram_stream *ps)
 
 	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_LOAD);
 
+	if (ps->data_page)
+		put_page(ps->data_page);
+
 	pkram_truncate_node(node);
 	pkram_free_page(node);
 }
@@ -581,10 +590,41 @@ struct page *pkram_load_page(struct pkram_stream *ps, unsigned long *index, shor
  *
  * On success, returns the number of bytes written, which is always equal to
  * @count. On failure, -errno is returned.
+ *
+ * Error values:
+ *	%ENOMEM: insufficient amount of memory available
  */
 ssize_t pkram_write(struct pkram_stream *ps, const void *buf, size_t count)
 {
-	return -ENOSYS;
+	struct pkram_node *node = ps->node;
+	struct pkram_obj *obj = ps->obj;
+	void *addr;
+
+	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_SAVE);
+
+	if (!ps->data_page) {
+		struct page *page;
+
+		page = pkram_alloc_page((ps->gfp_mask & GFP_RECLAIM_MASK) |
+					__GFP_HIGHMEM | __GFP_ZERO);
+		if (!page)
+			return -ENOMEM;
+
+		ps->data_page = page;
+		ps->data_offset = 0;
+		obj->data_pfn = page_to_pfn(page);
+	}
+
+	BUG_ON(count > PAGE_SIZE - ps->data_offset);
+
+	addr = kmap_atomic(ps->data_page);
+	memcpy(addr + ps->data_offset, buf, count);
+	kunmap_atomic(addr);
+
+	obj->data_len += count;
+	ps->data_offset += count;
+
+	return count;
 }
 
 /**
@@ -597,5 +637,45 @@ ssize_t pkram_write(struct pkram_stream *ps, const void *buf, size_t count)
  */
 size_t pkram_read(struct pkram_stream *ps, void *buf, size_t count)
 {
-	return 0;
+	struct pkram_node *node = ps->node;
+	struct pkram_obj *obj = ps->obj;
+	size_t copy_count;
+	char *addr;
+
+	BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_LOAD);
+
+	if (!count || !obj->data_len)
+		return 0;
+
+	if (!ps->data_page) {
+		struct page *page;
+
+		page = pfn_to_page(obj->data_pfn);
+		if (!page)
+			return 0;
+
+		ps->data_page = page;
+		ps->data_offset = 0;
+		obj->data_pfn = 0;
+	}
+
+	BUG_ON(count > PAGE_SIZE - ps->data_offset);
+
+	copy_count = min_t(size_t, count, PAGE_SIZE - ps->data_offset);
+	if (copy_count > obj->data_len)
+		copy_count = obj->data_len;
+
+	addr = kmap_atomic(ps->data_page);
+	memcpy(buf, addr + ps->data_offset, copy_count);
+	kunmap_atomic(addr);
+
+	obj->data_len -= copy_count;
+	ps->data_offset += copy_count;
+
+	if (!obj->data_len) {
+		pkram_free_page(page_address(ps->data_page));
+		ps->data_page = NULL;
+	}
+
+	return copy_count;
 }

From patchwork Thu May 7 00:41:33 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 11532075
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com,
 ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org,
 ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com,
 nivedita@alum.mit.edu, rafael.j.wysocki@intel.com,
 dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de,
 bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com,
 andriy.shevchenko@linux.intel.com, keescook@chromium.org,
 hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org,
 ying.huang@intel.com, yang.shi@linux.alibaba.com,
 gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com,
 jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com,
 lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 kexec@lists.infradead.org
Subject: [RFC 07/43] mm: PKRAM: link nodes by pfn before reboot
Date: Wed, 6 May 2020 17:41:33 -0700
Message-Id: <1588812129-8596-8-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

Since page structs are used for linking PKRAM nodes and cleared
on boot, organize all PKRAM nodes into a list singly-linked by pfns
before reboot to facilitate the node list restore in the new kernel.

Originally-by: Vladimir Davydov
Signed-off-by: Anthony Yznaga
---
 mm/pkram.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/mm/pkram.c b/mm/pkram.c
index 06b471eea0b0..44fadb70acf6 100644
--- a/mm/pkram.c
+++ b/mm/pkram.c
@@ -2,12 +2,16 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 #include
+#include
 #include
+#include
 #include
+#include
 #include
 #include
 #include
@@ -58,11 +62,15 @@ struct pkram_obj {
  * singly-linked list of PKRAM link structures (see above), the node has a
  * pointer to the head of.
  *
+ * To facilitate data restore in the new kernel, before reboot all PKRAM nodes
+ * are organized into a list singly-linked by pfn's (see pkram_reboot()).
+ *
  * The structure occupies a memory page.
  */
 struct pkram_node {
 	__u32	flags;
 	__u64	obj_pfn;	/* points to the first obj of the node */
+	__u64	node_pfn;	/* points to the next node in the node list */
 
 	__u8	name[PKRAM_NAME_MAX];
 };
@@ -71,6 +79,10 @@ struct pkram_node {
 #define PKRAM_LOAD		2
 #define PKRAM_ACCMODE_MASK	3
 
+/*
+ * For convenience sake PKRAM nodes are kept in an auxiliary doubly-linked list
+ * connected through the lru field of the page struct.
+ */
 static LIST_HEAD(pkram_nodes);		/* linked through page::lru */
 static DEFINE_MUTEX(pkram_mutex);	/* serializes open/close */
 
@@ -679,3 +691,41 @@ size_t pkram_read(struct pkram_stream *ps, void *buf, size_t count)
 
 	return copy_count;
 }
+
+/*
+ * Build the list of PKRAM nodes.
+ */
+static void __pkram_reboot(void)
+{
+	struct page *page;
+	struct pkram_node *node;
+	unsigned long node_pfn = 0;
+
+	list_for_each_entry_reverse(page, &pkram_nodes, lru) {
+		node = page_address(page);
+		if (WARN_ON(node->flags & PKRAM_ACCMODE_MASK))
+			continue;
+		node->node_pfn = node_pfn;
+		node_pfn = page_to_pfn(page);
+	}
+}
+
+static int pkram_reboot(struct notifier_block *notifier,
+		       unsigned long val, void *v)
+{
+	if (val != SYS_RESTART)
+		return NOTIFY_DONE;
+	__pkram_reboot();
+	return NOTIFY_OK;
+}
+
+static struct notifier_block pkram_reboot_notifier = {
+	.notifier_call = pkram_reboot,
+};
+
+static int __init pkram_init(void)
+{
+	register_reboot_notifier(&pkram_reboot_notifier);
+	return 0;
+}
+module_init(pkram_init);

From patchwork Thu May 7 00:41:34 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 11532073
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com,
 ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org,
 ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com,
 nivedita@alum.mit.edu, rafael.j.wysocki@intel.com,
 dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de,
 bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com,
 andriy.shevchenko@linux.intel.com, keescook@chromium.org,
 hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org,
 ying.huang@intel.com, yang.shi@linux.alibaba.com,
 gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com,
 jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com,
 lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 kexec@lists.infradead.org
Subject: [RFC 08/43] mm: PKRAM: introduce super block
Date: Wed, 6 May 2020 17:41:34 -0700
Message-Id: <1588812129-8596-9-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

The PKRAM super block is the starting point for restoring
preserved memory. By providing the super block to the new kernel at
boot time, preserved memory can be reserved and made available to be
restored. To point the kernel to the location of the super block, one
passes its pfn via the 'pkram' boot param. For that purpose, the pkram
super block pfn is exported via /sys/kernel/pkram. If none is passed,
any preserved memory will not be kept, and a new super block will be
allocated.

Originally-by: Vladimir Davydov
Signed-off-by: Anthony Yznaga
---
 mm/pkram.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 94 insertions(+), 2 deletions(-)

diff --git a/mm/pkram.c b/mm/pkram.c
index 44fadb70acf6..70f2219e6218 100644
--- a/mm/pkram.c
+++ b/mm/pkram.c
@@ -5,15 +5,18 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 #include
+#include
 #include
 
 #include "internal.h"
@@ -80,12 +83,38 @@ struct pkram_node {
 #define PKRAM_ACCMODE_MASK	3
 
 /*
+ * The PKRAM super block contains data needed to restore the preserved memory
+ * structure on boot. The pointer to it (pfn) should be passed via the 'pkram'
+ * boot param if one wants to restore preserved data saved by the previously
+ * executing kernel. For that purpose the kernel exports the pfn via
+ * /sys/kernel/pkram. If none is passed, preserved memory if any will not be
+ * preserved and a new clean page will be allocated for the super block.
+ *
+ * The structure occupies a memory page.
+ */
+struct pkram_super_block {
+	__u64	node_pfn;		/* first element of the node list */
+};
+
+static unsigned long pkram_sb_pfn __initdata;
+static struct pkram_super_block *pkram_sb;
+
+/*
  * For convenience sake PKRAM nodes are kept in an auxiliary doubly-linked list
  * connected through the lru field of the page struct.
  */
 static LIST_HEAD(pkram_nodes);		/* linked through page::lru */
 static DEFINE_MUTEX(pkram_mutex);	/* serializes open/close */
 
+/*
+ * The PKRAM super block pfn, see above.
+ */
+static int __init parse_pkram_sb_pfn(char *arg)
+{
+	return kstrtoul(arg, 16, &pkram_sb_pfn);
+}
+early_param("pkram", parse_pkram_sb_pfn);
+
 static inline struct page *pkram_alloc_page(gfp_t gfp_mask)
 {
 	return alloc_page(gfp_mask);
@@ -209,6 +238,7 @@ static void pkram_stream_init_obj(struct pkram_stream *ps, struct pkram_obj *obj
  * @gfp_mask specifies the memory allocation mask to be used when saving data.
  *
  * Error values:
+ *	%ENODEV: PKRAM not available
  *	%ENAMETOOLONG: name len >= PKRAM_NAME_MAX
  *	%ENOMEM: insufficient memory available
  *	%EEXIST: node with specified name already exists
@@ -224,6 +254,9 @@ int pkram_prepare_save(struct pkram_stream *ps, const char *name, gfp_t gfp_mask
 	struct pkram_node *node;
 	int err = 0;
 
+	if (!pkram_sb)
+		return -ENODEV;
+
 	if (strlen(name) >= PKRAM_NAME_MAX)
 		return -ENAMETOOLONG;
 
@@ -333,6 +366,7 @@ void pkram_discard_save(struct pkram_stream *ps)
  * Returns 0 on success, -errno on failure.
  *
  * Error values:
+ *	%ENODEV: PKRAM not available
  *	%ENOENT: node with specified name does not exist
  *	%EBUSY: save to required node has not finished yet
  *
@@ -343,6 +377,9 @@ int pkram_prepare_load(struct pkram_stream *ps, const char *name)
 	struct pkram_node *node;
 	int err = 0;
 
+	if (!pkram_sb)
+		return -ENODEV;
+
 	mutex_lock(&pkram_mutex);
 	node = pkram_find_node(name);
 	if (!node) {
@@ -708,6 +745,7 @@ static void __pkram_reboot(void)
 		node->node_pfn = node_pfn;
 		node_pfn = page_to_pfn(page);
 	}
+	pkram_sb->node_pfn = node_pfn;
 }
 
 static int pkram_reboot(struct notifier_block *notifier,
@@ -715,7 +753,8 @@ static int pkram_reboot(struct notifier_block *notifier,
 {
 	if (val != SYS_RESTART)
 		return NOTIFY_DONE;
-	__pkram_reboot();
+	if (pkram_sb)
+		__pkram_reboot();
 	return NOTIFY_OK;
 }
 
@@ -723,9 +762,62 @@ static struct notifier_block pkram_reboot_notifier = {
 	.notifier_call = pkram_reboot,
 };
 
+static ssize_t show_pkram_sb_pfn(struct kobject *kobj,
+		struct kobj_attribute *attr, char *buf)
+{
+	unsigned long pfn = pkram_sb ? PFN_DOWN(__pa(pkram_sb)) : 0;
+
+	return sprintf(buf, "%lx\n", pfn);
+}
+
+static struct kobj_attribute pkram_sb_pfn_attr =
+	__ATTR(pkram, 0444, show_pkram_sb_pfn, NULL);
+
+static struct attribute *pkram_attrs[] = {
+	&pkram_sb_pfn_attr.attr,
+	NULL,
+};
+
+static struct attribute_group pkram_attr_group = {
+	.attrs = pkram_attrs,
+};
+
+/* returns non-zero on success */
+static int __init pkram_init_sb(void)
+{
+	unsigned long pfn;
+	struct pkram_node *node;
+
+	if (!pkram_sb) {
+		struct page *page;
+
+		page = pkram_alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!page) {
+			pr_err("PKRAM: Failed to allocate super block\n");
+			return 0;
+		}
+		pkram_sb = page_address(page);
+	}
+
+	/*
+	 * Build auxiliary doubly-linked list of nodes connected through
+	 * page::lru for convenience sake.
+	 */
+	pfn = pkram_sb->node_pfn;
+	while (pfn) {
+		node = pfn_to_kaddr(pfn);
+		pkram_insert_node(node);
+		pfn = node->node_pfn;
+	}
+	return 1;
+}
+
 static int __init pkram_init(void)
 {
-	register_reboot_notifier(&pkram_reboot_notifier);
+	if (pkram_init_sb()) {
+		register_reboot_notifier(&pkram_reboot_notifier);
+		sysfs_update_group(kernel_kobj, &pkram_attr_group);
+	}
 	return 0;
 }
 module_init(pkram_init);

From patchwork Thu May 7 00:41:35 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 11532081
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com,
 ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org,
 ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com,
 nivedita@alum.mit.edu, rafael.j.wysocki@intel.com,
 dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de,
 bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com,
 andriy.shevchenko@linux.intel.com, keescook@chromium.org,
 hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org,
 ying.huang@intel.com, yang.shi@linux.alibaba.com,
 gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com,
 jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com,
 lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 kexec@lists.infradead.org
Subject: [RFC 09/43] PKRAM: build a physical mapping pagetable of pages to be preserved
Date: Wed, 6 May 2020 17:41:35 -0700
Message-Id: <1588812129-8596-10-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

Future patches will need a way to efficiently identify physically
contiguous ranges of preserved pages regardless of their virtual
addresses as well as a way to identify ranges that do not contain
preserved pages.
To facilitate this all pages to be preserved across kexec are added to an identity mapping-style pagetable that is passed to the next kernel. The pagetable makes use of the existing architecture definitions for building a memory mapping pagetable with the primary difference being that a bitmap is used to represent the presence or absence of preserved pages at the PTE level. In general both metadata pages and data pages must be added to the pagetable. A mapping for a metadata page can be added when the page is allocated, but there is an exception: for the pagetable pages themselves mappings are added after they are allocated to avoid recursion. Signed-off-by: Anthony Yznaga --- mm/pkram.c | 233 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 230 insertions(+), 3 deletions(-) diff --git a/mm/pkram.c b/mm/pkram.c index 70f2219e6218..5a7b8f61a55d 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -99,6 +99,12 @@ struct pkram_super_block { static unsigned long pkram_sb_pfn __initdata; static struct pkram_super_block *pkram_sb; +static pgd_t *pkram_pgd; +static DEFINE_SPINLOCK(pkram_pgd_lock); + +static int pkram_add_identity_map(struct page *page); +static void pkram_remove_identity_map(struct page *page); + /* * For convenience sake PKRAM nodes are kept in an auxiliary doubly-linked list * connected through the lru field of the page struct. 
@@ -115,13 +121,31 @@ static int __init parse_pkram_sb_pfn(char *arg) } early_param("pkram", parse_pkram_sb_pfn); +static inline struct page *__pkram_alloc_page(gfp_t gfp_mask, bool add_to_map) +{ + struct page *page; + int err; + + page = alloc_page(gfp_mask); + if (page && add_to_map) { + err = pkram_add_identity_map(page); + if (err) { + __free_page(page); + page = NULL; + } + } + + return page; +} + static inline struct page *pkram_alloc_page(gfp_t gfp_mask) { - return alloc_page(gfp_mask); + return __pkram_alloc_page(gfp_mask, true); } static inline void pkram_free_page(void *addr) { + pkram_remove_identity_map(virt_to_page(addr)); free_page((unsigned long)addr); } @@ -159,6 +183,7 @@ static void pkram_truncate_link(struct pkram_link *link) if (!p) continue; page = pfn_to_page(PHYS_PFN(p)); + pkram_remove_identity_map(page); put_page(page); } } @@ -547,10 +572,15 @@ static int __pkram_save_page(struct pkram_stream *ps, int pkram_save_page(struct pkram_stream *ps, struct page *page, short flags) { struct pkram_node *node = ps->node; + int err; BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_SAVE); - return __pkram_save_page(ps, page, flags, page->index); + err = __pkram_save_page(ps, page, flags, page->index); + if (!err) + err = pkram_add_identity_map(page); + + return err; } /* @@ -599,6 +629,8 @@ static struct page *__pkram_load_page(struct pkram_stream *ps, unsigned long *in /* clear to avoid double free (see pkram_truncate_link()) */ link->entry[ps->entry_idx] = 0; + pkram_remove_identity_map(page); + ps->entry_idx++; if (ps->entry_idx >= PKRAM_LINK_ENTRIES_MAX || !link->entry[ps->entry_idx]) { @@ -791,7 +823,7 @@ static int __init pkram_init_sb(void) if (!pkram_sb) { struct page *page; - page = pkram_alloc_page(GFP_KERNEL | __GFP_ZERO); + page = __pkram_alloc_page(GFP_KERNEL | __GFP_ZERO, false); if (!page) { pr_err("PKRAM: Failed to allocate super block\n"); return 0; @@ -821,3 +853,198 @@ static int __init pkram_init(void) return 0; } 
module_init(pkram_init); + +static unsigned long *pkram_alloc_pte_bitmap(void) +{ + return page_address(__pkram_alloc_page(GFP_KERNEL | __GFP_ZERO, false)); +} + +static void pkram_free_pte_bitmap(void *bitmap) +{ + pkram_remove_identity_map(virt_to_page(bitmap)); + free_page((unsigned long)bitmap); +} + +#define set_p4d(p4dp, p4d) WRITE_ONCE(*(p4dp), (p4d)) + +static int pkram_add_identity_map(struct page *page) +{ + unsigned long orig_paddr, paddr; + unsigned long *bitmap; + int result = -ENOMEM; + unsigned int index; + struct page *pg; + LIST_HEAD(list); + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + + if (!pkram_pgd) { + spin_lock(&pkram_pgd_lock); + if (!pkram_pgd) { + pg = __pkram_alloc_page(GFP_KERNEL | __GFP_ZERO, false); + if (!pg) + goto err; + pkram_pgd = page_address(pg); + } + spin_unlock(&pkram_pgd_lock); + } + + orig_paddr = paddr = __pa(page_address(page)); +again: + pgd = pkram_pgd; + pgd += pgd_index(paddr); + if (pgd_none(*pgd)) { + spin_lock(&pkram_pgd_lock); + if (pgd_none(*pgd)) { + pg = __pkram_alloc_page(GFP_KERNEL|__GFP_ZERO, false); + if (!pg) + goto err; + list_add(&pg->lru, &list); + p4d = page_address(pg); + set_pgd(pgd, __pgd(__pa(p4d))); + } + spin_unlock(&pkram_pgd_lock); + } + p4d = p4d_offset(pgd, paddr); + if (p4d_none(*p4d)) { + spin_lock(&pkram_pgd_lock); + if (p4d_none(*p4d)) { + pg = __pkram_alloc_page(GFP_KERNEL|__GFP_ZERO, false); + if (!pg) + goto err; + list_add(&pg->lru, &list); + pud = page_address(pg); + set_p4d(p4d, __p4d(__pa(pud))); + } + spin_unlock(&pkram_pgd_lock); + } + pud = pud_offset(p4d, paddr); + if (pud_none(*pud)) { + spin_lock(&pkram_pgd_lock); + if (pud_none(*pud)) { + pg = __pkram_alloc_page(GFP_KERNEL|__GFP_ZERO, false); + if (!pg) + goto err; + list_add(&pg->lru, &list); + pmd = page_address(pg); + set_pud(pud, __pud(__pa(pmd))); + } + spin_unlock(&pkram_pgd_lock); + } + pmd = pmd_offset(pud, paddr); + if (pmd_none(*pmd)) { + spin_lock(&pkram_pgd_lock); + if (pmd_none(*pmd)) { + if 
(PageTransHuge(page)) { + set_pmd(pmd, pmd_mkhuge(*pmd)); + spin_unlock(&pkram_pgd_lock); + goto next; + } + bitmap = pkram_alloc_pte_bitmap(); + if (!bitmap) + goto err; + pg = virt_to_page(bitmap); + list_add(&pg->lru, &list); + set_pmd(pmd, __pmd(__pa(bitmap))); + } else { + BUG_ON(pmd_large(*pmd)); + bitmap = __va(pmd_val(*pmd)); + } + spin_unlock(&pkram_pgd_lock); + } else { + BUG_ON(pmd_large(*pmd)); + bitmap = __va(pmd_val(*pmd)); + } + + index = pte_index(paddr); + BUG_ON(test_bit(index, bitmap)); + set_bit(index, bitmap); + smp_mb__after_atomic(); + if (bitmap_full(bitmap, PTRS_PER_PTE)) + set_pmd(pmd, pmd_mkhuge(*pmd)); +next: + /* Add mappings for any pagetable pages that were allocated */ + if (!list_empty(&list)) { + page = list_first_entry(&list, struct page, lru); + list_del_init(&page->lru); + paddr = __pa(page_address(page)); + goto again; + } + + return 0; +err: + spin_unlock(&pkram_pgd_lock); + while (!list_empty(&list)) { + pg = list_first_entry(&list, struct page, lru); + list_del_init(&pg->lru); + } + return result; +} + +static void pkram_remove_identity_map(struct page *page) +{ + unsigned long *bitmap; + unsigned long paddr; + unsigned int index; + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + + /* + * pkram_pgd will be null when freeing metadata pages after a reboot + */ + if (!pkram_pgd) + return; + + paddr = __pa(page_address(page)); + pgd = pkram_pgd; + pgd += pgd_index(paddr); + if (pgd_none(*pgd)) { + WARN_ONCE(1, "PKRAM: %s: no pgd for 0x%lx\n", __func__, paddr); + return; + } + p4d = p4d_offset(pgd, paddr); + if (p4d_none(*p4d)) { + WARN_ONCE(1, "PKRAM: %s: no p4d for 0x%lx\n", __func__, paddr); + return; + } + pud = pud_offset(p4d, paddr); + if (pud_none(*pud)) { + WARN_ONCE(1, "PKRAM: %s: no pud for 0x%lx\n", __func__, paddr); + return; + } + pmd = pmd_offset(pud, paddr); + if (pmd_none(*pmd)) { + WARN_ONCE(1, "PKRAM: %s: no pmd for 0x%lx\n", __func__, paddr); + return; + } + if (PageTransHuge(page)) { + 
BUG_ON(!pmd_large(*pmd)); + pmd_clear(pmd); + return; + } + + if (pmd_large(*pmd)) { + spin_lock(&pkram_pgd_lock); + if (pmd_large(*pmd)) + set_pmd(pmd, __pmd(pte_val(pte_clrhuge(*(pte_t *)pmd)))); + spin_unlock(&pkram_pgd_lock); + } + + bitmap = __va(pmd_val(*pmd)); + index = pte_index(paddr); + clear_bit(index, bitmap); + smp_mb__after_atomic(); + + spin_lock(&pkram_pgd_lock); + if (!pmd_none(*pmd) && bitmap_empty(bitmap, PTRS_PER_PTE)) { + pmd_clear(pmd); + spin_unlock(&pkram_pgd_lock); + pkram_free_pte_bitmap(bitmap); + } else { + spin_unlock(&pkram_pgd_lock); + } +} From patchwork Thu May 7 00:41:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532181 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0620481 for ; Thu, 7 May 2020 00:45:34 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id BAAC520736 for ; Thu, 7 May 2020 00:45:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="H1det3gM" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BAAC520736 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 3AB69900020; Wed, 6 May 2020 20:45:28 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 387A690001B; Wed, 6 May 2020 20:45:28 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 24D1E900020; Wed, 6 May 2020 20:45:28 -0400 (EDT) X-Original-To: linux-mm@kvack.org 
X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0253.hostedemail.com [216.40.44.253]) by kanga.kvack.org (Postfix) with ESMTP id 0643F90001B for ; Wed, 6 May 2020 20:45:28 -0400 (EDT) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id C5B8FD234 for ; Thu, 7 May 2020 00:45:27 +0000 (UTC) X-FDA: 76788079494.14.stage47_2fa063904e802 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30012:30029:30045:30054:30064:30070,0,RBL:156.151.31.86:@oracle.com:.lbl8.mailshell.net-64.10.201.10 62.18.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:2,LUA_SUMMARY:none X-HE-Tag: stage47_2fa063904e802 X-Filterd-Recvd-Size: 10576 Received: from userp2130.oracle.com (userp2130.oracle.com [156.151.31.86]) by imf07.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:45:27 +0000 (UTC) Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470befJ064522; Thu, 7 May 2020 00:44:53 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=cg4oKTlkz7COgexHCFzVJnE42b7eeQownh5JuIB+IVo=; b=H1det3gMfdv8Nw3uK+9bx7D1A8RL3637W4FILQ5ULMPvVpbUEtQITqpXr4+XhPK9zRiK tN+YtTCWcgh/a2asuQqUOQhmNLxg90x07tg/FXahQhekcuk1eJaJfZsrxHNSJmE+RhkY 5+aPjy98gm4Vsqu++p0KOgzDfqxT7HtS3ieI8FuYttljU0J4BqilSgeWNUrVO803/Igm pLPSqbMCekuQJXTezhoBDdwyWLzQc0AWcg7Jdu4cH12b4Nw88bHSlV+wFvoZVsfqfzy2 TcwOgt0ZwUNcd9sLaRDnyfiMreokb+K23IwOHa0KXvhozdKyJMLhzgtCZU+LtD4ZoVOi GA== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by userp2130.oracle.com with ESMTP id 30s09rdfd0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:44:52 +0000 Received: from pps.filterd 
(userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bmIF131679; Thu, 7 May 2020 00:42:52 GMT Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userp3030.oracle.com with ESMTP id 30t1r958af-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:42:52 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 0470go88019800; Thu, 7 May 2020 00:42:50 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:42:50 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 10/43] PKRAM: add code for walking the preserved pages pagetable Date: Wed, 6 May 2020 17:41:36 -0700 Message-Id: <1588812129-8596-11-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 adultscore=0 suspectscore=0 spamscore=0 mlxlogscore=999 malwarescore=0 phishscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 mlxscore=0 lowpriorityscore=0 spamscore=0 adultscore=0 clxscore=1015 suspectscore=0 priorityscore=1501 malwarescore=0 mlxlogscore=999 phishscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add the ability to walk the pkram pagetable from high to low addresses and execute a callback for each contiguous range of preserved or not preserved memory found. The reason for walking high to low is to align with high to low memblock allocation when finding holes that memblocks can safely be allocated from as will be seen in a later patch. 
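The high-to-low walk described above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the code from this patch: a flat array of booleans stands in for the PTE-level bitmap, and every name in it (MODEL_PAGE_SIZE, model_range_cb, model_walk_rev, model_ranges, model_collect) is hypothetical. It only shows how walking from the highest slot down and reporting each contiguous run through a callback might work, for either preserved ranges or holes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* hypothetical stand-in for PAGE_SIZE */

/* Hypothetical callback type: called once per contiguous range found. */
typedef int (*model_range_cb)(void *ctx, unsigned long base, unsigned long size);

/*
 * Walk page slots [0, npages) from the highest index down to the lowest.
 * present[i] plays the role of one bit in the PTE-level bitmap.
 * With find_holes == false, runs of preserved pages are reported;
 * with find_holes == true, runs without preserved pages are reported.
 * Returns the first nonzero callback result, or 0 if the walk completes.
 */
static int model_walk_rev(const bool *present, size_t npages, bool find_holes,
			  model_range_cb cb, void *ctx)
{
	unsigned long end = 0;	/* exclusive end of the run being tracked */
	bool tracking = false;

	assert(present && cb);

	for (size_t i = npages; i-- > 0; ) {
		bool track = present[i] != find_holes;
		unsigned long addr = i * MODEL_PAGE_SIZE;

		if (track && !tracking) {
			/* Highest page of a new run: remember where it ends. */
			end = addr + MODEL_PAGE_SIZE;
			tracking = true;
		} else if (!track && tracking) {
			/* The run stopped just above this page: report it. */
			unsigned long base = addr + MODEL_PAGE_SIZE;
			int ret;

			tracking = false;
			ret = cb(ctx, base, end - base);
			if (ret)
				return ret;
		}
	}

	/* A run that reaches address 0 is reported after the loop. */
	if (tracking)
		return cb(ctx, 0, end);

	return 0;
}

/* Hypothetical collector used to record the reported ranges. */
struct model_ranges {
	unsigned long base[8];
	unsigned long size[8];
	int n;
};

static int model_collect(void *ctx, unsigned long base, unsigned long size)
{
	struct model_ranges *r = ctx;

	if (r->n >= 8)
		return 1;	/* stop the walk when the collector is full */
	r->base[r->n] = base;
	r->size[r->n] = size;
	r->n++;
	return 0;
}
```

In this model, ranges are reported highest first, which mirrors the stated reason for the reverse walk: a caller looking for holes sees the highest candidate region first, in the same order that top-down memblock allocation would try.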
Signed-off-by: Anthony Yznaga --- include/linux/pkram.h | 15 +++++ mm/Makefile | 2 +- mm/pkram_pagetable.c | 169 ++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 185 insertions(+), 1 deletion(-) create mode 100644 mm/pkram_pagetable.c diff --git a/include/linux/pkram.h b/include/linux/pkram.h index a58dd2ea835a..b6fa973d37cc 100644 --- a/include/linux/pkram.h +++ b/include/linux/pkram.h @@ -25,6 +25,21 @@ struct pkram_stream { #define PKRAM_NAME_MAX 256 /* including nul */ +struct pkram_pg_state { + int (*range_cb)(struct pkram_pg_state *state, unsigned long base, + unsigned long size); + unsigned long curr_addr; + unsigned long end_addr; + unsigned long min_addr; + unsigned long max_addr; + unsigned long min_size; + bool tracking; + bool find_holes; + unsigned long retval; +}; + +void pkram_walk_pgt_rev(struct pkram_pg_state *st, pgd_t *pgd); + int pkram_prepare_save(struct pkram_stream *ps, const char *name, gfp_t gfp_mask); int pkram_prepare_save_obj(struct pkram_stream *ps); diff --git a/mm/Makefile b/mm/Makefile index 59cd381194af..c4ad1c56e237 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -112,4 +112,4 @@ obj-$(CONFIG_MEMFD_CREATE) += memfd.o obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o obj-$(CONFIG_PTDUMP_CORE) += ptdump.o obj-$(CONFIG_PAGE_REPORTING) += page_reporting.o -obj-$(CONFIG_PKRAM) += pkram.o +obj-$(CONFIG_PKRAM) += pkram.o pkram_pagetable.o diff --git a/mm/pkram_pagetable.c b/mm/pkram_pagetable.c new file mode 100644 index 000000000000..d31aa36207ba --- /dev/null +++ b/mm/pkram_pagetable.c @@ -0,0 +1,169 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include + +#define pgd_none(a) (pgtable_l5_enabled() ? 
pgd_none(a) : p4d_none(__p4d(pgd_val(a)))) + +static int note_page_rev(struct pkram_pg_state *st, unsigned long curr_size, bool present) +{ + unsigned long curr_addr = st->curr_addr; + bool track_page = present ^ st->find_holes; + + if (!st->tracking && track_page) { + unsigned long end_addr = curr_addr + curr_size; + + if (end_addr <= st->min_addr) + return 1; + + st->end_addr = min(end_addr, st->max_addr); + st->tracking = true; + } else if (st->tracking) { + unsigned long base, size; + + /* Continue tracking if lower bound has not been reached */ + if (track_page && curr_addr && curr_addr >= st->min_addr) + return 0; + + if (!track_page) + base = max(curr_addr + curr_size, st->min_addr); + else + base = st->min_addr; + + size = st->end_addr - base; + st->tracking = false; + + return st->range_cb(st, base, size); + } + + return 0; +} + +static int walk_pte_level_rev(struct pkram_pg_state *st, pmd_t addr, unsigned long P) +{ + unsigned long *bitmap; + int present; + int i, ret = 0; + + bitmap = __va(pmd_val(addr)); + for (i = PTRS_PER_PTE - 1; i >= 0; i--) { + unsigned long curr_addr = P + i * PAGE_SIZE; + + if (curr_addr >= st->max_addr) + continue; + st->curr_addr = curr_addr; + + present = test_bit(i, bitmap); + ret = note_page_rev(st, PAGE_SIZE, present); + if (ret) + break; + } + + return ret; +} + +static int walk_pmd_level_rev(struct pkram_pg_state *st, pud_t addr, unsigned long P) +{ + pmd_t *start; + int i, ret = 0; + + start = (pmd_t *)pud_page_vaddr(addr) + PTRS_PER_PMD - 1; + for (i = PTRS_PER_PMD - 1; i >= 0; i--, start--) { + unsigned long curr_addr = P + i * PMD_SIZE; + + if (curr_addr >= st->max_addr) + continue; + st->curr_addr = curr_addr; + + if (!pmd_none(*start)) { + if (pmd_large(*start)) + ret = note_page_rev(st, PMD_SIZE, true); + else + ret = walk_pte_level_rev(st, *start, curr_addr); + } else + ret = note_page_rev(st, PMD_SIZE, false); + if (ret) + break; + } + + return ret; +} + +static int walk_pud_level_rev(struct pkram_pg_state *st, p4d_t
addr, unsigned long P) +{ + pud_t *start; + int i, ret = 0; + + start = (pud_t *)p4d_page_vaddr(addr) + PTRS_PER_PUD - 1; + for (i = PTRS_PER_PUD - 1; i >= 0; i--, start--) { + unsigned long curr_addr = P + i * PUD_SIZE; + + if (curr_addr >= st->max_addr) + continue; + st->curr_addr = curr_addr; + + if (!pud_none(*start)) { + if (pud_large(*start)) + ret = note_page_rev(st, PUD_SIZE, true); + else + ret = walk_pmd_level_rev(st, *start, curr_addr); + } else + ret = note_page_rev(st, PUD_SIZE, false); + if (ret) + break; + } + + return ret; +} + +static int walk_p4d_level_rev(struct pkram_pg_state *st, pgd_t addr, unsigned long P) +{ + p4d_t *start; + int i, ret = 0; + + if (PTRS_PER_P4D == 1) + return walk_pud_level_rev(st, __p4d(pgd_val(addr)), P); + + start = (p4d_t *)pgd_page_vaddr(addr) + PTRS_PER_P4D - 1; + for (i = PTRS_PER_P4D - 1; i >= 0; i--, start--) { + unsigned long curr_addr = P + i * P4D_SIZE; + + if (curr_addr >= st->max_addr) + continue; + st->curr_addr = curr_addr; + + if (!p4d_none(*start)) { + if (p4d_large(*start)) + ret = note_page_rev(st, P4D_SIZE, true); + else + ret = walk_pud_level_rev(st, *start, curr_addr); + } else + ret = note_page_rev(st, P4D_SIZE, false); + if (ret) + break; + } + + return ret; +} + +void pkram_walk_pgt_rev(struct pkram_pg_state *st, pgd_t *pgd) +{ + pgd_t *start; + int i, ret; + + start = pgd + PTRS_PER_PGD - 1; + for (i = PTRS_PER_PGD - 1; i >= 0; i--, start--) { + unsigned long curr_addr = i * PGDIR_SIZE; + + if (curr_addr >= st->max_addr) + continue; + st->curr_addr = curr_addr; + + if (!pgd_none(*start)) + ret = walk_p4d_level_rev(st, *start, curr_addr); + else + ret = note_page_rev(st, PGDIR_SIZE, false); + if (ret) + break; + } +} From patchwork Thu May 7 00:41:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532087 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4B85C92A for ; Thu, 7 May 2020 00:43:40 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 18AEE2064A for ; Thu, 7 May 2020 00:43:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="UrmNLP0/" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 18AEE2064A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5847F90000A; Wed, 6 May 2020 20:43:31 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 5398790000B; Wed, 6 May 2020 20:43:31 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3605290000A; Wed, 6 May 2020 20:43:31 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0238.hostedemail.com [216.40.44.238]) by kanga.kvack.org (Postfix) with ESMTP id 1829C900008 for ; Wed, 6 May 2020 20:43:31 -0400 (EDT) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id B4FE18249980 for ; Thu, 7 May 2020 00:43:30 +0000 (UTC) X-FDA: 76788074580.09.toys38_1e9ace3105f3a X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30036:30054:30064:30070,0,RBL:156.151.31.86:@oracle.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: toys38_1e9ace3105f3a X-Filterd-Recvd-Size: 5747 Received: from 
userp2130.oracle.com (userp2130.oracle.com [156.151.31.86]) by imf43.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:43:30 +0000 (UTC) Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bsEQ064662; Thu, 7 May 2020 00:42:56 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=SpL3oKOP1UxupMjs93me4aiFXOO1A2VSszu1s3lkQmE=; b=UrmNLP0/tNsOo2Ji29J5AZ9A7RkHNfzxRzF3rav8+hRASZW4Duvh8uT+pZDF6H3zWKU2 /202rP9YThZNIHoKaOGQBIyfgkQEuU6tmMG1rYH38Of8YlNR2FX2ITHoSeQXbBsV2QRQ kXjPmaZxZXfdvPG7uqvUo0xNFd3P4DJR3CNhKsyWnrWT2q7/cheB+C/KWDV7ShneEy5u ZUgmUsqRPA1/e9u9Ab1oiHezM4HS/AjdypfYZ9nzyc0+usWobKBzF6fPWv9a247nmqB9 1uq24OdbiUFFzA49N05/7YtGMk/afxEI0JU5tidZ970dhtUfzTYIDxMWQw2hWIWi0XmK AA== Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70]) by userp2130.oracle.com with ESMTP id 30s09rdf6s-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:42:56 +0000 Received: from pps.filterd (aserp3020.oracle.com [127.0.0.1]) by aserp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bUYE098714; Thu, 7 May 2020 00:42:55 GMT Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by aserp3020.oracle.com with ESMTP id 30sjnma1gw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:42:55 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 0470gsjj019822; Thu, 7 May 2020 00:42:54 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:42:53 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, 
x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 11/43] PKRAM: pass the preserved pages pagetable to the next kernel Date: Wed, 6 May 2020 17:41:37 -0700 Message-Id: <1588812129-8596-12-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 adultscore=0 phishscore=0 mlxlogscore=999 bulkscore=0 malwarescore=0 spamscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 mlxscore=0 lowpriorityscore=0 spamscore=0 adultscore=0 clxscore=1015 suspectscore=0 priorityscore=1501 malwarescore=0 mlxlogscore=999 phishscore=0 impostorscore=0 classifier=spam adjust=0 
reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Store the PFN of the pagetable's top-level page (pgd_pfn) in the pkram_super_block page so that the next kernel can locate the preserved pages pagetable. Signed-off-by: Anthony Yznaga --- mm/pkram.c | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/mm/pkram.c b/mm/pkram.c index 5a7b8f61a55d..54b2779d0813 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -94,6 +94,7 @@ struct pkram_node { */ struct pkram_super_block { __u64 node_pfn; /* first element of the node list */ + __u64 pgd_pfn; /* top-level page of the preserved pages pagetable */ }; static unsigned long pkram_sb_pfn __initdata; @@ -769,15 +770,20 @@ static void __pkram_reboot(void) struct page *page; struct pkram_node *node; unsigned long node_pfn = 0; - - list_for_each_entry_reverse(page, &pkram_nodes, lru) { - node = page_address(page); - if (WARN_ON(node->flags & PKRAM_ACCMODE_MASK)) - continue; - node->node_pfn = node_pfn; - node_pfn = page_to_pfn(page); + unsigned long pgd_pfn = 0; + + if (pkram_pgd) { + list_for_each_entry_reverse(page, &pkram_nodes, lru) { + node = page_address(page); + if (WARN_ON(node->flags & PKRAM_ACCMODE_MASK)) + continue; + node->node_pfn = node_pfn; + node_pfn = page_to_pfn(page); + } + pgd_pfn = page_to_pfn(virt_to_page(pkram_pgd)); } pkram_sb->node_pfn = node_pfn; + pkram_sb->pgd_pfn = pgd_pfn; } static int pkram_reboot(struct notifier_block *notifier, From patchwork Thu May 7 00:41:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532187 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3E32781 for ; Thu, 7 May 2020 00:45:39 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with
ESMTP id E62822082E for ; Thu, 7 May 2020 00:45:38 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="hXArlhMR" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E62822082E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4BF87900021; Wed, 6 May 2020 20:45:33 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 42087900002; Wed, 6 May 2020 20:45:33 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2C0B4900021; Wed, 6 May 2020 20:45:33 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0003.hostedemail.com [216.40.44.3]) by kanga.kvack.org (Postfix) with ESMTP id 15567900002 for ; Wed, 6 May 2020 20:45:33 -0400 (EDT) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id D0A771191E for ; Thu, 7 May 2020 00:45:32 +0000 (UTC) X-FDA: 76788079704.03.kick43_3059d2581505b X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30003:30054:30056:30064:30070:30090,0,RBL:156.151.31.85:@oracle.com:.lbl8.mailshell.net-64.10.201.10 62.18.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: kick43_3059d2581505b X-Filterd-Recvd-Size: 14102 Received: from userp2120.oracle.com (userp2120.oracle.com [156.151.31.85]) by imf47.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:45:32 +0000 (UTC) Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by 
userp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bnlu093003; Thu, 7 May 2020 00:45:00 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=bv6mPYRvBlGJaPvIxvryncCtChgbVQh2Nj0efsq/tRk=; b=hXArlhMRvINvgicHIomniImjUPo2Eu6gImaZKxuEYHSWrFoMG1wfx97Nd8iTg7orSrdu osl8hSK64dZ9q6DGBX1pgOBi6X5m8HBhXKKl5TWIRcYrNW6ZrCL+1bGyXW8Jd6eHurYN LXTnNRbvPCLrJ8LV/hzXqYx+8/Rro0x2uJ6A2ukUqnsC2/miYThQsfeOMoASgV9UgwDM e7Bqa/9S6oKRVepJHscLpdgqGp3j+iVKPL560csj+4iZLhmzt+rfnUAeqAORWbXCSG+E UZkBKkjjFNT62+rIc2aoeAVwNdtR9BL0lP72l/ScXGzlTPedIp52pMywAVaicl3aO0Nv aA== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by userp2120.oracle.com with ESMTP id 30s1gnd8s1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:45:00 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470akjj170679; Thu, 7 May 2020 00:43:00 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by userp3020.oracle.com with ESMTP id 30us7p2m1e-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:43:00 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 0470gvDB025596; Thu, 7 May 2020 00:42:57 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:42:57 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, 
ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 12/43] mm: PKRAM: reserve preserved memory at boot Date: Wed, 6 May 2020 17:41:38 -0700 Message-Id: <1588812129-8596-13-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 adultscore=0 suspectscore=2 mlxlogscore=999 malwarescore=0 phishscore=0 mlxscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 suspectscore=2 mlxscore=0 spamscore=0 clxscore=1015 priorityscore=1501 bulkscore=0 phishscore=0 impostorscore=0 malwarescore=0 lowpriorityscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Keep 
preserved pages from being recycled during boot by adding them to the
memblock reserved list during early boot. If memory reservation fails
(e.g. a region has already been reserved), all preserved pages are
dropped.

For efficiency, the preserved pages pagetable is used to identify and
reserve the contiguous ranges present rather than one page at a time.

Signed-off-by: Anthony Yznaga
---
 arch/x86/kernel/setup.c |   3 +
 arch/x86/mm/init_64.c   |   2 +
 include/linux/pkram.h   |   8 +++
 mm/pkram.c              | 179 +++++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 189 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 4b3fa6cd3106..851515753ad9 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1158,6 +1159,8 @@ void __init setup_arch(char **cmdline_p)
 	initmem_init();
 	dma_contiguous_reserve(max_pfn_mapped << PAGE_SHIFT);
 
+	pkram_reserve();
+
 	if (boot_cpu_has(X86_FEATURE_GBPAGES))
 		hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3b289c2f75cd..ae569ef6bd7d 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -1244,6 +1245,7 @@ void __init mem_init(void)
 	after_bootmem = 1;
 	x86_init.hyper.init_after_bootmem();
 
+	totalram_pages_add(pkram_reserved_pages);
 	/*
 	 * Must be done after boot memory is put on freelist, because here we
 	 * might set fields in deferred struct pages that have not yet been
diff --git a/include/linux/pkram.h b/include/linux/pkram.h
index b6fa973d37cc..1b475f6e1598 100644
--- a/include/linux/pkram.h
+++ b/include/linux/pkram.h
@@ -61,4 +61,12 @@ struct page *pkram_load_page(struct pkram_stream *ps, unsigned long *index,
 ssize_t pkram_write(struct pkram_stream *ps, const void *buf, size_t count);
 size_t pkram_read(struct pkram_stream *ps, void *buf, size_t count);
 
+#ifdef CONFIG_PKRAM
+extern unsigned long pkram_reserved_pages;
+void pkram_reserve(void);
+#else
+#define pkram_reserved_pages 0UL
+static inline void pkram_reserve(void) { }
+#endif
+
 #endif /* _LINUX_PKRAM_H */
diff --git a/mm/pkram.c b/mm/pkram.c
index 54b2779d0813..2c323154df76 100644
--- a/mm/pkram.c
+++ b/mm/pkram.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -105,6 +106,7 @@ static DEFINE_SPINLOCK(pkram_pgd_lock);
 
 static int pkram_add_identity_map(struct page *page);
 static void pkram_remove_identity_map(struct page *page);
+static int pkram_reserve_page_ranges(pgd_t *pgd);
 
 /*
  * For convenience sake PKRAM nodes are kept in an auxiliary doubly-linked list
@@ -113,6 +115,9 @@ static void pkram_remove_identity_map(struct page *page);
 static LIST_HEAD(pkram_nodes);		/* linked through page::lru */
 static DEFINE_MUTEX(pkram_mutex);	/* serializes open/close */
 
+unsigned long __initdata pkram_reserved_pages;
+static bool pkram_reservation_in_progress;
+
 /*
  * The PKRAM super block pfn, see above.
  */
@@ -122,6 +127,102 @@ static int __init parse_pkram_sb_pfn(char *arg)
 }
 early_param("pkram", parse_pkram_sb_pfn);
 
+static void * __init pkram_map_meta(unsigned long pfn)
+{
+	if (pfn >= max_low_pfn)
+		return ERR_PTR(-EINVAL);
+	return pfn_to_kaddr(pfn);
+}
+
+static int __init pkram_reserve_page(unsigned long pfn)
+{
+	phys_addr_t base, size;
+	int err = 0;
+
+	if (pfn >= max_pfn)
+		return -EINVAL;
+
+	base = PFN_PHYS(pfn);
+	size = PAGE_SIZE;
+
+	if (memblock_is_region_reserved(base, size) ||
+	    memblock_reserve(base, size) < 0)
+		err = -EBUSY;
+
+	if (!err)
+		pkram_reserved_pages++;
+
+	return err;
+}
+
+static void __init pkram_unreserve_page(unsigned long pfn)
+{
+	memblock_free(PFN_PHYS(pfn), PAGE_SIZE);
+	pkram_reserved_pages--;
+}
+
+/*
+ * Reserved pages that belong to preserved memory.
+ *
+ * This function should be called at boot time as early as possible to prevent
+ * preserved memory from being recycled.
+ */
+void __init pkram_reserve(void)
+{
+	int err = 0;
+
+	if (!pkram_sb_pfn)
+		return;
+
+	pr_info("PKRAM: Examining preserved memory...\n");
+	pkram_reservation_in_progress = true;
+
+	err = pkram_reserve_page(pkram_sb_pfn);
+	if (err)
+		goto out;
+	pkram_sb = pkram_map_meta(pkram_sb_pfn);
+	if (IS_ERR(pkram_sb)) {
+		pkram_unreserve_page(pkram_sb_pfn);
+		err = PTR_ERR(pkram_sb);
+		goto out;
+	}
+
+	/* An empty pkram_sb is not an error */
+	if (!pkram_sb->node_pfn) {
+		pkram_unreserve_page(pkram_sb_pfn);
+		pkram_sb = NULL;
+		goto done;
+	}
+
+	err = pkram_reserve_page(pkram_sb->pgd_pfn);
+	if (err) {
+		pr_warn("PKRAM: pgd_pfn=0x%llx already reserved\n",
+			pkram_sb->pgd_pfn);
+		pkram_unreserve_page(pkram_sb_pfn);
+		goto out;
+	}
+	pkram_pgd = pfn_to_kaddr(pkram_sb->pgd_pfn);
+	err = pkram_reserve_page_ranges(pkram_pgd);
+	if (err) {
+		pkram_unreserve_page(pkram_sb->pgd_pfn);
+		pkram_unreserve_page(pkram_sb_pfn);
+		pkram_pgd = NULL;
+	}
+
+out:
+	pkram_reservation_in_progress = false;
+
+	if (err) {
+		pr_err("PKRAM: Reservation failed: %d\n", err);
+		WARN_ON(pkram_reserved_pages > 0);
+		pkram_sb = NULL;
+		return;
+	}
+
+done:
+	pr_info("PKRAM: %lu pages reserved\n", pkram_reserved_pages);
+}
+
 static inline struct page *__pkram_alloc_page(gfp_t gfp_mask, bool add_to_map)
 {
 	struct page *page;
@@ -146,6 +247,11 @@ static inline struct page *pkram_alloc_page(gfp_t gfp_mask)
 
 static inline void pkram_free_page(void *addr)
 {
+	/*
+	 * The page may have the reserved bit set since preserved pages
+	 * are reserved early in boot.
+	 */
+	ClearPageReserved(virt_to_page(addr));
 	pkram_remove_identity_map(virt_to_page(addr));
 	free_page((unsigned long)addr);
 }
@@ -184,6 +290,11 @@ static void pkram_truncate_link(struct pkram_link *link)
 		if (!p)
 			continue;
 		page = pfn_to_page(PHYS_PFN(p));
+		/*
+		 * The page may have the reserved bit set since preserved pages
+		 * are reserved early in boot.
+		 */
+		ClearPageReserved(page);
 		pkram_remove_identity_map(page);
 		put_page(page);
 	}
@@ -593,7 +704,7 @@ static struct page *__pkram_load_page(struct pkram_stream *ps, unsigned long *in
 	struct pkram_link *link = ps->link;
 	struct page *page;
 	pkram_entry_t p;
-	int order;
+	int i, order;
 	short flgs;
 
 	if (!link) {
@@ -615,6 +726,12 @@ static struct page *__pkram_load_page(struct pkram_stream *ps, unsigned long *in
 	order = p & PKRAM_ENTRY_ORDER_MASK;
 	page = pfn_to_page(PHYS_PFN(p));
 
+	for (i = 0; i < (1 << order); i++) {
+		struct page *pg = page + i;
+
+		ClearPageReserved(pg);
+	}
+
 	if (flgs & PKRAM_PAGE_TRANS_HUGE) {
 		prep_compound_page(page, order);
 		prep_transhuge_page(page);
@@ -735,6 +852,7 @@ size_t pkram_read(struct pkram_stream *ps, void *buf, size_t count)
 			page = pfn_to_page(obj->data_pfn);
 			if (!page)
 				return 0;
+			ClearPageReserved(page);
 
 			ps->data_page = page;
 			ps->data_offset = 0;
@@ -782,8 +900,15 @@ static void __pkram_reboot(void)
 		}
 		pgd_pfn = page_to_pfn(virt_to_page(pkram_pgd));
 	}
-	pkram_sb->node_pfn = node_pfn;
-	pkram_sb->pgd_pfn = pgd_pfn;
+	/*
+	 * Zero out pkram_sb completely since it may have been passed from
+	 * the previous boot.
+	 */
+	memset(pkram_sb, 0, PAGE_SIZE);
+	if (node_pfn) {
+		pkram_sb->node_pfn = node_pfn;
+		pkram_sb->pgd_pfn = pgd_pfn;
+	}
 }
 
 static int pkram_reboot(struct notifier_block *notifier,
@@ -867,6 +992,7 @@ static unsigned long *pkram_alloc_pte_bitmap(void)
 
 static void pkram_free_pte_bitmap(void *bitmap)
 {
+	ClearPageReserved(virt_to_page(bitmap));
 	pkram_remove_identity_map(virt_to_page(bitmap));
 	free_page((unsigned long)bitmap);
 }
@@ -1054,3 +1180,50 @@ static void pkram_remove_identity_map(struct page *page)
 		spin_unlock(&pkram_pgd_lock);
 	}
 }
+
+static int __init pkram_reserve_range_cb(struct pkram_pg_state *st, unsigned long base, unsigned long size)
+{
+	if (memblock_is_region_reserved(base, size) ||
+	    memblock_reserve(base, size) < 0) {
+		pr_warn("PKRAM: reservations exist in [0x%lx,0x%lx]\n", base, base + size - 1);
+		/*
+		 * Set a lower bound so another walk can undo the earlier,
+		 * successful reservations.
+		 */
+		st->min_addr = base + size;
+		st->retval = -EBUSY;
+		return 1;
+	}
+
+	pkram_reserved_pages += (size >> PAGE_SHIFT);
+	return 0;
+}
+
+static int __init pkram_unreserve_range_cb(struct pkram_pg_state *st, unsigned long base, unsigned long size)
+{
+	memblock_free(base, size);
+	pkram_reserved_pages -= (size >> PAGE_SHIFT);
+	return 0;
+}
+
+/*
+ * Walk the preserved pages pagetable and reserve each present address range.
+ */
+static int __init pkram_reserve_page_ranges(pgd_t *pgd)
+{
+	struct pkram_pg_state st = {
+		.range_cb = pkram_reserve_range_cb,
+		.max_addr = PHYS_ADDR_MAX,
+	};
+	int err = 0;
+
+	pkram_walk_pgt_rev(&st, pgd);
+	if ((int)st.retval < 0) {
+		err = st.retval;
+		st.retval = 0;
+		st.range_cb = pkram_unreserve_range_cb;
+		pkram_walk_pgt_rev(&st, pgd);
+	}
+
+	return err;
+}

From patchwork Thu May 7 00:41:39 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 11532091
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com,
 ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org,
 ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com,
 nivedita@alum.mit.edu, rafael.j.wysocki@intel.com,
 dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de,
 bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com,
 andriy.shevchenko@linux.intel.com, keescook@chromium.org,
 hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org,
 ying.huang@intel.com, yang.shi@linux.alibaba.com,
 gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com,
 jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com,
 lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 kexec@lists.infradead.org
Subject: [RFC 13/43] mm: PKRAM: free preserved pages pagetable
Date: Wed, 6 May 2020 17:41:39 -0700
Message-Id: <1588812129-8596-14-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

After the page ranges in the pagetable have been reserved, the pagetable
is no longer needed. Rather than freeing it during early boot by
unreserving page-sized blocks, which can be inefficient when dealing
with a large number of blocks, wait until the page structs have been
initialized and then free the pagetable's pages as regular pages.
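
As a rough illustration of the teardown order described above (not the
kernel code itself), the following userspace sketch frees a multi-level
table depth first: every lower-level table is released before the table
that points to it, and the top-level table goes last. The names
`struct table`, `ENTRIES_PER_TABLE`, `table_alloc()` and `table_free()`
are hypothetical, `calloc()`/`free()` stand in for the page allocator,
and the bitmap pages referenced by the real PMD entries are not modeled.

```c
#include <assert.h>
#include <stdlib.h>

#define ENTRIES_PER_TABLE 4	/* tiny fan-out for illustration only */

/* Hypothetical table page: each slot is NULL or points to a lower table. */
struct table {
	struct table *slot[ENTRIES_PER_TABLE];
};

/* Allocate a zeroed table page; abort on failure to keep the sketch short. */
static struct table *table_alloc(void)
{
	struct table *t = calloc(1, sizeof(*t));

	if (!t)
		abort();
	return t;
}

/*
 * Free every table below `t`, then `t` itself.  `levels` counts the table
 * levels at and below `t`; with levels == 1 the slots of `t` are not
 * followed.  Returns the number of table pages freed.
 */
static unsigned long table_free(struct table *t, int levels)
{
	unsigned long freed = 0;
	int i;

	assert(levels >= 1);

	if (levels > 1) {
		for (i = 0; i < ENTRIES_PER_TABLE; i++) {
			/* Children go first so no pointer is read after free. */
			if (t->slot[i])
				freed += table_free(t->slot[i], levels - 1);
		}
	}
	free(t);
	return freed + 1;
}
```

For a three-level tree made of a root, two second-level tables and one
third-level table, `table_free(root, 3)` releases four table pages.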

Signed-off-by: Anthony Yznaga
---
 arch/x86/mm/init_64.c |  1 +
 include/linux/pkram.h |  3 ++
 mm/pkram.c            | 11 +++++++
 mm/pkram_pagetable.c  | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 97 insertions(+)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index ae569ef6bd7d..72662615977b 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1245,6 +1245,7 @@ void __init mem_init(void)
 	after_bootmem = 1;
 	x86_init.hyper.init_after_bootmem();
 
+	pkram_free_pgt();
 	totalram_pages_add(pkram_reserved_pages);
 	/*
 	 * Must be done after boot memory is put on freelist, because here we
diff --git a/include/linux/pkram.h b/include/linux/pkram.h
index 1b475f6e1598..edc5d8bef9d3 100644
--- a/include/linux/pkram.h
+++ b/include/linux/pkram.h
@@ -39,6 +39,7 @@ struct pkram_pg_state {
 };
 
 void pkram_walk_pgt_rev(struct pkram_pg_state *st, pgd_t *pgd);
+void pkram_free_pgt_walk_pgd(pgd_t *pgd);
 
 int pkram_prepare_save(struct pkram_stream *ps, const char *name,
 		       gfp_t gfp_mask);
@@ -64,9 +65,11 @@ size_t pkram_read(struct pkram_stream *ps, void *buf, size_t count);
 #ifdef CONFIG_PKRAM
 extern unsigned long pkram_reserved_pages;
 void pkram_reserve(void);
+void pkram_free_pgt(void);
 #else
 #define pkram_reserved_pages 0UL
 static inline void pkram_reserve(void) { }
+static inline void pkram_free_pgt(void) { }
 #endif
 
 #endif /* _LINUX_PKRAM_H */
diff --git a/mm/pkram.c b/mm/pkram.c
index 2c323154df76..dd3c89614010 100644
--- a/mm/pkram.c
+++ b/mm/pkram.c
@@ -1227,3 +1227,14 @@ static int __init pkram_reserve_page_ranges(pgd_t *pgd)
 
 	return err;
 }
+
+void pkram_free_pgt(void)
+{
+	if (!pkram_pgd)
+		return;
+
+	pkram_free_pgt_walk_pgd(pkram_pgd);
+
+	__free_pages_core(virt_to_page(pkram_pgd), 0);
+	pkram_pgd = NULL;
+}
diff --git a/mm/pkram_pagetable.c b/mm/pkram_pagetable.c
index d31aa36207ba..7033e9b1c47f 100644
--- a/mm/pkram_pagetable.c
+++ b/mm/pkram_pagetable.c
@@ -3,6 +3,8 @@
 #include
 #include
 
+#include "internal.h"
+
 #define pgd_none(a)  (pgtable_l5_enabled() ? pgd_none(a) : p4d_none(__p4d(pgd_val(a))))
 
 static int note_page_rev(struct pkram_pg_state *st, unsigned long curr_size, bool present)
@@ -167,3 +169,83 @@ void pkram_walk_pgt_rev(struct pkram_pg_state *st, pgd_t *pgd)
 			break;
 	}
 }
+
+static void pkram_free_pgt_walk_pmd(pud_t addr)
+{
+	unsigned long bitmap_pa;
+	struct page *page;
+	pmd_t *start;
+	int i;
+
+	start = (pmd_t *)pud_page_vaddr(addr);
+	for (i = 0; i < PTRS_PER_PMD; i++, start++) {
+		if (!pmd_none(*start)) {
+			bitmap_pa = pte_val(pte_clrhuge(*(pte_t *)start));
+			if (pmd_large(*start) && !bitmap_pa)
+				continue;
+			page = virt_to_page(__va(bitmap_pa));
+			__free_pages_core(page, 0);
+		}
+	}
+}
+
+static void pkram_free_pgt_walk_pud(p4d_t addr)
+{
+	struct page *page;
+	pud_t *start;
+	int i;
+
+	start = (pud_t *)p4d_page_vaddr(addr);
+	for (i = 0; i < PTRS_PER_PUD; i++, start++) {
+		if (!pud_none(*start)) {
+			if (pud_large(*start)) {
+				WARN_ONCE(1, "PKRAM: unexpected pud hugepage\n");
+				continue;
+			}
+			pkram_free_pgt_walk_pmd(*start);
+			page = virt_to_page(__va(pud_val(*start)));
+			__free_pages_core(page, 0);
+		}
+	}
+}
+
+static void pkram_free_pgt_walk_p4d(pgd_t addr)
+{
+	struct page *page;
+	p4d_t *start;
+	int i;
+
+	if (PTRS_PER_P4D == 1)
+		return pkram_free_pgt_walk_pud(__p4d(pgd_val(addr)));
+
+	start = (p4d_t *)pgd_page_vaddr(addr);
+	for (i = 0; i < PTRS_PER_P4D; i++, start++) {
+		if (!p4d_none(*start)) {
+			if (p4d_large(*start)) {
+				WARN_ONCE(1, "PKRAM: unexpected p4d hugepage\n");
+				continue;
+			}
+			pkram_free_pgt_walk_pud(*start);
+			page = virt_to_page(__va(p4d_val(*start)));
+			__free_pages_core(page, 0);
+		}
+	}
+}
+
+/*
+ * Free the pagetable passed from the previous boot.
+ */
+void pkram_free_pgt_walk_pgd(pgd_t *pgd)
+{
+	pgd_t *start = pgd;
+	struct page *page;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PGD; i++, start++) {
+		if (!pgd_none(*start)) {
+			pkram_free_pgt_walk_p4d(*start);
+			page = virt_to_page(__va(pgd_val(*start)));
+			__free_pages_core(page, 0);
+		}
+	}
+}

From patchwork Thu May 7 00:41:40 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 11532093
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com,
 ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org,
 ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com,
 nivedita@alum.mit.edu, rafael.j.wysocki@intel.com,
 dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de,
 bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com,
 andriy.shevchenko@linux.intel.com, keescook@chromium.org,
 hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org,
 ying.huang@intel.com, yang.shi@linux.alibaba.com,
 gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com,
 jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com,
 lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 kexec@lists.infradead.org
Subject: [RFC 14/43] mm: memblock: PKRAM: prevent memblock resize from clobbering preserved pages
Date: Wed, 6 May 2020 17:41:40 -0700
Message-Id: <1588812129-8596-15-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

The size of the memblock reserved array may be increased while preserved
pages are being reserved. When this happens, preserved pages that have
not yet been reserved are at risk of being clobbered when space for a
larger array is allocated.

When called from memblock_double_array(), a wrapper around
memblock_find_in_range() walks the preserved pages pagetable to find
sufficiently sized ranges without preserved pages and passes them to
memblock_find_in_range().
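
The hole search described above can be sketched in userspace as follows.
This is an illustration under stated assumptions, not the code in this
patch: `struct preserved_range` and `find_hole_top_down()` are
hypothetical names, a sorted array stands in for the preserved pages
pagetable, and existing memblock reservations (which the real code
leaves to memblock_find_in_range()) are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one preserved range [base, base + size). */
struct preserved_range {
	uint64_t base;
	uint64_t size;
};

/*
 * Search [start, end) from the top down for a gap of at least `size` bytes
 * that overlaps no preserved range.  `ranges` must be sorted by ascending
 * base and must not overlap.  Returns the base of the allocation placed at
 * the top of the first fitting gap, or 0 if no gap fits.
 */
static uint64_t find_hole_top_down(const struct preserved_range *ranges,
				   size_t nr, uint64_t start, uint64_t end,
				   uint64_t size)
{
	uint64_t hole_end = end;	/* exclusive end of the candidate gap */
	size_t i = nr;

	assert(size > 0);

	while (i-- > 0) {
		uint64_t r_start = ranges[i].base;
		uint64_t r_end = r_start + ranges[i].size;

		if (r_start >= hole_end)
			continue;	/* range lies above the candidate gap */
		if (r_end <= start)
			break;		/* remaining ranges lie below the window */
		if (r_end < hole_end && hole_end - r_end >= size)
			return hole_end - size;
		hole_end = r_start;	/* next gap ends where this range begins */
		if (hole_end <= start)
			return 0;
	}

	/* Gap between the window start and the lowest range seen, if any. */
	if (hole_end > start && hole_end - start >= size)
		return hole_end - size;
	return 0;
}
```

With preserved ranges at 0x1000-0x2000, 0x4000-0x6000 and 0x8000-0x9000
and a search window of [0, 0xa000), a 0x1000-byte request lands at
0x9000 and a 0x2000-byte request at 0x6000. As with
memblock_find_in_range(), a return value of 0 means nothing was found.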

Signed-off-by: Anthony Yznaga
---
 include/linux/pkram.h |  3 +++
 mm/memblock.c         | 15 +++++++++++++--
 mm/pkram.c            | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/include/linux/pkram.h b/include/linux/pkram.h
index edc5d8bef9d3..409022e1472f 100644
--- a/include/linux/pkram.h
+++ b/include/linux/pkram.h
@@ -62,6 +62,9 @@ struct page *pkram_load_page(struct pkram_stream *ps, unsigned long *index,
 ssize_t pkram_write(struct pkram_stream *ps, const void *buf, size_t count);
 size_t pkram_read(struct pkram_stream *ps, void *buf, size_t count);
 
+phys_addr_t pkram_memblock_find_in_range(phys_addr_t start, phys_addr_t end,
+					 phys_addr_t size, phys_addr_t align);
+
 #ifdef CONFIG_PKRAM
 extern unsigned long pkram_reserved_pages;
 void pkram_reserve(void);
diff --git a/mm/memblock.c b/mm/memblock.c
index c79ba6f9920c..69ae883b8d21 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -349,6 +350,16 @@ phys_addr_t __init_memblock memblock_find_in_range(phys_addr_t start,
 	return ret;
 }
 
+phys_addr_t __init_memblock __memblock_find_in_range(phys_addr_t start,
+					phys_addr_t end, phys_addr_t size,
+					phys_addr_t align)
+{
+	if (IS_ENABLED(CONFIG_PKRAM))
+		return pkram_memblock_find_in_range(start, end, size, align);
+	else
+		return memblock_find_in_range(start, end, size, align);
+}
+
 static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
 {
 	type->total_size -= type->regions[r].size;
@@ -447,11 +458,11 @@ static int __init_memblock memblock_double_array(struct memblock_type *type,
 		if (type != &memblock.reserved)
 			new_area_start = new_area_size = 0;
 
-		addr = memblock_find_in_range(new_area_start + new_area_size,
+		addr = __memblock_find_in_range(new_area_start + new_area_size,
 						memblock.current_limit,
 						new_alloc_size, PAGE_SIZE);
 		if (!addr && new_area_size)
-			addr = memblock_find_in_range(0,
+			addr = __memblock_find_in_range(0,
 				min(new_area_start, memblock.current_limit),
 				new_alloc_size, PAGE_SIZE);
 
diff --git a/mm/pkram.c b/mm/pkram.c
index dd3c89614010..e49c9bcd3854 100644
--- a/mm/pkram.c
+++ b/mm/pkram.c
@@ -1238,3 +1238,54 @@ void pkram_free_pgt(void)
 	__free_pages_core(virt_to_page(pkram_pgd), 0);
 	pkram_pgd = NULL;
 }
+
+static int __init_memblock pkram_memblock_find_cb(struct pkram_pg_state *st, unsigned long base, unsigned long size)
+{
+	unsigned long end = base + size;
+	unsigned long addr;
+
+	if (size < st->min_size)
+		return 0;
+
+	addr = memblock_find_in_range(base, end, st->min_size, PAGE_SIZE);
+	if (!addr)
+		return 0;
+
+	st->retval = addr;
+	return 1;
+}
+
+/*
+ * It may be necessary to allocate a larger reserved memblock array
+ * while populating it with ranges of preserved pages.  To avoid
+ * trampling preserved pages that have not yet been added to the
+ * memblock reserved list this function implements a wrapper around
+ * memblock_find_in_range() that restricts searches to subranges
+ * that do not contain preserved pages.
+ */ +phys_addr_t __init_memblock pkram_memblock_find_in_range(phys_addr_t start, + phys_addr_t end, phys_addr_t size, + phys_addr_t align) +{ + struct pkram_pg_state st = { + .range_cb = pkram_memblock_find_cb, + .min_addr = start, + .max_addr = end, + .min_size = PAGE_ALIGN(size), + .find_holes = true, + }; + + if (!pkram_reservation_in_progress) + return memblock_find_in_range(start, end, size, align); + + if (!pkram_pgd) { + WARN_ONCE(1, "No preserved pages pagetable\n"); + return memblock_find_in_range(start, end, size, align); + } + + WARN_ONCE(memblock_bottom_up(), "PKRAM: bottom up memblock allocation not yet supported\n"); + + pkram_walk_pgt_rev(&st, pkram_pgd); + + return st.retval; +} From patchwork Thu May 7 00:41:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532197 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3B18815AB for ; Thu, 7 May 2020 00:45:44 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id DF20B215A4 for ; Thu, 7 May 2020 00:45:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="J02jXttg" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DF20B215A4 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 18C0B900025; Wed, 6 May 2020 20:45:42 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 11475900002; Wed, 6 May 2020 20:45:42 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by 
kanga.kvack.org (Postfix, from userid 63042) id EA7E1900024; Wed, 6 May 2020 20:45:41 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0105.hostedemail.com [216.40.44.105]) by kanga.kvack.org (Postfix) with ESMTP id CB6C8900002 for ; Wed, 6 May 2020 20:45:41 -0400 (EDT) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 96A668249980 for ; Thu, 7 May 2020 00:45:41 +0000 (UTC) X-FDA: 76788080082.20.art16_31a0b22965908 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30034:30054:30056:30064:30090,0,RBL:141.146.126.78:@oracle.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:1,LUA_SUMMARY:none X-HE-Tag: art16_31a0b22965908 X-Filterd-Recvd-Size: 13066 Received: from aserp2120.oracle.com (aserp2120.oracle.com [141.146.126.78]) by imf06.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:45:40 +0000 (UTC) Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470df6v097703; Thu, 7 May 2020 00:45:12 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=rlQi5rxYCHxqA5qKb9P4QkKEJHfUxot3+XWhWAfqKME=; b=J02jXttgdJsysTmwPlOH9qcFyn2Xdj2uzmEI9wYoaoGNBVVioCcduz4kKweVJwWKtIqe iTX/Sv7u1tTl9c0B0Skgy6tSoIjBpwFhgcW/fklhutm6C4tTTfx1cMD9qGmZVwEuoLVW yzv8ogAuvdxbGp1sVDkNtIhXpRgxGPKCnP4yV5WLbr3/KiNXtEB7OFDENr7L8IFQI2YG EUgy94b5O3v3Io7vnBQCBDnde/DLFRhq3pWS6QwWr4VZfllWZWZCo0d13OG8o8B23tyn HWW4n9OmhmDHk07IeECWwO+yn8FIvdVlNmR/jdEHWhT0klhZzQNyHsNntBoEwzGiJhDR Cw== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by aserp2120.oracle.com with ESMTP id 30usgq4h3y-1 
(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:45:12 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470alr9170682; Thu, 7 May 2020 00:43:11 GMT Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by userp3020.oracle.com with ESMTP id 30us7p2m8e-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:43:11 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 0470h8ph024282; Thu, 7 May 2020 00:43:08 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:43:08 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 15/43] PKRAM: provide a way to ban pages from use by PKRAM Date: Wed, 6 May 2020 
17:41:41 -0700
Message-Id: <1588812129-8596-16-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 adultscore=0 suspectscore=2 mlxlogscore=999 malwarescore=0 phishscore=0 mlxscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001
X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 impostorscore=0 mlxscore=0 priorityscore=1501 lowpriorityscore=0 malwarescore=0 clxscore=1015 mlxlogscore=999 spamscore=0 adultscore=0 bulkscore=0 phishscore=0 suspectscore=2 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

Not all memory ranges can be used for saving data that is preserved
across kexec. For example, a kexec kernel may be loaded before pages
are preserved. The memory regions that the kexec segments will be
copied to on kexec must not contain preserved pages, or else those
pages will be clobbered.

Provide pkram_ban_region() to record pfn ranges that PKRAM must not
use. A page allocated for PKRAM that falls in a banned region is set
aside on a list that a shrinker can free, and a banned page passed to
pkram_save_page() is copied to a newly allocated page instead.
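
To make the bookkeeping concrete, here is a small userspace sketch of a
sorted table of inclusive pfn spans with merge-on-insert and a binary
search lookup. It is written under assumptions and is not the code from
this patch: struct ban_table, ban_add(), ban_hit() and the tiny
SKETCH_MAX_BANNED limit are hypothetical names, and the insert scans
upward from the lowest span where the patch scans downward from the
highest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_BANNED 8	/* assumption: tiny fixed table for the example */

/* Inclusive pfn interval; values are assumed to be below ULONG_MAX. */
struct pfn_span {
	unsigned long start, end;
};

struct ban_table {
	struct pfn_span span[SKETCH_MAX_BANNED];
	size_t nr;		/* spans in use: sorted, disjoint, not adjacent */
};

/*
 * Add [start, end] to the table, merging it with every span it overlaps
 * or touches. Returns false if a new slot is needed but the table is full.
 */
static bool ban_add(struct ban_table *t, unsigned long start, unsigned long end)
{
	size_t lo = 0, hi;

	assert(start <= end);

	/* Skip spans that lie entirely below the new one without touching it. */
	while (lo < t->nr && t->span[lo].end + 1 < start)
		lo++;

	/* Absorb every span that overlaps or is adjacent to [start, end]. */
	hi = lo;
	while (hi < t->nr && t->span[hi].start <= end + 1) {
		if (t->span[hi].start < start)
			start = t->span[hi].start;
		if (t->span[hi].end > end)
			end = t->span[hi].end;
		hi++;
	}

	if (hi == lo) {
		/* Nothing merged: open a new slot at index lo. */
		if (t->nr == SKETCH_MAX_BANNED)
			return false;
		memmove(&t->span[lo + 1], &t->span[lo],
			(t->nr - lo) * sizeof(t->span[0]));
		t->nr++;
	} else {
		/* Collapse spans lo..hi-1 into the single slot at lo. */
		memmove(&t->span[lo + 1], &t->span[hi],
			(t->nr - hi) * sizeof(t->span[0]));
		t->nr -= hi - lo - 1;
	}
	t->span[lo].start = start;
	t->span[lo].end = end;
	return true;
}

/* Binary search: does [pfn, epfn] intersect any banned span? */
static bool ban_hit(const struct ban_table *t, unsigned long pfn,
		    unsigned long epfn)
{
	size_t l = 0, r = t->nr;

	while (l < r) {
		size_t m = l + (r - l) / 2;

		if (epfn < t->span[m].start)
			r = m;
		else if (pfn > t->span[m].end)
			l = m + 1;
		else
			return true;
	}
	return false;
}
```

Keeping the table sorted and free of adjacent spans is what lets the
lookup stay a plain binary search: any queried span can intersect at
most one contiguous run of entries, and the search only needs to find
one of them.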
Originally-by: Vladimir Davydov Signed-off-by: Anthony Yznaga --- include/linux/pkram.h | 2 + mm/pkram.c | 210 ++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 212 insertions(+) diff --git a/include/linux/pkram.h b/include/linux/pkram.h index 409022e1472f..1ba48442ef8e 100644 --- a/include/linux/pkram.h +++ b/include/linux/pkram.h @@ -69,10 +69,12 @@ phys_addr_t pkram_memblock_find_in_range(phys_addr_t start, phys_addr_t end, extern unsigned long pkram_reserved_pages; void pkram_reserve(void); void pkram_free_pgt(void); +void pkram_ban_region(unsigned long start, unsigned long end); #else #define pkram_reserved_pages 0UL static inline void pkram_reserve(void) { } static inline void pkram_free_pgt(void) { } +static inline void pkram_ban_region(unsigned long start, unsigned long end) { } #endif #endif /* _LINUX_PKRAM_H */ diff --git a/mm/pkram.c b/mm/pkram.c index e49c9bcd3854..60863c8ecbab 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -119,6 +119,28 @@ unsigned long __initdata pkram_reserved_pages; static bool pkram_reservation_in_progress; /* + * For tracking a region of memory that PKRAM is not allowed to use. + */ +struct banned_region { + unsigned long start, end; /* pfn, inclusive */ +}; + +#define MAX_NR_BANNED (32 + MAX_NUMNODES * 2) + +static unsigned int nr_banned; /* number of banned regions */ + +/* banned regions; arranged in ascending order, do not overlap */ +static struct banned_region banned[MAX_NR_BANNED]; +/* + * If a page allocated for PKRAM turns out to belong to a banned region, + * it is placed on the banned_pages list so subsequent allocation attempts + * do not encounter it again. The list is shrunk when system memory is low. + */ +static LIST_HEAD(banned_pages); /* linked through page::lru */ +static DEFINE_SPINLOCK(banned_pages_lock); +static unsigned long nr_banned_pages; + +/* * The PKRAM super block pfn, see above. 
*/ static int __init parse_pkram_sb_pfn(char *arg) @@ -223,12 +245,120 @@ void __init pkram_reserve(void) pr_info("PKRAM: %lu pages reserved\n", pkram_reserved_pages); } +/* + * Ban pfn range [start..end] (inclusive) from use in PKRAM. + */ +void pkram_ban_region(unsigned long start, unsigned long end) +{ + int i, merged = -1; + + if (pkram_reservation_in_progress) + return; + + /* first try to merge the region with an existing one */ + for (i = nr_banned - 1; i >= 0 && start <= banned[i].end + 1; i--) { + if (end + 1 >= banned[i].start) { + start = min(banned[i].start, start); + end = max(banned[i].end, end); + if (merged < 0) + merged = i; + } else + /* + * Regions are arranged in ascending order and do not + * intersect so the merged region cannot jump over its + * predecessors. + */ + BUG_ON(merged >= 0); + } + + i++; + + if (merged >= 0) { + banned[i].start = start; + banned[i].end = end; + /* shift if merged with more than one region */ + memmove(banned + i + 1, banned + merged + 1, + sizeof(*banned) * (nr_banned - merged - 1)); + nr_banned -= merged - i; + return; + } + + /* + * The region does not intersect with an existing one; + * try to create a new one. + */ + if (nr_banned == MAX_NR_BANNED) { + pr_err("PKRAM: Failed to ban %lu-%lu: " + "Too many banned regions\n", start, end); + return; + } + + memmove(banned + i + 1, banned + i, + sizeof(*banned) * (nr_banned - i)); + banned[i].start = start; + banned[i].end = end; + nr_banned++; +} + +static void pkram_show_banned(void) +{ + int i; + unsigned long n, total = 0; + + pr_info("PKRAM: banned regions:\n"); + for (i = 0; i < nr_banned; i++) { + n = banned[i].end - banned[i].start + 1; + pr_info("%4d: [%08lx - %08lx] %lu pages\n", + i, banned[i].start, banned[i].end, n); + total += n; + } + pr_info("Total banned: %lu pages in %u regions\n", + total, nr_banned); +} + +/* + * Returns true if the page may not be used for storing preserved data. 
+ */ +static bool pkram_page_banned(struct page *page) +{ + unsigned long epfn, pfn = page_to_pfn(page); + int l = 0, r = nr_banned - 1, m; + + epfn = pfn + compound_nr(page) - 1; + + /* do binary search */ + while (l <= r) { + m = (l + r) / 2; + if (epfn < banned[m].start) + r = m - 1; + else if (pfn > banned[m].end) + l = m + 1; + else + return true; + } + return false; +} + static inline struct page *__pkram_alloc_page(gfp_t gfp_mask, bool add_to_map) { struct page *page; + LIST_HEAD(list); + unsigned long len = 0; int err; page = alloc_page(gfp_mask); + while (page && pkram_page_banned(page)) { + len++; + list_add(&page->lru, &list); + page = alloc_page(gfp_mask); + } + if (len > 0) { + spin_lock(&banned_pages_lock); + nr_banned_pages += len; + list_splice(&list, &banned_pages); + spin_unlock(&banned_pages_lock); + } + if (page && add_to_map) { err = pkram_add_identity_map(page); if (err) { @@ -256,6 +386,53 @@ static inline void pkram_free_page(void *addr) free_page((unsigned long)addr); } +static void __banned_pages_shrink(unsigned long nr_to_scan) +{ + struct page *page; + + if (!nr_to_scan) + return; + + while (nr_banned_pages > 0) { + BUG_ON(list_empty(&banned_pages)); + page = list_first_entry(&banned_pages, struct page, lru); + list_del(&page->lru); + __free_page(page); + nr_banned_pages--; + nr_to_scan--; + if (!nr_to_scan) + break; + } +} + +static unsigned long +banned_pages_count(struct shrinker *shrink, struct shrink_control *sc) +{ + return nr_banned_pages; +} + +static unsigned long +banned_pages_scan(struct shrinker *shrink, struct shrink_control *sc) +{ + unsigned long nr_freed; + + if (!sc->nr_to_scan || !nr_banned_pages) + return SHRINK_STOP; + + spin_lock(&banned_pages_lock); + nr_freed = nr_banned_pages; + __banned_pages_shrink(sc->nr_to_scan); + nr_freed -= nr_banned_pages; + spin_unlock(&banned_pages_lock); + return nr_freed; +} + +static struct shrinker banned_pages_shrinker = { + .count_objects = banned_pages_count, + .scan_objects = banned_pages_scan, + .seeks = 
DEFAULT_SEEKS, +}; + static inline void pkram_insert_node(struct pkram_node *node) { list_add(&virt_to_page(node)->lru, &pkram_nodes); @@ -665,6 +842,32 @@ static int __pkram_save_page(struct pkram_stream *ps, return 0; } +static int __pkram_save_page_copy(struct pkram_stream *ps, struct page *page, + short flags) +{ + int nr_pages = compound_nr(page); + pgoff_t index = page->index; + int i, err; + + for (i = 0; i < nr_pages; i++, index++) { + struct page *p = page + i; + struct page *new; + + new = pkram_alloc_page(ps->gfp_mask); + if (!new) + return -ENOMEM; + + copy_highpage(new, p); + err = __pkram_save_page(ps, new, flags, index); + put_page(new); + + if (err) + return err; + } + + return 0; +} + /** * Save page @page to the preserved memory node and object associated with * stream @ps. The stream must have been initialized with pkram_prepare_save() @@ -688,6 +891,10 @@ int pkram_save_page(struct pkram_stream *ps, struct page *page, short flags) BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_SAVE); + /* if page is banned, relocate it */ + if (pkram_page_banned(page)) + return __pkram_save_page_copy(ps, page, flags); + err = __pkram_save_page(ps, page, flags, page->index); if (!err) err = pkram_add_identity_map(page); @@ -891,6 +1098,7 @@ static void __pkram_reboot(void) unsigned long pgd_pfn = 0; if (pkram_pgd) { + pkram_show_banned(); list_for_each_entry_reverse(page, &pkram_nodes, lru) { node = page_address(page); if (WARN_ON(node->flags & PKRAM_ACCMODE_MASK)) @@ -957,6 +1165,7 @@ static int __init pkram_init_sb(void) page = __pkram_alloc_page(GFP_KERNEL | __GFP_ZERO, false); if (!page) { pr_err("PKRAM: Failed to allocate super block\n"); + __banned_pages_shrink(ULONG_MAX); return 0; } pkram_sb = page_address(page); @@ -979,6 +1188,7 @@ static int __init pkram_init(void) { if (pkram_init_sb()) { register_reboot_notifier(&pkram_reboot_notifier); + register_shrinker(&banned_pages_shrinker); sysfs_update_group(kernel_kobj, &pkram_attr_group); } return 0; 
From patchwork Thu May 7 00:41:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532095 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 03281139F for ; Thu, 7 May 2020 00:43:50 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id BA4B02064A for ; Thu, 7 May 2020 00:43:49 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="Osuou/V5" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BA4B02064A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4238890000E; Wed, 6 May 2020 20:43:47 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 36000900008; Wed, 6 May 2020 20:43:47 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 13C9390000E; Wed, 6 May 2020 20:43:47 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0098.hostedemail.com [216.40.44.98]) by kanga.kvack.org (Postfix) with ESMTP id E0057900008 for ; Wed, 6 May 2020 20:43:46 -0400 (EDT) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id A86CE124F for ; Thu, 7 May 2020 00:43:46 +0000 (UTC) X-FDA: 76788075252.24.rub06_20ef1c641f939 X-Spam-Summary: 
1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30054:30064:30070,0,RBL:156.151.31.85:@oracle.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:0,LUA_SUMMARY:none X-HE-Tag: rub06_20ef1c641f939 X-Filterd-Recvd-Size: 6573 Received: from userp2120.oracle.com (userp2120.oracle.com [156.151.31.85]) by imf03.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:43:46 +0000 (UTC) Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bkGi092899; Thu, 7 May 2020 00:43:15 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=Y+m+s4ZCNURBXj766gTT4gE1Fj6SOZhPn+YCjxoGrp4=; b=Osuou/V5BI3MblqKTJFUzzNZy5OpWrI3Z+5ocCwhhrXYW+JJzCx0NkO8086TAGBTLoYT s3fYplzuOKAnB5WqFvph6yEsPcA060kp2bvo4eNFLyiTYAFZ7ZUw63SqJFZoVX0Dl8h1 dMwjlznlZQAijOgiMVgOoPPt7hnNh0xZfB+dxRn7N285j88JFRLnI3jeBeo9l1g2S41n AlJwLFakJc4Ic1WhMIhDiwJtwuOc0bYX90EdYdetzBpWkKcNwTSOgGJYDuqG/Vky1+GZ rvMIc9aZLxizoWSytjjXotor/lxxqCB9KUE5nSF+Zf7EAwKsQeT0gNJwbUTt9nDHf6t8 TA== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by userp2120.oracle.com with ESMTP id 30s1gnd8kv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:43:15 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470auqu170921; Thu, 7 May 2020 00:43:14 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by userp3020.oracle.com with ESMTP id 30us7p2ma0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:43:14 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by userv0122.oracle.com (8.14.4/8.14.4) 
with ESMTP id 0470hB01025731; Thu, 7 May 2020 00:43:11 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:43:11 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 16/43] kexec: PKRAM: prevent kexec clobbering preserved pages in some cases Date: Wed, 6 May 2020 17:41:42 -0700 Message-Id: <1588812129-8596-17-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 adultscore=0 suspectscore=2 mlxlogscore=999 malwarescore=0 phishscore=0 mlxscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 
definitions=main-2005070001
X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 suspectscore=2 mlxscore=0 spamscore=0 clxscore=1015 priorityscore=1501 bulkscore=0 phishscore=0 impostorscore=0 malwarescore=0 lowpriorityscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

When loading a kernel for kexec, add the physical ranges that the kexec
segments will be copied to on reboot to the list of ranges that must
not be used for storing preserved pages. This ensures that no page
preserved after the new kernel has been loaded will reside in these
ranges on reboot.

The case where pages have been preserved before a kexec kernel is
loaded is not yet handled. A later patch covers it.
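
For illustration, the conversion from a segment's byte range to the
inclusive pfn range passed to pkram_ban_region() can be sketched in
userspace as follows. pfn_down() and pfn_up() mirror the arithmetic of
the kernel's PFN_DOWN() and PFN_UP() macros; struct pfn_range,
segment_to_pfns() and the 4 KiB page size are assumptions made for this
example only.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Same arithmetic as the kernel's PFN_DOWN() and PFN_UP() helpers. */
static unsigned long pfn_down(unsigned long addr)
{
	return addr >> SKETCH_PAGE_SHIFT;
}

static unsigned long pfn_up(unsigned long addr)
{
	return (addr + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Hypothetical holder for an inclusive pfn range [first, last]. */
struct pfn_range {
	unsigned long first, last;
};

/*
 * Map the byte range [mem, mem + memsz) to the inclusive range of page
 * frames it touches. A partially covered page at either end counts,
 * because copying the segment would overwrite part of that page.
 */
static struct pfn_range segment_to_pfns(unsigned long mem, unsigned long memsz)
{
	struct pfn_range r;

	assert(memsz > 0);	/* the patch skips zero-sized segments */
	r.first = pfn_down(mem);
	r.last = pfn_up(mem + memsz) - 1;
	return r;
}
```

A 0x1000-byte segment at 0x100800 straddles two pages and therefore maps
to pfns 0x100 through 0x101, while a page-aligned 0x2000-byte segment at
0x200000 maps to pfns 0x200 through 0x201.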
Signed-off-by: Anthony Yznaga --- kernel/kexec.c | 9 +++++++++ kernel/kexec_file.c | 10 ++++++++++ 2 files changed, 19 insertions(+) diff --git a/kernel/kexec.c b/kernel/kexec.c index f977786fe498..c44598fc42a1 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -16,6 +16,7 @@ #include #include #include +#include #include "kexec_internal.h" @@ -163,6 +164,14 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments, if (ret) goto out; + for (i = 0; i < nr_segments; i++) { + unsigned long mem = image->segment[i].mem; + size_t memsz = image->segment[i].memsz; + + if (memsz) + pkram_ban_region(PFN_DOWN(mem), PFN_UP(mem + memsz) - 1); + } + /* Install the new kernel and uninstall the old */ image = xchg(dest_image, image); diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index faa74d5f6941..f57f72237859 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -26,6 +26,8 @@ #include #include #include +#include + #include "kexec_internal.h" static int kexec_calculate_store_digests(struct kimage *image); @@ -445,6 +447,14 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd, if (ret) goto out; + for (i = 0; i < image->nr_segments; i++) { + unsigned long mem = image->segment[i].mem; + size_t memsz = image->segment[i].memsz; + + if (memsz) + pkram_ban_region(PFN_DOWN(mem), PFN_UP(mem + memsz) - 1); + } + /* * Free up any temporary buffers allocated which are not needed * after image has been loaded From patchwork Thu May 7 00:41:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532207 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DBE4A81 for ; Thu, 7 May 2020 00:46:12 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9EC8F20736 for ; Thu, 7 
May 2020 00:46:12 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="QTLMQYBV" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9EC8F20736 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id CABDF900024; Wed, 6 May 2020 20:46:11 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id C8338900023; Wed, 6 May 2020 20:46:11 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B982B900024; Wed, 6 May 2020 20:46:11 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0058.hostedemail.com [216.40.44.58]) by kanga.kvack.org (Postfix) with ESMTP id 9F9F1900023 for ; Wed, 6 May 2020 20:46:11 -0400 (EDT) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 69401AF73 for ; Thu, 7 May 2020 00:46:11 +0000 (UTC) X-FDA: 76788081342.20.cable81_35ff54b9c9a0c X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30054:30064:30070:30090,0,RBL:156.151.31.85:@oracle.com:.lbl8.mailshell.net-64.10.201.10 62.18.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:1,LUA_SUMMARY:none X-HE-Tag: cable81_35ff54b9c9a0c X-Filterd-Recvd-Size: 6366 Received: from userp2120.oracle.com (userp2120.oracle.com [156.151.31.85]) by imf38.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:46:10 +0000 (UTC) Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 
0470bkGp092899; Thu, 7 May 2020 00:45:39 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=DO62vjvZePhp7wYXZdhX/gUz6VFY4InjbCQaC+l5MmI=; b=QTLMQYBVlf732m4qafoDJtBSR4qykUyLHnFro/p7XnvylnJPd8Jc4j1qWmXFSPSx1R/G Vw8aYkCpDZiNN0DhG2fShGOBDYqfWq4BTMKXSgagCCiHi8SBEWK2xQSDRFmLpY4Ju0lA 0FAXG7XhxY5FD6M1VNO5WE1vAjT+EdhE8f5E5VJ3rjoJE5lESCvB6WwAxP/hVilBA0k/ CfMoBs0UcylgpewtUqXlvcvdhSNcxp0kkEUmKNeJxmCcOkrb68VTRbyVkBYOQ4RRrvmx s/gHtt6NFPuupKIZUgeOiVzuGotf2mzcA4Cfpgf/hJAiGLEE76ON3fDFbBYMT5Bfp/h6 hg== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by userp2120.oracle.com with ESMTP id 30s1gnd8te-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:45:39 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470am3g170760; Thu, 7 May 2020 00:43:38 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userp3020.oracle.com with ESMTP id 30us7p2mtv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:43:38 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 0470hak6029744; Thu, 7 May 2020 00:43:36 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:43:14 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, 
daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 17/43] PKRAM: provide a way to check if a memory range has preserved pages Date: Wed, 6 May 2020 17:41:43 -0700 Message-Id: <1588812129-8596-18-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 adultscore=0 suspectscore=2 mlxlogscore=999 malwarescore=0 phishscore=0 mlxscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 suspectscore=2 mlxscore=0 spamscore=0 clxscore=1015 priorityscore=1501 bulkscore=0 phishscore=0 impostorscore=0 malwarescore=0 lowpriorityscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When a kernel is loaded 
for kexec the address ranges where the kexec segments will be copied to may conflict with pages already set to be preserved. Provide a way to determine if preserved pages exist in a specified range.

Signed-off-by: Anthony Yznaga
---
 include/linux/pkram.h |  2 ++
 mm/pkram.c            | 25 +++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/linux/pkram.h b/include/linux/pkram.h
index 1ba48442ef8e..1cd518843d7a 100644
--- a/include/linux/pkram.h
+++ b/include/linux/pkram.h
@@ -70,11 +70,13 @@ extern unsigned long pkram_reserved_pages;
 void pkram_reserve(void);
 void pkram_free_pgt(void);
 void pkram_ban_region(unsigned long start, unsigned long end);
+int pkram_has_preserved_pages(unsigned long start, unsigned long end);
 #else
 #define pkram_reserved_pages 0UL
 static inline void pkram_reserve(void) { }
 static inline void pkram_free_pgt(void) { }
 static inline void pkram_ban_region(unsigned long start, unsigned long end) { }
+static inline int pkram_has_preserved_pages(unsigned long start, unsigned long end) { return 0; }
 #endif
 
 #endif /* _LINUX_PKRAM_H */
diff --git a/mm/pkram.c b/mm/pkram.c
index 60863c8ecbab..0aaaf9b79682 100644
--- a/mm/pkram.c
+++ b/mm/pkram.c
@@ -1499,3 +1499,28 @@ phys_addr_t __init_memblock pkram_memblock_find_in_range(phys_addr_t start,
 
 	return st.retval;
 }
+
+static int pkram_has_preserved_pages_cb(struct pkram_pg_state *st, unsigned long base, unsigned long size)
+{
+	st->retval = 1;
+	return 1;
+}
+
+/*
+ * Check whether the memory range [start, end) contains preserved pages.
+ */
+int pkram_has_preserved_pages(unsigned long start, unsigned long end)
+{
+	struct pkram_pg_state st = {
+		.range_cb = pkram_has_preserved_pages_cb,
+		.min_addr = start,
+		.max_addr = end,
+	};
+
+	if (!pkram_pgd)
+		return 0;
+
+	pkram_walk_pgt_rev(&st, pkram_pgd);
+
+	return st.retval;
+}

From patchwork Thu May 7 00:41:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532107 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C3A7092A for ; Thu, 7 May 2020 00:44:17 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8F28F20736 for ; Thu, 7 May 2020 00:44:17 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="mjtxeEvE" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8F28F20736 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9C7E690000F; Wed, 6 May 2020 20:44:16 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 976F0900008; Wed, 6 May 2020 20:44:16 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8182490000F; Wed, 6 May 2020 20:44:16 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0152.hostedemail.com [216.40.44.152]) by kanga.kvack.org (Postfix) with ESMTP id 67A3A900008 for ; Wed, 6 May 2020 20:44:16 -0400 (EDT) Received: from smtpin20.hostedemail.com
(10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 1AB3A181AEF30 for ; Thu, 7 May 2020 00:44:16 +0000 (UTC) X-FDA: 76788076512.20.laugh69_2537520dd5f37 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30054:30064:30069:30070,0,RBL:156.151.31.86:@oracle.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: laugh69_2537520dd5f37 X-Filterd-Recvd-Size: 5950 Received: from userp2130.oracle.com (userp2130.oracle.com [156.151.31.86]) by imf08.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:44:15 +0000 (UTC) Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bnDA064643; Thu, 7 May 2020 00:43:43 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=6P9Ea8+L0/6u0LKsSX8nQ7X+dyBEkyzViml0EHL0W1M=; b=mjtxeEvEOhg5lneS/5F9CvlrhzEJXQnhCL4+ghi6GcQtqHvLoy5bJniVXhVRDmOrXxHs kKVwq0uMi8fK6/FFsb2MHVKg81uXx+5nh04u01P7mjdV7uqPCWWujCy+mj3FCeIq+ucH 52pLV0hf1v8wyeKtWsCu4WWyZNL0rXtShMHwHGDsxSLkdolpfb4eqRdTwpMbKMV4O/86 E+R/ncLrRQbIwhzLJAQLPlxylXk5f+XI4PxRfz71rqrKSIW8517arkugBDGvJXfo9yvu KhJ65ZnOlGziT5I261OAmKbMIMP9/sqPQgoOfhQKMdeHtwls33J3ROcVRl5RTYU20H7a ng== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by userp2130.oracle.com with ESMTP id 30s09rdf9m-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:43:43 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bm9F131707; Thu, 7 May 2020 00:43:43 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by userp3030.oracle.com with ESMTP id 30t1r95a8f-1 
(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:43:43 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 0470hd2R025819; Thu, 7 May 2020 00:43:40 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:43:39 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 18/43] kexec: PKRAM: avoid clobbering already preserved pages Date: Wed, 6 May 2020 17:41:44 -0700 Message-Id: <1588812129-8596-19-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam 
policy=default score=0 bulkscore=0 adultscore=0 suspectscore=0 spamscore=0 mlxlogscore=999 malwarescore=0 phishscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 mlxscore=0 lowpriorityscore=0 spamscore=0 adultscore=0 clxscore=1015 suspectscore=0 priorityscore=1501 malwarescore=0 mlxlogscore=999 phishscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

Ensure destination ranges of the kexec segments do not overlap with any kernel pages marked to be preserved across kexec.

For kexec_load, return EADDRNOTAVAIL if overlap is detected.

For kexec_file_load, skip ranges containing preserved pages when searching for available ranges to use.

Signed-off-by: Anthony Yznaga
---
 kernel/kexec_core.c | 3 +++
 kernel/kexec_file.c | 5 +++++
 2 files changed, 8 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index c19c0dad1ebe..8c24b546352e 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -176,6 +177,8 @@ int sanity_check_segment_list(struct kimage *image)
 			return -EADDRNOTAVAIL;
 		if (mend >= KEXEC_DESTINATION_MEMORY_LIMIT)
 			return -EADDRNOTAVAIL;
+		if (pkram_has_preserved_pages(mstart, mend))
+			return -EADDRNOTAVAIL;
 	}
 
 	/* Verify our destination addresses do not overlap.
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f57f72237859..7b14e1b1a178 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -498,6 +498,11 @@ static int locate_mem_hole_top_down(unsigned long start, unsigned long end,
 			continue;
 		}
 
+		if (pkram_has_preserved_pages(temp_start, temp_end + 1)) {
+			temp_start = temp_start - PAGE_SIZE;
+			continue;
+		}
+
 		/* We found a suitable memory range */
 		break;
 	} while (1);

From patchwork Thu May 7 00:41:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532109 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B830D139F for ; Thu, 7 May 2020 00:44:19 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 84A9820736 for ; Thu, 7 May 2020 00:44:19 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="ZyZVIh+M" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 84A9820736 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8B645900010; Wed, 6 May 2020 20:44:18 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 86696900008; Wed, 6 May 2020 20:44:18 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6E0D2900010; Wed, 6 May 2020 20:44:18 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0072.hostedemail.com [216.40.44.72]) by kanga.kvack.org
(Postfix) with ESMTP id 4DFFE900008 for ; Wed, 6 May 2020 20:44:18 -0400 (EDT) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 04EE58249980 for ; Thu, 7 May 2020 00:44:18 +0000 (UTC) X-FDA: 76788076596.06.grip31_257a5e2230a04 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30054:30064,0,RBL:156.151.31.86:@oracle.com:.lbl8.mailshell.net-64.10.201.10 62.18.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: grip31_257a5e2230a04 X-Filterd-Recvd-Size: 6363 Received: from userp2130.oracle.com (userp2130.oracle.com [156.151.31.86]) by imf25.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:44:17 +0000 (UTC) Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470d0Rc076188; Thu, 7 May 2020 00:43:46 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=J7Q+8SZ1gV5T2Of756nJv1YzUrvepjMeioLG71z3S9M=; b=ZyZVIh+MFM4KSfqsEhgzO46tDmQNFz765fTjdy+D4xU3ipcRB4RqVS3Ui8zi49/8Ap5m uxAO0SZOBdfYpSHGBGsUK9p6hQSzdgS+89nD1xz9fhavAyL7kDa4j1OR1SuUMBON8fbs PVxNXpO35dZqRxizwgRFvZ0lao4k0btp9BTTGY1o8N3OnCEDci+NxECx4IwJ+VzMCCzu irdnqptbP0eeiMXPTKaufoXZ92kCvw3vlQyKH2rWoyy+r3Amkwp//FBzsBUefmAR7hJM zZQWi7OuDr/0R7Lb3Ar6Mr3h4BfOX4xxQ1mDCXA9rWFiBnQOxo9XrkS4cm6Mo7VqGSP7 lg== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by userp2130.oracle.com with ESMTP id 30s09rdf9p-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:43:46 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470alNK170704; Thu, 7 May 2020 00:43:46 GMT Received: from 
userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by userp3020.oracle.com with ESMTP id 30us7p2n40-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:43:46 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 0470hhja024556; Thu, 7 May 2020 00:43:43 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:43:42 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 19/43] mm: PKRAM: allow preserved memory to be freed from userspace Date: Wed, 6 May 2020 17:41:45 -0700 Message-Id: <1588812129-8596-20-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> 
X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 adultscore=0 suspectscore=2 mlxlogscore=999 malwarescore=0 phishscore=0 mlxscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 mlxscore=0 lowpriorityscore=0 spamscore=0 adultscore=0 clxscore=1015 suspectscore=2 priorityscore=1501 malwarescore=0 mlxlogscore=999 phishscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

To free all space utilized for preserved memory, one can write 0 to /sys/kernel/pkram. This will destroy all PKRAM nodes that are not currently being read or written.

Originally-by: Vladimir Davydov
Signed-off-by: Anthony Yznaga
---
 mm/pkram.c | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/mm/pkram.c b/mm/pkram.c
index 0aaaf9b79682..95e691382721 100644
--- a/mm/pkram.c
+++ b/mm/pkram.c
@@ -509,6 +509,32 @@ static void pkram_truncate_node(struct pkram_node *node)
 	node->obj_pfn = 0;
 }
 
+/*
+ * Free all nodes that are not under operation.
+ */
+static void pkram_truncate(void)
+{
+	struct page *page, *tmp;
+	struct pkram_node *node;
+	LIST_HEAD(dispose);
+
+	mutex_lock(&pkram_mutex);
+	list_for_each_entry_safe(page, tmp, &pkram_nodes, lru) {
+		node = page_address(page);
+		if (!(node->flags & PKRAM_ACCMODE_MASK))
+			list_move(&page->lru, &dispose);
+	}
+	mutex_unlock(&pkram_mutex);
+
+	while (!list_empty(&dispose)) {
+		page = list_first_entry(&dispose, struct page, lru);
+		list_del(&page->lru);
+		node = page_address(page);
+		pkram_truncate_node(node);
+		pkram_free_page(node);
+	}
+}
+
 static void pkram_add_link(struct pkram_link *link, struct pkram_obj *obj)
 {
 	link->link_pfn = obj->link_pfn;
@@ -1141,8 +1167,19 @@ static ssize_t show_pkram_sb_pfn(struct kobject *kobj,
 	return sprintf(buf, "%lx\n", pfn);
 }
 
+static ssize_t store_pkram_sb_pfn(struct kobject *kobj,
+		struct kobj_attribute *attr, const char *buf, size_t count)
+{
+	int val;
+
+	if (kstrtoint(buf, 0, &val) || val)
+		return -EINVAL;
+	pkram_truncate();
+	return count;
+}
+
 static struct kobj_attribute pkram_sb_pfn_attr =
-	__ATTR(pkram, 0444, show_pkram_sb_pfn, NULL);
+	__ATTR(pkram, 0644, show_pkram_sb_pfn, store_pkram_sb_pfn);
 
 static struct attribute *pkram_attrs[] = {
 	&pkram_sb_pfn_attr.attr,

From patchwork Thu May 7 00:41:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532115 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0023315AB for ; Thu, 7 May 2020 00:44:23 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C107220936 for ; Thu, 7 May 2020 00:44:22 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="Lv++/U9G" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org
C107220936 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D988F900011; Wed, 6 May 2020 20:44:21 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id D4855900008; Wed, 6 May 2020 20:44:21 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C37A6900011; Wed, 6 May 2020 20:44:21 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0068.hostedemail.com [216.40.44.68]) by kanga.kvack.org (Postfix) with ESMTP id A94B2900008 for ; Wed, 6 May 2020 20:44:21 -0400 (EDT) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 66E66180AD80F for ; Thu, 7 May 2020 00:44:21 +0000 (UTC) X-FDA: 76788076722.09.cakes82_25fb232041529 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30054:30064:30069,0,RBL:141.146.126.78:@oracle.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:34,LUA_SUMMARY:none X-HE-Tag: cakes82_25fb232041529 X-Filterd-Recvd-Size: 5656 Received: from aserp2120.oracle.com (aserp2120.oracle.com [141.146.126.78]) by imf09.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:44:20 +0000 (UTC) Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470dCV4097456; Thu, 7 May 2020 00:43:48 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; 
bh=w6gOhqdHP7hbtdLBjCjLiDcomUNMD9tvJtFBbgdcHPI=; b=Lv++/U9G1thp1Jx477m45dEIdSTTpOl/IZTc1/2RMGCUioiWeeoVm7oSSB6ITBLuLy3E hb4PEu9jOPp5h43qH3/TWELHqYV83F6pWkutzNXpCvrrUAGx6RKuzZoitB9sZME1+Dpn yxpBGXG0EmRLkPCfwp52kSGt+cSAm6nJscyd6ST/1t9CehBtFlDtztZ/e6QJPuixX++d yTrGU6b11vzjTeFJyjloX0w+wJrLi0Tph93RVY2loNtSnbBGClJBO2D8eqPCitU1WyS2 qMfxR55riNPEtqH+5Lszl9DKwQAyK/+tbaauixpximYhHRrI5FXZfvC0GwvWPe4vWRWU bw== Received: from aserp3030.oracle.com (aserp3030.oracle.com [141.146.126.71]) by aserp2120.oracle.com with ESMTP id 30usgq4gyy-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:43:47 +0000 Received: from pps.filterd (aserp3030.oracle.com [127.0.0.1]) by aserp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470an8G136067; Thu, 7 May 2020 00:43:47 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by aserp3030.oracle.com with ESMTP id 30sjdwrr7b-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:43:47 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 0470hkb0029778; Thu, 7 May 2020 00:43:46 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:43:46 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, 
andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 20/43] PKRAM: disable feature when running the kdump kernel Date: Wed, 6 May 2020 17:41:46 -0700 Message-Id: <1588812129-8596-21-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 suspectscore=0 mlxscore=0 bulkscore=0 adultscore=0 phishscore=0 mlxlogscore=999 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 impostorscore=0 mlxscore=0 priorityscore=1501 lowpriorityscore=0 malwarescore=0 clxscore=1015 mlxlogscore=999 spamscore=0 adultscore=0 bulkscore=0 phishscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The kdump kernel should not preserve or restore pages. 
Signed-off-by: Anthony Yznaga
---
 mm/pkram.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/pkram.c b/mm/pkram.c
index 95e691382721..4d4d836fea53 100644
--- a/mm/pkram.c
+++ b/mm/pkram.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+#include
 #include
 #include
 #include
@@ -193,7 +194,7 @@ void __init pkram_reserve(void)
 {
 	int err = 0;
 
-	if (!pkram_sb_pfn)
+	if (!pkram_sb_pfn || is_kdump_kernel())
 		return;
 
 	pr_info("PKRAM: Examining preserved memory...\n");
@@ -305,6 +306,9 @@ static void pkram_show_banned(void)
 	int i;
 	unsigned long n, total = 0;
 
+	if (is_kdump_kernel())
+		return;
+
 	pr_info("PKRAM: banned regions:\n");
 	for (i = 0; i < nr_banned; i++) {
 		n = banned[i].end - banned[i].start + 1;
@@ -1223,7 +1227,7 @@ static int __init pkram_init_sb(void)
 
 static int __init pkram_init(void)
 {
-	if (pkram_init_sb()) {
+	if (!is_kdump_kernel() && pkram_init_sb()) {
 		register_reboot_notifier(&pkram_reboot_notifier);
 		register_shrinker(&banned_pages_shrinker);
 		sysfs_update_group(kernel_kobj, &pkram_attr_group);

From patchwork Thu May 7 00:41:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532117 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3E5D2139F for ; Thu, 7 May 2020 00:44:27 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id DC11820936 for ; Thu, 7 May 2020 00:44:26 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="OBlHDg6G" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DC11820936 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix) id EAA23900008; Wed, 6 May 2020 20:44:25 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id E348D900003; Wed, 6 May 2020 20:44:25 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CD56C900008; Wed, 6 May 2020 20:44:25 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0147.hostedemail.com [216.40.44.147]) by kanga.kvack.org (Postfix) with ESMTP id B1937900003 for ; Wed, 6 May 2020 20:44:25 -0400 (EDT) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 7190C8249980 for ; Thu, 7 May 2020 00:44:25 +0000 (UTC) X-FDA: 76788076890.11.card42_268eff6e74736 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30045:30054:30064:30070,0,RBL:141.146.126.78:@oracle.com:.lbl8.mailshell.net-64.10.201.10 62.18.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: card42_268eff6e74736 X-Filterd-Recvd-Size: 15580 Received: from aserp2120.oracle.com (aserp2120.oracle.com [141.146.126.78]) by imf39.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:44:24 +0000 (UTC) Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470dEoB097492; Thu, 7 May 2020 00:43:53 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=HwQl8sIJKFTDlEIKEwN3f7Owe2bSy++gIwkEBIVCps4=; b=OBlHDg6G996R5CuGG8YRp6zgHkmuPkO0g23Yaw0BtGtUE9bADAg/C8afQGeRH6SgDe1I b220uWG8PCFX8Q6H625khd8+KmV7PslJfrSoxUqmtbpNcIj/SJEe+Pfj5LkzizwYM7+M 
S68tEP+Zwutcv1DYoyqOiOFTyJKgS+2oNq9+rHFtQS/Zc1wjN/MvW25cZwS0meICIs6x KE4FLiVOzGgLJ+iyXNGtEnTJSHvLNehgEi7cBdNbUXHPUYijmTI3ECXDnjXIeYajzXqk Vb/H10DDxUNMx8sVnw84XCfqC4R4pAHC/oNNR1XYlxxIMNfoHdELB5pNdkN8KOhh/l0C LQ== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by aserp2120.oracle.com with ESMTP id 30usgq4h03-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:43:53 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bnED131737; Thu, 7 May 2020 00:43:52 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by userp3030.oracle.com with ESMTP id 30t1r95ars-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:43:52 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 0470hnSp025943; Thu, 7 May 2020 00:43:49 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:43:49 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, 
ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 21/43] x86/KASLR: PKRAM: support physical kaslr Date: Wed, 6 May 2020 17:41:47 -0700 Message-Id: <1588812129-8596-22-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 adultscore=0 suspectscore=0 spamscore=0 mlxlogscore=999 malwarescore=0 phishscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 impostorscore=0 mlxscore=0 priorityscore=1501 lowpriorityscore=0 malwarescore=0 clxscore=1015 mlxlogscore=999 spamscore=0 adultscore=0 bulkscore=0 phishscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Avoid regions of memory that contain preserved pages when computing slots used to select where to put the decompressed kernel. 
Signed-off-by: Anthony Yznaga
---
 arch/x86/boot/compressed/Makefile |   3 +
 arch/x86/boot/compressed/kaslr.c  |  67 ++++++----
 arch/x86/boot/compressed/misc.h   |  19 +++
 arch/x86/boot/compressed/pkram.c  | 252 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 320 insertions(+), 21 deletions(-)
 create mode 100644 arch/x86/boot/compressed/pkram.c

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 5f7c262bcc99..ba0d76c53574 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -84,6 +84,9 @@ ifdef CONFIG_X86_64
 	vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr_64.o
 	vmlinux-objs-y += $(obj)/mem_encrypt.o
 	vmlinux-objs-y += $(obj)/pgtable_64.o
+ifdef CONFIG_RANDOMIZE_BASE
+	vmlinux-objs-$(CONFIG_PKRAM) += $(obj)/pkram.o
+endif
 endif
 
 vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index d7408af55738..3f0a6fb15ac2 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -613,31 +613,16 @@ static unsigned long slots_fetch_random(void)
 	return 0;
 }
 
-static void __process_mem_region(struct mem_vector *entry,
-				 unsigned long minimum,
-				 unsigned long image_size)
+void ___process_mem_region(struct mem_vector *entry,
+			   unsigned long minimum,
+			   unsigned long image_size)
 {
 	struct mem_vector region, overlap;
-	unsigned long start_orig, end;
+	unsigned long start_orig;
 	struct mem_vector cur_entry;
 
-	/* On 32-bit, ignore entries entirely above our maximum. */
-	if (IS_ENABLED(CONFIG_X86_32) && entry->start >= KERNEL_IMAGE_SIZE)
-		return;
-
-	/* Ignore entries entirely below our minimum. */
-	if (entry->start + entry->size < minimum)
-		return;
-
-	/* Ignore entries above memory limit */
-	end = min(entry->size + entry->start, mem_limit);
-	if (entry->start >= end)
-		return;
-	cur_entry.start = entry->start;
-	cur_entry.size = end - entry->start;
-
-	region.start = cur_entry.start;
-	region.size = cur_entry.size;
+	region.start = cur_entry.start = entry->start;
+	region.size = cur_entry.size = entry->size;
 
 	/* Give up if slot area array is full. */
 	while (slot_area_index < MAX_SLOT_AREA) {
@@ -691,6 +676,39 @@ static void __process_mem_region(struct mem_vector *entry,
 	}
 }
 
+static void __process_mem_region(struct mem_vector *entry,
+				 unsigned long minimum,
+				 unsigned long image_size)
+{
+	struct mem_vector region, overlap;
+	unsigned long start_orig, end;
+	struct mem_vector cur_entry;
+
+	/* On 32-bit, ignore entries entirely above our maximum. */
+	if (IS_ENABLED(CONFIG_X86_32) && entry->start >= KERNEL_IMAGE_SIZE)
+		return;
+
+	/* Ignore entries entirely below our minimum. */
+	if (entry->start + entry->size < minimum)
+		return;
+
+	/* Ignore entries above memory limit */
+	end = min(entry->size + entry->start, mem_limit);
+	if (entry->start >= end)
+		return;
+	cur_entry.start = entry->start;
+	cur_entry.size = end - entry->start;
+
+	/* Return if region can't contain decompressed kernel */
+	if (cur_entry.size < image_size)
+		return;
+
+	if (pkram_enabled())
+		return pkram_process_mem_region(&cur_entry, minimum, image_size);
+	else
+		return ___process_mem_region(&cur_entry, minimum, image_size);
+}
+
 static bool process_mem_region(struct mem_vector *region,
 			       unsigned long long minimum,
 			       unsigned long long image_size)
@@ -902,6 +920,8 @@ void choose_random_location(unsigned long input,
 		return;
 	}
 
+	pkram_init();
+
 #ifdef CONFIG_X86_5LEVEL
 	if (__read_cr4() & X86_CR4_LA57) {
 		__pgtable_l5_enabled = 1;
@@ -952,3 +972,8 @@ void choose_random_location(unsigned long input,
 		random_addr = find_random_virt_addr(LOAD_PHYSICAL_ADDR, output_size);
 	*virt_addr = random_addr;
 }
+
+int slot_areas_full(void)
+{
+	return slot_area_index == MAX_SLOT_AREA;
+}
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 726e264410ff..ca1a8ae5ebe9 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -117,6 +117,25 @@ static inline void console_init(void)
 { }
 #endif
 
+void ___process_mem_region(struct mem_vector *entry,
+			   unsigned long minimum,
+			   unsigned long image_size);
+
+#ifdef CONFIG_PKRAM
+void pkram_init(void);
+int pkram_enabled(void);
+void pkram_process_mem_region(struct mem_vector *entry,
+			      unsigned long minimum,
+			      unsigned long image_size);
+#else
+static inline void pkram_init(void) { }
+static inline int pkram_enabled(void) { return 0; }
+static inline void pkram_process_mem_region(struct mem_vector *entry,
+					    unsigned long minimum,
+					    unsigned long image_size)
+{ ___process_mem_region(entry, minimum, image_size); }
+#endif
+
 void set_sev_encryption_mask(void);
 
 /* acpi.c */
diff --git
a/arch/x86/boot/compressed/pkram.c b/arch/x86/boot/compressed/pkram.c new file mode 100644 index 000000000000..5fc1e26909de --- /dev/null +++ b/arch/x86/boot/compressed/pkram.c @@ -0,0 +1,252 @@ +// SPDX-License-Identifier: GPL-2.0 +#define __pa(x) ((unsigned long)(x)) +#define __va(x) ((void *)((unsigned long)(x))) + +#include "misc.h" +#include + +struct pkram_super_block { + __u64 node_pfn; + __u64 pgd_pfn; +}; + +static unsigned long long pkram_sb_pfn; +static struct pkram_super_block *pkram_sb; +static pgd_t *pkram_pgd; + +struct pg_state { + int (*range_cb)(struct pg_state *state, unsigned long base, + unsigned long size); + unsigned long curr_addr; + unsigned long start_addr; + unsigned long min_addr; + unsigned long max_addr; + unsigned long min_size; + unsigned long minimum; + bool tracking; + bool find_holes; +}; + +int pkram_enabled(void) +{ + return pkram_pgd ? 1 : 0; +} + +void pkram_init(void) +{ + char arg[32]; + + if (cmdline_find_option("pkram", arg, sizeof(arg)) > 0) { + if (kstrtoull(arg, 16, &pkram_sb_pfn) != 0) + return; + } else + return; + + pkram_sb = (struct pkram_super_block *)(pkram_sb_pfn << PAGE_SHIFT); + + if (pkram_sb) + pkram_pgd = (pgd_t *)(pkram_sb->pgd_pfn << PAGE_SHIFT); +} + +static int note_page(struct pg_state *st, int present) +{ + unsigned long curr_addr = st->curr_addr; + bool track_page = present ^ st->find_holes; + + if (!st->tracking && track_page) { + if (curr_addr >= st->max_addr) + return 1; + /* + * curr_addr can be < min_addr if the page straddles the + * boundary + */ + st->start_addr = max(curr_addr, st->min_addr); + st->tracking = true; + } else if (st->tracking) { + unsigned long base, size; + int ret; + + /* Continue tracking if upper bound has not been reached */ + if (track_page && curr_addr < st->max_addr) + return 0; + + curr_addr = min(curr_addr, st->max_addr); + + base = st->start_addr; + size = curr_addr - st->start_addr; + st->tracking = false; + + ret = st->range_cb(st, base, size); + + if (curr_addr 
== st->max_addr) + return 1; + else + return ret; + } + + return 0; +} + +static int walk_pte_level(struct pg_state *st, pmd_t addr, unsigned long P) +{ + unsigned long *bitmap; + int present; + int i, ret; + + bitmap = __va(pmd_val(addr)); + for (i = 0; i < PTRS_PER_PTE; i++) { + unsigned long curr_addr = P + i * PAGE_SIZE; + + if (curr_addr < st->min_addr) + continue; + st->curr_addr = curr_addr; + present = test_bit(i, bitmap); + ret = note_page(st, present); + if (ret) + break; + } + + return ret; +} + +static int walk_pmd_level(struct pg_state *st, pud_t addr, unsigned long P) +{ + pmd_t *start; + int i, ret; + + start = (pmd_t *)pud_page_vaddr(addr); + for (i = 0; i < PTRS_PER_PMD; i++, start++) { + unsigned long curr_addr = P + i * PMD_SIZE; + + if (curr_addr + PMD_SIZE <= st->min_addr) + continue; + st->curr_addr = curr_addr; + if (!pmd_none(*start)) { + if (pmd_large(*start)) + ret = note_page(st, true); + else + ret = walk_pte_level(st, *start, curr_addr); + } else + ret = note_page(st, false); + if (ret) + break; + } + + return ret; +} + +static int walk_pud_level(struct pg_state *st, p4d_t addr, unsigned long P) +{ + pud_t *start; + int i, ret; + + start = (pud_t *)p4d_page_vaddr(addr); + for (i = 0; i < PTRS_PER_PUD; i++, start++) { + unsigned long curr_addr = P + i * PUD_SIZE; + + if (curr_addr + PUD_SIZE <= st->min_addr) + continue; + st->curr_addr = curr_addr; + if (!pud_none(*start)) { + if (pud_large(*start)) + ret = note_page(st, true); + else + ret = walk_pmd_level(st, *start, curr_addr); + } else + ret = note_page(st, false); + if (ret) + break; + } + + return ret; +} + +static int walk_p4d_level(struct pg_state *st, pgd_t addr, unsigned long P) +{ + p4d_t *start; + int i, ret; + + if (PTRS_PER_P4D == 1) + return walk_pud_level(st, __p4d(pgd_val(addr)), P); + + start = (p4d_t *)pgd_page_vaddr(addr); + for (i = 0; i < PTRS_PER_P4D; i++, start++) { + unsigned long curr_addr = P + i * P4D_SIZE; + + if (curr_addr + P4D_SIZE <= st->min_addr) + 
continue; + st->curr_addr = curr_addr; + if (!p4d_none(*start)) { + if (p4d_large(*start)) + ret = note_page(st, true); + else + ret = walk_pud_level(st, *start, curr_addr); + } else + ret = note_page(st, false); + if (ret) + break; + } + + return ret; +} + +#define pgd_large(a) (pgtable_l5_enabled() ? pgd_large(a) : p4d_large(__p4d(pgd_val(a)))) +#define pgd_none(a) (pgtable_l5_enabled() ? pgd_none(a) : p4d_none(__p4d(pgd_val(a)))) + +static int walk_pgd_level(struct pg_state *st, pgd_t *pgd) +{ + pgd_t *start = pgd; + int i, ret = 0; + + for (i = 0; i < PTRS_PER_PGD; i++, start++) { + unsigned long curr_addr = i * PGDIR_SIZE; + + if (curr_addr + PGDIR_SIZE <= st->min_addr) + continue; + st->curr_addr = curr_addr; + if (!pgd_none(*start)) + ret = walk_p4d_level(st, *start, curr_addr); + else + ret = note_page(st, false); + if (ret) + break; + } + + return ret; +} + +extern int slot_areas_full(void); + +static int pkram_process_mem_region_cb(struct pg_state *st, unsigned long base, unsigned long size) +{ + struct mem_vector region = { + .start = base, + .size = size, + }; + + if (size < st->min_size) + return 0; + + ___process_mem_region(&region, st->minimum, st->min_size); + + if (slot_areas_full()) + return 1; + + return 0; +} + +void pkram_process_mem_region(struct mem_vector *entry, + unsigned long minimum, + unsigned long image_size) +{ + struct pg_state st = { + .range_cb = pkram_process_mem_region_cb, + .min_addr = max((unsigned long)entry->start, minimum), + .max_addr = entry->start + entry->size, + .min_size = image_size, + .minimum = minimum, + .find_holes = true, + }; + + walk_pgd_level(&st, pkram_pgd); +} From patchwork Thu May 7 00:41:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532119 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP
id 6CD58139F for ; Thu, 7 May 2020 00:44:29 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 231832064A for ; Thu, 7 May 2020 00:44:29 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="NJFNT1MS" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 231832064A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 29B84900012; Wed, 6 May 2020 20:44:27 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 223D8900003; Wed, 6 May 2020 20:44:27 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0C77E900012; Wed, 6 May 2020 20:44:27 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0102.hostedemail.com [216.40.44.102]) by kanga.kvack.org (Postfix) with ESMTP id E5377900003 for ; Wed, 6 May 2020 20:44:26 -0400 (EDT) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id A8095180AD811 for ; Thu, 7 May 2020 00:44:26 +0000 (UTC) X-FDA: 76788076932.07.grip98_26bf3e220f538 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30054:30064:30075,0,RBL:156.151.31.86:@oracle.com:.lbl8.mailshell.net-64.10.201.10 62.18.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:1,LUA_SUMMARY:none X-HE-Tag: grip98_26bf3e220f538 X-Filterd-Recvd-Size: 7893 Received: from userp2130.oracle.com (userp2130.oracle.com [156.151.31.86]) by imf39.hostedemail.com 
(Postfix) with ESMTP for ; Thu, 7 May 2020 00:44:26 +0000 (UTC) Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bsET064662; Thu, 7 May 2020 00:43:54 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=PxSg4xD+zcpl2hcOtqdCGahscRkc3d/OAEkiyXkPX8s=; b=NJFNT1MSypCVx6F/PClttNU9H/aOwnOLeqJWQY9uRHct0XD45kZmJFX2rQCZrmRyn/ZN 0l6IksA7KAJ5rDMtuWRYVQ0irqh1iPsWn6oha+TugwePRcf6RGAiaYeF8BIflxNBq2RL nUwBQoIU7YswUqraFlqZEjLbpKdvVCnfp3stFfCe2fVhMhGCnkveEebVZt5BIktbeAsP LCxDKX6eF5mh4JqyClC1/lVkPSmBnKEZdn4envNXms7t5192Bmd4P9cLIVgM00SbBen2 cikYZHLnfFO9ogq+Vw4wRthjrp2TSpJX2Nx+yKaRcHZBwyZxfVhQyZ5mE1Md1HmL9v6J Fg== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by userp2130.oracle.com with ESMTP id 30s09rdfaa-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:43:54 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bmHF131733; Thu, 7 May 2020 00:43:54 GMT Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userp3030.oracle.com with ESMTP id 30t1r95at8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:43:54 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 0470hqGB020317; Thu, 7 May 2020 00:43:52 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:43:52 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, 
peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 22/43] mm: shmem: introduce shmem_insert_page Date: Wed, 6 May 2020 17:41:48 -0700 Message-Id: <1588812129-8596-23-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 adultscore=0 suspectscore=2 spamscore=0 mlxlogscore=999 malwarescore=0 phishscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 mlxscore=0 lowpriorityscore=0 spamscore=0 adultscore=0 clxscore=1015 suspectscore=2 priorityscore=1501 malwarescore=0 mlxlogscore=999 phishscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, 
tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The function inserts a page into a shmem file at a specified offset. The page can be a regular PAGE_SIZE page or a transparent huge page. If there is something at the offset (page or swap), the function fails. The function will be used by the next patch. Originally-by: Vladimir Davydov Signed-off-by: Anthony Yznaga --- include/linux/shmem_fs.h | 3 ++ mm/shmem.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 98 insertions(+) diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index 7a35a6901221..688b92cd4ec7 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -96,6 +96,9 @@ enum sgp_type { extern int shmem_getpage(struct inode *inode, pgoff_t index, struct page **pagep, enum sgp_type sgp); +extern int shmem_insert_page(struct mm_struct *mm, struct inode *inode, + pgoff_t index, struct page *page); + static inline struct page *shmem_read_mapping_page( struct address_space *mapping, pgoff_t index) { diff --git a/mm/shmem.c b/mm/shmem.c index bd8840082c94..0a9a2166e51f 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -677,6 +677,101 @@ static void shmem_delete_from_page_cache(struct page *page, void *radswap) BUG_ON(error); } +int shmem_insert_page(struct mm_struct *mm, struct inode *inode, pgoff_t index, + struct page *page) +{ + struct address_space *mapping = inode->i_mapping; + struct shmem_inode_info *info = SHMEM_I(inode); + struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb); + gfp_t gfp = mapping_gfp_mask(mapping); + int err; + int nr = 1; + struct mem_cgroup *memcg; + pgoff_t hindex = index; + bool on_lru = PageLRU(page); + + if (index > (MAX_LFS_FILESIZE >> PAGE_SHIFT)) + return -EFBIG; + + if (PageTransHuge(page)) + nr = HPAGE_PMD_NR; + else + nr = 1; +retry: + err = 0; + if (!shmem_inode_acct_block(inode, nr)) + err = -ENOSPC; + if (err) { + int retry = 5; + + /* 
+ * Try to reclaim some space by splitting a huge page + * beyond i_size on the filesystem. + */ + while (retry--) { + int ret; + + ret = shmem_unused_huge_shrink(sbinfo, NULL, 1); + if (ret == SHRINK_STOP) + break; + if (ret) + goto retry; + } + goto failed; + } + + if (!on_lru) { + __SetPageLocked(page); + __SetPageSwapBacked(page); + } else { + lock_page(page); + } + + if (PageTransHuge(page)) + hindex = round_down(index, HPAGE_PMD_NR); + else + hindex = index; + + __SetPageReferenced(page); + + err = mem_cgroup_try_charge_delay(page, mm, gfp, &memcg, + PageTransHuge(page)); + if (err) + goto out_unlock; + + err = shmem_add_to_page_cache(page, mapping, hindex, + NULL, gfp & GFP_RECLAIM_MASK); + if (err) { + mem_cgroup_cancel_charge(page, memcg, + PageTransHuge(page)); + goto out_unlock; + } + mem_cgroup_commit_charge(page, memcg, on_lru, + PageTransHuge(page)); + + if (!on_lru) + lru_cache_add_anon(page); + + spin_lock(&info->lock); + info->alloced += compound_nr(page); + inode->i_blocks += BLOCKS_PER_PAGE << compound_order(page); + shmem_recalc_inode(inode); + spin_unlock(&info->lock); + + flush_dcache_page(page); + SetPageUptodate(page); + set_page_dirty(page); + + unlock_page(page); + return 0; + +out_unlock: + unlock_page(page); + shmem_inode_unacct_blocks(inode, nr); +failed: + return err; +} + /* * Remove swap entry from page cache, free the swap and its page cache. 
*/ From patchwork Thu May 7 00:41:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532219 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BFC9A81 for ; Thu, 7 May 2020 00:46:35 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 71F802082E for ; Thu, 7 May 2020 00:46:35 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="h1LKJ3LU" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 71F802082E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 90250900026; Wed, 6 May 2020 20:46:34 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 8D9F4900023; Wed, 6 May 2020 20:46:34 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7C9A5900026; Wed, 6 May 2020 20:46:34 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0173.hostedemail.com [216.40.44.173]) by kanga.kvack.org (Postfix) with ESMTP id 5FAD7900023 for ; Wed, 6 May 2020 20:46:34 -0400 (EDT) Received: from smtpin02.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 18B96B785 for ; Thu, 7 May 2020 00:46:34 +0000 (UTC) X-FDA: 76788082308.02.shoes93_3940a6b7e995c X-Spam-Summary: 
1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30029:30054:30062:30064:30070,0,RBL:156.151.31.86:@oracle.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:2,LUA_SUMMARY:none X-HE-Tag: shoes93_3940a6b7e995c X-Filterd-Recvd-Size: 21881 Received: from userp2130.oracle.com (userp2130.oracle.com [156.151.31.86]) by imf45.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:46:33 +0000 (UTC) Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bnDF064643; Thu, 7 May 2020 00:45:59 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=/lxtQPQEJy/i6bq6Gkbeub+HXra5DA0A1T4/mDiNDwQ=; b=h1LKJ3LUEY0GzjoeFRGdnXzI83/MGmn9S0Qlo9pDOKJbAgdU1TICVVVqQQGl+VJRUl1w of02eBMZ2tKwoG8Ry08iUmg8/4gNoE4ZyuapGq5O8VT5NM77O+vKgJpWScnEq4BF9vkR ITLGKCFl31k6bHl8bSrWFIgNdNW61TNy/G365kbX97hZbfxOba/XZ61oPLN0m01xiPV7 IOhKwwiybcewZqDm0LzTIm+OEkldwuMHUva5coCmkRCik0vjvuPY8BPwVDt9V7csYfFh Y1GJFiVrcFgssrSY1STowI8j7TIQVQ890SMwZbv9JMRqF5Ci74Jsh8QeOPE8DEauiQQc 6Q== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by userp2130.oracle.com with ESMTP id 30s09rdfg4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:45:59 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470asAl170896; Thu, 7 May 2020 00:43:59 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by userp3020.oracle.com with ESMTP id 30us7p2ne6-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:43:59 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by userv0122.oracle.com 
(8.14.4/8.14.4) with ESMTP id 0470ht0u025953; Thu, 7 May 2020 00:43:56 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:43:55 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 23/43] mm: shmem: enable saving to PKRAM Date: Wed, 6 May 2020 17:41:49 -0700 Message-Id: <1588812129-8596-24-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 adultscore=0 suspectscore=2 mlxlogscore=999 malwarescore=0 phishscore=0 mlxscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 
definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 mlxscore=0 lowpriorityscore=0 spamscore=0 adultscore=0 clxscore=1015 suspectscore=2 priorityscore=1501 malwarescore=0 mlxlogscore=999 phishscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

This patch illustrates how the PKRAM API can be used for preserving tmpfs.

Two options are added to tmpfs:
  The 'pkram=' option specifies the PKRAM node to load/save the filesystem tree from/to.
  The 'preserve' option initiates preservation of a read-only filesystem tree.

If the 'pkram=' option is passed on mount, shmem will look for the corresponding PKRAM node and load the FS tree from it.

If the 'pkram=' option was passed on mount, the 'preserve' option is passed on remount, and the filesystem is read-only, shmem will save the FS tree to the PKRAM node.

A typical usage scenario looks like:

  # mount -t tmpfs -o pkram=mytmpfs none /mnt
  # echo something > /mnt/smth
  # mount -o remount,ro,preserve /mnt
  # mount -t tmpfs -o pkram=mytmpfs none /mnt
  # cat /mnt/smth

Each FS tree is saved into a PKRAM node, and each file is saved into a PKRAM object. A byte stream written to the object holds the file metadata (name, permissions, etc.), while the page stream written to the object holds the file content pages and their offsets.

This implementation serves as a demonstration and is therefore simplified: it supports only regular files in the root directory without multiple hard links, and it does not save swapped-out files, aborting if any are found. However, it can be extended to fully support tmpfs.
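The byte-stream half of that layout can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct sketch_file_header`, `sketch_write_meta` and `sketch_read_meta` are hypothetical names, the record uses fixed-width integers rather than the kernel's kuid_t/kgid_t, and a memory buffer stands in for the PKRAM object's byte stream. The point is the ordering: a fixed-size header first, then `namelen` bytes of file name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-width metadata record (not the patch's struct). */
struct sketch_file_header {
	uint32_t mode;
	uint32_t uid;
	uint32_t gid;
	uint32_t namelen;	/* length of the name that follows, no NUL */
	uint64_t size;
};

/* Append header and name to buf. Returns bytes written, or 0 if no room. */
static size_t sketch_write_meta(uint8_t *buf, size_t bufsize,
				const struct sketch_file_header *hdr,
				const char *name)
{
	size_t need = sizeof(*hdr) + hdr->namelen;

	assert(strlen(name) == hdr->namelen);
	if (need > bufsize)
		return 0;
	memcpy(buf, hdr, sizeof(*hdr));
	memcpy(buf + sizeof(*hdr), name, hdr->namelen);
	return need;
}

/*
 * Read a header and name back from buf. name_out receives a
 * NUL-terminated copy. Returns bytes consumed, or 0 on a short or
 * oversized record.
 */
static size_t sketch_read_meta(const uint8_t *buf, size_t buflen,
			       struct sketch_file_header *hdr,
			       char *name_out, size_t name_cap)
{
	if (buflen < sizeof(*hdr))
		return 0;
	memcpy(hdr, buf, sizeof(*hdr));
	if (hdr->namelen >= name_cap || buflen - sizeof(*hdr) < hdr->namelen)
		return 0;
	memcpy(name_out, buf + sizeof(*hdr), hdr->namelen);
	name_out[hdr->namelen] = '\0';
	return sizeof(*hdr) + hdr->namelen;
}
```

Storing the name length in the header lets the reader know how many bytes to pull from the stream before the page stream begins, so no terminator has to be written.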
Originally-by: Vladimir Davydov Signed-off-by: Anthony Yznaga --- include/linux/pkram.h | 1 + include/linux/shmem_fs.h | 24 +++ mm/Makefile | 2 +- mm/shmem.c | 66 ++++++++ mm/shmem_pkram.c | 381 +++++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 473 insertions(+), 1 deletion(-) create mode 100644 mm/shmem_pkram.c diff --git a/include/linux/pkram.h b/include/linux/pkram.h index 1cd518843d7a..b47b3aef16e3 100644 --- a/include/linux/pkram.h +++ b/include/linux/pkram.h @@ -17,6 +17,7 @@ struct pkram_stream { unsigned int entry_idx; /* next entry in link */ unsigned long next_index; + struct address_space *mapping; /* byte data */ struct page *data_page; diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index 688b92cd4ec7..f2ce9937a8f2 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -26,6 +26,11 @@ struct shmem_inode_info { struct inode vfs_inode; }; +#define SHMEM_PKRAM_NAME_MAX 128 +struct shmem_pkram_info { + char name[SHMEM_PKRAM_NAME_MAX]; +}; + struct shmem_sb_info { unsigned long max_blocks; /* How many blocks are allowed */ struct percpu_counter used_blocks; /* How many are allocated */ @@ -40,6 +45,8 @@ struct shmem_sb_info { spinlock_t shrinklist_lock; /* Protects shrinklist */ struct list_head shrinklist; /* List of shinkable inodes */ unsigned long shrinklist_len; /* Length of shrinklist */ + struct shmem_pkram_info *pkram; + bool preserve; /* PKRAM-enabled data is preserved */ }; static inline struct shmem_inode_info *SHMEM_I(struct inode *inode) @@ -99,6 +106,23 @@ extern int shmem_getpage(struct inode *inode, pgoff_t index, extern int shmem_insert_page(struct mm_struct *mm, struct inode *inode, pgoff_t index, struct page *page); +#ifdef CONFIG_PKRAM +extern int shmem_parse_pkram(const char *str, struct shmem_pkram_info **pkram); +extern void shmem_show_pkram(struct seq_file *seq, struct shmem_pkram_info *pkram, + bool preserve); +extern int shmem_save_pkram(struct super_block *sb); +extern void 
shmem_load_pkram(struct super_block *sb); +extern int shmem_release_pkram(struct super_block *sb); +#else +static inline int shmem_parse_pkram(const char *str, + struct shmem_pkram_info **pkram) { return 1; } +static inline void shmem_show_pkram(struct seq_file *seq, + struct shmem_pkram_info *pkram, bool preserve) { } +static inline int shmem_save_pkram(struct super_block *sb) { return 0; } +static inline void shmem_load_pkram(struct super_block *sb) { } +static inline int shmem_release_pkram(struct super_block *sb) { return 0; } +#endif + static inline struct page *shmem_read_mapping_page( struct address_space *mapping, pgoff_t index) { diff --git a/mm/Makefile b/mm/Makefile index c4ad1c56e237..5c07ecaa5a38 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -112,4 +112,4 @@ obj-$(CONFIG_MEMFD_CREATE) += memfd.o obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o obj-$(CONFIG_PTDUMP_CORE) += ptdump.o obj-$(CONFIG_PAGE_REPORTING) += page_reporting.o -obj-$(CONFIG_PKRAM) += pkram.o pkram_pagetable.o +obj-$(CONFIG_PKRAM) += pkram.o pkram_pagetable.o shmem_pkram.o diff --git a/mm/shmem.c b/mm/shmem.c index 0a9a2166e51f..9c28ef657cd1 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -112,14 +112,18 @@ struct shmem_options { unsigned long long blocks; unsigned long long inodes; struct mempolicy *mpol; + struct shmem_pkram_info *pkram; kuid_t uid; kgid_t gid; umode_t mode; int huge; + bool preserve; int seen; #define SHMEM_SEEN_BLOCKS 1 #define SHMEM_SEEN_INODES 2 #define SHMEM_SEEN_HUGE 4 +#define SHMEM_SEEN_PKRAM 8 +#define SHMEM_SEEN_PRESERVE 16 }; #ifdef CONFIG_TMPFS @@ -3467,6 +3471,8 @@ enum shmem_param { Opt_mpol, Opt_nr_blocks, Opt_nr_inodes, + Opt_pkram, + Opt_preserve, Opt_size, Opt_uid, }; @@ -3486,6 +3492,8 @@ const struct fs_parameter_spec shmem_fs_parameters[] = { fsparam_string("mpol", Opt_mpol), fsparam_string("nr_blocks", Opt_nr_blocks), fsparam_string("nr_inodes", Opt_nr_inodes), + fsparam_string("pkram", Opt_pkram), + fsparam_flag_no("preserve", 
Opt_preserve), fsparam_string("size", Opt_size), fsparam_u32 ("uid", Opt_uid), {} @@ -3557,6 +3565,22 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param) if (mpol_parse_str(param->string, &ctx->mpol)) goto bad_value; break; + } else + goto unsupported_parameter; + case Opt_pkram: + if (IS_ENABLED(CONFIG_PKRAM)) { + kfree(ctx->pkram); + if (shmem_parse_pkram(param->string, &ctx->pkram)) + goto bad_value; + ctx->seen |= SHMEM_SEEN_PKRAM; + break; + } + goto unsupported_parameter; + case Opt_preserve: + if (IS_ENABLED(CONFIG_PKRAM)) { + ctx->preserve = result.boolean; + ctx->seen |= SHMEM_SEEN_PRESERVE; + break; } goto unsupported_parameter; } @@ -3650,6 +3674,42 @@ static int shmem_reconfigure(struct fs_context *fc) } } + if (ctx->seen & SHMEM_SEEN_PRESERVE) { + if (!sbinfo->pkram && !(ctx->seen & SHMEM_SEEN_PKRAM)) { + err = "Cannot set preserve/nopreserve. Not enabled for PKRAM"; + goto out; + } + if (ctx->preserve && !(fc->sb_flags & SB_RDONLY)) { + err = "Cannot preserve. 
Filesystem must be read-only";
+			goto out;
+		}
+	}
+
+	if (ctx->pkram) {
+		kfree(sbinfo->pkram);
+		sbinfo->pkram = ctx->pkram;
+	}
+
+	if (ctx->seen & SHMEM_SEEN_PRESERVE) {
+		int error;
+
+		if (!sbinfo->preserve && ctx->preserve) {
+			error = shmem_save_pkram(fc->root->d_sb);
+			if (error) {
+				err = "Failed to preserve";
+				goto out;
+			}
+			sbinfo->preserve = true;
+		} else if (sbinfo->preserve && !ctx->preserve) {
+			error = shmem_release_pkram(fc->root->d_sb);
+			if (error) {
+				err = "Failed to unpreserve";
+				goto out;
+			}
+			sbinfo->preserve = false;
+		}
+	}
+
 	if (ctx->seen & SHMEM_SEEN_HUGE)
 		sbinfo->huge = ctx->huge;
 	if (ctx->seen & SHMEM_SEEN_BLOCKS)
@@ -3667,6 +3727,7 @@ static int shmem_reconfigure(struct fs_context *fc)
 		sbinfo->mpol = ctx->mpol;	/* transfers initial ref */
 		ctx->mpol = NULL;
 	}
+
 	spin_unlock(&sbinfo->stat_lock);
 	return 0;
 out:
@@ -3697,6 +3758,7 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root)
 		seq_printf(seq, ",huge=%s", shmem_format_huge(sbinfo->huge));
 #endif
 	shmem_show_mpol(seq, sbinfo->mpol);
+	shmem_show_pkram(seq, sbinfo->pkram, sbinfo->preserve);
 	return 0;
 }
@@ -3708,6 +3770,7 @@ static void shmem_put_super(struct super_block *sb)
 	percpu_counter_destroy(&sbinfo->used_blocks);
 	mpol_put(sbinfo->mpol);
+	kfree(sbinfo->pkram);
 	kfree(sbinfo);
 	sb->s_fs_info = NULL;
 }
@@ -3754,6 +3817,8 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 	sbinfo->huge = ctx->huge;
 	sbinfo->mpol = ctx->mpol;
 	ctx->mpol = NULL;
+	sbinfo->pkram = ctx->pkram;
+	ctx->pkram = NULL;
 	spin_lock_init(&sbinfo->stat_lock);
 	if (percpu_counter_init(&sbinfo->used_blocks, 0, GFP_KERNEL))
@@ -3783,6 +3848,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 	sb->s_root = d_make_root(inode);
 	if (!sb->s_root)
 		goto failed;
+	shmem_load_pkram(sb);
 	return 0;
 failed:
diff --git a/mm/shmem_pkram.c b/mm/shmem_pkram.c
new file mode 100644
index 000000000000..3fa9cfbe0003
--- /dev/null
+++ b/mm/shmem_pkram.c
@@ -0,0 +1,381 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+struct file_header {
+	__u32	mode;
+	kuid_t	uid;
+	kgid_t	gid;
+	__u32	namelen;
+	__u64	size;
+	__u64	atime;
+	__u64	mtime;
+	__u64	ctime;
+};
+
+int shmem_parse_pkram(const char *str, struct shmem_pkram_info **pkram)
+{
+	struct shmem_pkram_info *new;
+	size_t len;
+
+	len = strlen(str);
+	if (!len || len >= SHMEM_PKRAM_NAME_MAX)
+		return 1;
+	new = kzalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return 1;
+	strcpy(new->name, str);
+	*pkram = new;
+	return 0;
+}
+
+void shmem_show_pkram(struct seq_file *seq, struct shmem_pkram_info *pkram, bool preserve)
+{
+	if (pkram) {
+		seq_printf(seq, ",pkram=%s", pkram->name);
+		seq_printf(seq, ",%s", preserve ? "preserve" : "nopreserve");
+	}
+}
+
+static int shmem_pkram_name(char *buf, size_t bufsize,
+			    struct shmem_sb_info *sbinfo)
+{
+	if (snprintf(buf, bufsize, "shmem-%s", sbinfo->pkram->name) >= bufsize)
+		return -ENAMETOOLONG;
+	return 0;
+}
+
+static int save_page(struct page *page, struct pkram_stream *ps)
+{
+	int err = 0;
+
+	if (page)
+		err = pkram_save_page(ps, page, 0);
+
+	return err;
+}
+
+static int save_file_content(struct pkram_stream *ps)
+{
+	struct pagevec pvec;
+	pgoff_t indices[PAGEVEC_SIZE];
+	pgoff_t index = 0;
+	struct page *page;
+	int i, err = 0;
+
+	pagevec_init(&pvec);
+	for ( ; ; ) {
+		pvec.nr = find_get_entries(ps->mapping, index, PAGEVEC_SIZE,
+					   pvec.pages, indices);
+		if (!pvec.nr)
+			break;
+		for (i = 0; i < pagevec_count(&pvec); i++) {
+			page = pvec.pages[i];
+			index = indices[i];
+
+			if (WARN_ON_ONCE(xa_is_value(page))) {
+				err = -EINVAL;
+				break;
+			}
+
+			lock_page(page);
+
+			if (PageTransTail(page)) {
+				WARN_ONCE(1, "PageTransTail returned true\n");
+				unlock_page(page);
+				continue;
+			}
+
+			BUG_ON(page->mapping != ps->mapping);
+			err = save_page(page, ps);
+
+			i += compound_nr(page) - 1;
+			index += compound_nr(page) - 1;
+
+			unlock_page(page);
+			if (err)
+				break;
+		}
+		pagevec_remove_exceptionals(&pvec);
+		pagevec_release(&pvec);
+		if (err)
+			break;
+		cond_resched();
+		index++;
+	}
+
+	return err;
+}
+
+static int save_file(struct dentry *dentry, struct pkram_stream *ps)
+{
+	struct inode *inode = dentry->d_inode;
+	umode_t mode = inode->i_mode;
+	struct file_header hdr;
+	ssize_t ret;
+
+	if (WARN_ON_ONCE(!S_ISREG(mode)))
+		return -EINVAL;
+	if (WARN_ON_ONCE(inode->i_nlink > 1))
+		return -EINVAL;
+
+	hdr.mode = mode;
+	hdr.uid = inode->i_uid;
+	hdr.gid = inode->i_gid;
+	hdr.namelen = dentry->d_name.len;
+	hdr.size = i_size_read(inode);
+	hdr.atime = timespec64_to_ns(&inode->i_atime);
+	hdr.mtime = timespec64_to_ns(&inode->i_mtime);
+	hdr.ctime = timespec64_to_ns(&inode->i_ctime);
+
+	ret = pkram_write(ps, &hdr, sizeof(hdr));
+	if (ret < 0)
+		return ret;
+	ret = pkram_write(ps, dentry->d_name.name, dentry->d_name.len);
+	if (ret < 0)
+		return ret;
+
+	ps->mapping = inode->i_mapping;
+	return save_file_content(ps);
+}
+
+static int save_tree(struct super_block *sb, struct pkram_stream *ps)
+{
+	struct dentry *dentry, *root = sb->s_root;
+	int err = 0;
+
+	inode_lock(d_inode(root));
+	spin_lock(&root->d_lock);
+	list_for_each_entry(dentry, &root->d_subdirs, d_child) {
+		if (d_unhashed(dentry) || !dentry->d_inode)
+			continue;
+		dget(dentry);
+		spin_unlock(&root->d_lock);
+
+		err = pkram_prepare_save_obj(ps);
+		if (!err)
+			err = save_file(dentry, ps);
+		if (!err)
+			pkram_finish_save_obj(ps);
+		spin_lock(&root->d_lock);
+		dput(dentry);
+		if (err)
+			break;
+	}
+	spin_unlock(&root->d_lock);
+	inode_unlock(d_inode(root));
+
+	return err;
+}
+
+int shmem_save_pkram(struct super_block *sb)
+{
+	struct shmem_sb_info *sbinfo = sb->s_fs_info;
+	struct pkram_stream psobj;
+	char *buf;
+	int err = -ENOMEM;
+
+	if (!sbinfo || !sbinfo->pkram || is_kdump_kernel())
+		return 0;
+
+	buf = (void *)__get_free_page(GFP_KERNEL);
+	if (!buf)
+		goto out;
+
+	err = shmem_pkram_name(buf, PAGE_SIZE, sbinfo);
+	if (!err)
+		err = pkram_prepare_save(&psobj, buf, GFP_KERNEL);
+	if (err)
+		goto out_free_buf;
+
+	err = save_tree(sb, &psobj);
+	if (err)
+		goto out_discard_save;
+
+	pkram_finish_save(&psobj);
+	goto out_free_buf;
+
+out_discard_save:
+	pkram_discard_save(&psobj);
+out_free_buf:
+	free_page((unsigned long)buf);
+out:
+	if (err)
+		pr_err("SHMEM: PKRAM save failed: %d\n", err);
+
+	return err;
+}
+
+static int load_file_content(struct pkram_stream *ps)
+{
+	unsigned long index;
+	struct page *page;
+	int err = 0;
+
+	do {
+		page = pkram_load_page(ps, &index, NULL);
+		if (!page)
+			break;
+
+		err = shmem_insert_page(current->mm, ps->mapping->host, index, page);
+		put_page(page);
+	} while (!err);
+
+	return err;
+}
+
+static int load_file(struct dentry *parent, struct pkram_stream *ps,
+		     char *buf, size_t bufsize)
+{
+	struct dentry *dentry;
+	struct inode *inode;
+	struct file_header hdr;
+	size_t ret;
+	umode_t mode;
+	int namelen;
+	int err = 0;
+
+	ret = pkram_read(ps, &hdr, sizeof(hdr));
+	if (ret != sizeof(hdr))
+		return -EINVAL;
+
+	mode = hdr.mode;
+	namelen = hdr.namelen;
+	if (!S_ISREG(mode) || namelen > bufsize)
+		return -EINVAL;
+	if (pkram_read(ps, buf, namelen) != namelen)
+		return -EINVAL;
+
+	inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
+
+	dentry = lookup_one_len(buf, parent, namelen);
+	if (IS_ERR(dentry)) {
+		err = PTR_ERR(dentry);
+		goto out_unlock;
+	}
+
+	err = vfs_create(parent->d_inode, dentry, mode, NULL);
+	dput(dentry); /* on success shmem pinned it */
+	if (err)
+		goto out_unlock;
+
+	inode = dentry->d_inode;
+	inode->i_mode = mode;
+	inode->i_uid = hdr.uid;
+	inode->i_gid = hdr.gid;
+	inode->i_atime = ns_to_timespec64(hdr.atime);
+	inode->i_mtime = ns_to_timespec64(hdr.mtime);
+	inode->i_ctime = ns_to_timespec64(hdr.ctime);
+	i_size_write(inode, hdr.size);
+
+	ps->mapping = inode->i_mapping;
+	err = load_file_content(ps);
+out_unlock:
+	inode_unlock(d_inode(parent));
+
+	return err;
+}
+
+static int load_tree(struct super_block *sb, struct pkram_stream *ps,
+		     char *buf, size_t bufsize)
+{
+	int err;
+
+	do {
+		err = pkram_prepare_load_obj(ps);
+		if (err) {
+			if (err == -ENODATA)
+				err = 0;
+			break;
+		}
+		err = load_file(sb->s_root, ps, buf, PAGE_SIZE);
+		pkram_finish_load_obj(ps);
+	} while (!err);
+
+	return err;
+}
+
+void shmem_load_pkram(struct super_block *sb)
+{
+	struct shmem_sb_info *sbinfo = sb->s_fs_info;
+	struct pkram_stream psobj;
+	char *buf;
+	int err = -ENOMEM;
+
+	if (!sbinfo->pkram)
+		return;
+
+	buf = (void *)__get_free_page(GFP_KERNEL);
+	if (!buf)
+		goto out;
+
+	err = shmem_pkram_name(buf, PAGE_SIZE, sbinfo);
+	if (!err)
+		err = pkram_prepare_load(&psobj, buf);
+	if (err) {
+		if (err == -ENOENT)
+			err = 0;
+		goto out_free_buf;
+	}
+
+	err = load_tree(sb, &psobj, buf, PAGE_SIZE);
+
+	pkram_finish_load(&psobj);
+out_free_buf:
+	free_page((unsigned long)buf);
+out:
+	if (err)
+		pr_err("SHMEM: PKRAM load failed: %d\n", err);
+}
+
+int shmem_release_pkram(struct super_block *sb)
+{
+	struct shmem_sb_info *sbinfo = sb->s_fs_info;
+	struct pkram_stream psobj;
+	char *buf;
+	int err = -ENOMEM;
+
+	if (!sbinfo->pkram)
+		return 0;
+
+	buf = (void *)__get_free_page(GFP_KERNEL);
+	if (!buf)
+		goto out;
+
+	err = shmem_pkram_name(buf, PAGE_SIZE, sbinfo);
+	if (!err)
+		err = pkram_prepare_load(&psobj, buf);
+	if (err) {
+		if (err == -ENOENT)
+			err = 0;
+		goto out_free_buf;
+	}
+
+	pkram_finish_load(&psobj);
+out_free_buf:
+	free_page((unsigned long)buf);
+out:
+	if (err)
+		pr_err("SHMEM: PKRAM load failed: %d\n", err);
+
+	return err;
+}

From patchwork Thu May 7 00:41:50 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 11532127
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com,
 bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com,
 luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com,
 akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com,
 masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com,
 dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu,
 rafael.j.wysocki@intel.com, dan.j.williams@intel.com,
 zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com,
 Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com,
 keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org,
 mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com,
 gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com,
 jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com,
 lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 kexec@lists.infradead.org
Subject: [RFC 24/43] mm: shmem: prevent swapping of PKRAM-enabled tmpfs pages
Date: Wed, 6 May 2020 17:41:50 -0700
Message-Id: <1588812129-8596-25-git-send-email-anthony.yznaga@oracle.com>
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

Work around the limitation that shmem pages must be in memory in order to
be preserved by preventing them from being swapped out in the first place.
Do this by marking shmem pages associated with a PKRAM node as unevictable.

Signed-off-by: Anthony Yznaga
---
 mm/shmem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 9c28ef657cd1..13475073fb52 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2360,6 +2360,8 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 		INIT_LIST_HEAD(&info->swaplist);
 		simple_xattrs_init(&info->xattrs);
 		cache_no_acl(inode);
+		if (sbinfo->pkram)
+			mapping_set_unevictable(inode->i_mapping);
 		switch (mode & S_IFMT) {
 		default:

From patchwork Thu May 7 00:41:51 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 11532131
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com,
 bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com,
 luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com,
 akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com,
 masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com,
 dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu,
 rafael.j.wysocki@intel.com, dan.j.williams@intel.com,
 zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com,
 Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com,
 keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org,
 mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com,
 gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com,
 jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com,
 lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 kexec@lists.infradead.org
Subject: [RFC 25/43] mm: shmem: specify the mm to use when inserting pages
Date: Wed, 6 May 2020 17:41:51 -0700
Message-Id: <1588812129-8596-26-git-send-email-anthony.yznaga@oracle.com>
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

Explicitly specify the mm to pass to shmem_insert_page() when the
pkram_stream is initialized rather than use the mm of the current thread.
This will allow for multiple kernel threads to target the same mm when
inserting pages in parallel.
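
The effect of capturing the mm at stream initialization can be sketched in
plain userspace C. This is only an illustrative model, not kernel code:
struct mm_ctx, current_mm, struct stream, stream_init(), insert_page(),
stream_insert() and demo_inserts_from_worker() are all hypothetical
stand-ins, and treating a NULL mm as an error is a modeling choice made for
the sketch rather than a statement about shmem_insert_page().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct; it only counts inserted pages. */
struct mm_ctx {
	unsigned long inserted;
};

/* Hypothetical stand-in for the per-task current->mm pointer. */
static struct mm_ctx *current_mm;

/* Hypothetical, heavily reduced stand-in for struct pkram_stream. */
struct stream {
	struct mm_ctx *mm;	/* captured once, when the stream is initialized */
};

/* Model of the patch's idea: remember the initializing task's mm. */
static void stream_init(struct stream *s)
{
	s->mm = current_mm;
}

/* Stand-in for shmem_insert_page(): account one page to the given mm. */
static int insert_page(struct mm_ctx *mm)
{
	if (!mm)
		return -1;	/* modeling choice: no mm to account the page to */
	mm->inserted++;
	return 0;
}

/* Insert through the stream, using the mm captured at init time. */
static int stream_insert(struct stream *s)
{
	assert(s != NULL);
	return insert_page(s->mm);
}

/*
 * Initialize a stream while "running as" a task that owns `owner`, then
 * perform the inserts while current_mm is NULL, as it would be for a
 * worker without an mm of its own. Returns the number of successful inserts.
 */
static int demo_inserts_from_worker(struct mm_ctx *owner, int npages)
{
	struct stream s;
	int i, ok = 0;

	current_mm = owner;	/* the initializing task */
	stream_init(&s);

	current_mm = NULL;	/* now acting as a worker thread */
	for (i = 0; i < npages; i++)
		if (stream_insert(&s) == 0)
			ok++;
	return ok;
}
```

In this model every insert lands on the owner's mm_ctx regardless of which
context performs it, which is the property the patch relies on for parallel
insertion.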

Signed-off-by: Anthony Yznaga
---
 include/linux/pkram.h | 1 +
 mm/pkram.c            | 1 +
 mm/shmem_pkram.c      | 2 +-
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/pkram.h b/include/linux/pkram.h
index b47b3aef16e3..cbb79d2803c0 100644
--- a/include/linux/pkram.h
+++ b/include/linux/pkram.h
@@ -18,6 +18,7 @@ struct pkram_stream {
 	unsigned long next_index;
 	struct address_space *mapping;
+	struct mm_struct *mm;
 	/* byte data */
 	struct page *data_page;
diff --git a/mm/pkram.c b/mm/pkram.c
index 4d4d836fea53..a5e539052af6 100644
--- a/mm/pkram.c
+++ b/mm/pkram.c
@@ -565,6 +565,7 @@ static void pkram_stream_init(struct pkram_stream *ps,
 	memset(ps, 0, sizeof(*ps));
 	ps->gfp_mask = gfp_mask;
 	ps->node = node;
+	ps->mm = current->mm;
 }
 static void pkram_stream_init_obj(struct pkram_stream *ps, struct pkram_obj *obj)
diff --git a/mm/shmem_pkram.c b/mm/shmem_pkram.c
index 3fa9cfbe0003..c97d64393822 100644
--- a/mm/shmem_pkram.c
+++ b/mm/shmem_pkram.c
@@ -236,7 +236,7 @@ static int load_file_content(struct pkram_stream *ps)
 		if (!page)
 			break;
-		err = shmem_insert_page(current->mm, ps->mapping->host, index, page);
+		err = shmem_insert_page(ps->mm, ps->mapping->host, index, page);
 		put_page(page);
 	} while (!err);

From patchwork Thu May 7 00:41:52 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 11532135
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 86A0E900006; Wed, 6 May 2020 20:44:40 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 7550D900003; Wed, 6 May 2020 20:44:40 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4E472900006; Wed, 6 May 2020 20:44:40 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0092.hostedemail.com [216.40.44.92]) by kanga.kvack.org (Postfix) with ESMTP id 30026900003 for ; Wed, 6 May 2020 20:44:40 -0400 (EDT) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id DEB1A181AEF30 for ; Thu, 7 May 2020 00:44:39 +0000 (UTC) X-FDA: 76788077478.14.art24_28a987c0dd719 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30054:30064,0,RBL:156.151.31.85:@oracle.com:.lbl8.mailshell.net-64.10.201.10 62.18.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:1,LUA_SUMMARY:none X-HE-Tag: art24_28a987c0dd719 X-Filterd-Recvd-Size: 6180 Received: from userp2120.oracle.com (userp2120.oracle.com [156.151.31.85]) by imf16.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:44:39 +0000 (UTC) Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bjvQ092893; Thu, 7 May 2020 00:44:08 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; 
bh=sXz+5MX/e43ODGutAI7Eoi78pN72CvHNRozhR5OcLiY=; b=JpV7XO6RwQPucJ1rzxFOrsUoGfS6PaZu4Xq59fbTr7D8gnurRyXVepNkiKrUY7ZWdWuE FYeu8glZtcMNsarunOq7+eS23s7Grd371BkO8UvWf6hmDlnYU0BmdCDtAUPCfpzLMU2Z oeHzKLxttnshWsKX5tF9pbl/opBNAylWQ5ags9r7Cd5eMsMDy+lwlbODzH9+jE1pE2e5 OXzO8JyJy8+aoI+pYt1R4fe7cwE3cX9/TpBvxJnvozO0ov0IrK7M8483f84Le4WlP4ot fVHYkurQo2RH3pmEkEgSB5l3vGC/x4CImxxdoGFnkDifY5a4l1ye4OafH3qm6fqBMYMX 0w== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by userp2120.oracle.com with ESMTP id 30s1gnd8pe-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:44:08 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470aqPn170885; Thu, 7 May 2020 00:44:08 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userp3020.oracle.com with ESMTP id 30us7p2np8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:44:08 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 0470i6Ko029955; Thu, 7 May 2020 00:44:06 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:44:05 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, 
andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 26/43] mm: shmem: when inserting, handle pages already charged to a memcg Date: Wed, 6 May 2020 17:41:52 -0700 Message-Id: <1588812129-8596-27-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 adultscore=0 suspectscore=0 mlxlogscore=999 malwarescore=0 phishscore=0 mlxscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 suspectscore=0 mlxscore=0 spamscore=0 clxscore=1015 priorityscore=1501 bulkscore=0 phishscore=0 impostorscore=0 malwarescore=0 lowpriorityscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: If shmem_insert_page() is called to insert a page that was preserved using PKRAM on the current boot (i.e. 
preserved page is restored without an intervening kexec boot), the page will still be charged to a memory cgroup because it is never freed. Don't try to charge it again. Signed-off-by: Anthony Yznaga --- mm/shmem.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index 13475073fb52..1f3b43b8fa34 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -693,6 +693,7 @@ int shmem_insert_page(struct mm_struct *mm, struct inode *inode, pgoff_t index, struct mem_cgroup *memcg; pgoff_t hindex = index; bool on_lru = PageLRU(page); + bool has_memcg = page->mem_cgroup ? true : false; if (index > (MAX_LFS_FILESIZE >> PAGE_SHIFT)) return -EFBIG; @@ -738,20 +739,24 @@ int shmem_insert_page(struct mm_struct *mm, struct inode *inode, pgoff_t index, __SetPageReferenced(page); - err = mem_cgroup_try_charge_delay(page, mm, gfp, &memcg, - PageTransHuge(page)); - if (err) - goto out_unlock; + if (!has_memcg) { + err = mem_cgroup_try_charge_delay(page, mm, gfp, &memcg, + PageTransHuge(page)); + if (err) + goto out_unlock; + } err = shmem_add_to_page_cache(page, mapping, hindex, NULL, gfp & GFP_RECLAIM_MASK); if (err) { - mem_cgroup_cancel_charge(page, memcg, - PageTransHuge(page)); + if (!has_memcg) + mem_cgroup_cancel_charge(page, memcg, + PageTransHuge(page)); goto out_unlock; } - mem_cgroup_commit_charge(page, memcg, on_lru, - PageTransHuge(page)); + if (!has_memcg) + mem_cgroup_commit_charge(page, memcg, on_lru, + PageTransHuge(page)); if (!on_lru) lru_cache_add_anon(page); From patchwork Thu May 7 00:41:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532223 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E4207139F for ; Thu, 7 May 2020 00:46:45 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org 
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com,
ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org
Subject: [RFC 27/43] x86/mm/numa: add numa_isolate_memblocks()
Date: Wed, 6 May 2020 17:41:53 -0700
Message-Id: <1588812129-8596-28-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Provide a way for a caller external to numa to ensure memblocks in the memblock reserved list do not cross node boundaries and have a node id assigned to them. This will be used by PKRAM to ensure initialization of page structs for preserved pages can be deferred and multithreaded efficiently. Signed-off-by: Anthony Yznaga --- arch/x86/include/asm/numa.h | 4 ++++ arch/x86/mm/numa.c | 32 ++++++++++++++++++++------------ 2 files changed, 24 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h index bbfde3d2662f..f9e05f4eb1c6 100644 --- a/arch/x86/include/asm/numa.h +++ b/arch/x86/include/asm/numa.h @@ -40,6 +40,7 @@ static inline void set_apicid_to_node(int apicid, s16 node) } extern int numa_cpu_node(int cpu); +extern void __init numa_isolate_memblocks(void); #else /* CONFIG_NUMA */ static inline void set_apicid_to_node(int apicid, s16 node) @@ -50,6 +51,9 @@ static inline int numa_cpu_node(int cpu) { return NUMA_NO_NODE; } +static inline void numa_isolate_memblocks(void) +{ +} #endif /* CONFIG_NUMA */ #ifdef CONFIG_X86_32 diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c index 59ba008504dc..df0065e24ea5 100644 --- a/arch/x86/mm/numa.c +++ b/arch/x86/mm/numa.c @@ -475,6 +475,25 @@ static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi) return true; } +void __init numa_isolate_memblocks(void) +{ + int i; + + /* + * Iterate over all memory known to the x86 architecture, + * and use those ranges to set the nid in memblock.reserved. + * This will split up the memblock regions along node + * boundaries and will set the node IDs as well. 
+ */ + for (i = 0; i < numa_meminfo.nr_blks; i++) { + struct numa_memblk *mb = numa_meminfo.blk + i; + int ret; + + ret = memblock_set_node(mb->start, mb->end - mb->start, &memblock.reserved, mb->nid); + WARN_ON_ONCE(ret); + } +} + /* * Mark all currently memblock-reserved physical memory (which covers the * kernel's own memory ranges) as hot-unswappable. @@ -493,19 +512,8 @@ static void __init numa_clear_kernel_node_hotplug(void) * used by the kernel, but those regions are not split up * along node boundaries yet, and don't necessarily have their * node ID set yet either. - * - * So iterate over all memory known to the x86 architecture, - * and use those ranges to set the nid in memblock.reserved. - * This will split up the memblock regions along node - * boundaries and will set the node IDs as well. */ - for (i = 0; i < numa_meminfo.nr_blks; i++) { - struct numa_memblk *mb = numa_meminfo.blk + i; - int ret; - - ret = memblock_set_node(mb->start, mb->end - mb->start, &memblock.reserved, mb->nid); - WARN_ON_ONCE(ret); - } + numa_isolate_memblocks(); /* * Now go over all reserved memblock regions, to construct a From patchwork Thu May 7 00:41:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532139 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A9C5F139F for ; Thu, 7 May 2020 00:44:48 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6D10620736 for ; Thu, 7 May 2020 00:44:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="F258v4CU" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6D10620736 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com 
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org,
hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org
Subject: [RFC 28/43] PKRAM: ensure memblocks with preserved pages init'd for numa
Date: Wed, 6 May 2020 17:41:54 -0700
Message-Id: <1588812129-8596-29-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

In order to facilitate fast initialization of page structs for preserved pages, memblocks with preserved pages must not cross numa node boundaries and must have a node id assigned to them.
Signed-off-by: Anthony Yznaga --- mm/pkram.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/mm/pkram.c b/mm/pkram.c index a5e539052af6..97a7dd0a5b7d 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -21,6 +21,7 @@ #include #include +#include #include "internal.h" @@ -242,6 +243,15 @@ void __init pkram_reserve(void) return; } + /* + * Fix up the reserved memblock list to ensure the + * memblock regions are split along node boundaries + * and have a node ID set. This will allow the page + * structs for the preserved pages to be initialized + * more efficiently. + */ + numa_isolate_memblocks(); + done: pr_info("PKRAM: %lu pages reserved\n", pkram_reserved_pages); } From patchwork Thu May 7 00:41:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532229 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AE78815AB for ; Thu, 7 May 2020 00:46:53 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 652C72082E for ; Thu, 7 May 2020 00:46:53 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="eDBZ91s7" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 652C72082E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 97810900028; Wed, 6 May 2020 20:46:52 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 94EF8900023; Wed, 6 May 2020 20:46:52 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, 
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org
Subject: [RFC 29/43] memblock: PKRAM: mark memblocks that contain preserved pages
Date: Wed, 6 May 2020 17:41:55 -0700
Message-Id: <1588812129-8596-30-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

To support deferred initialization of page structs for preserved pages, separate memblocks containing preserved pages by setting a new flag when adding them to the memblock reserved list.
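For illustration only, the effect of tagging a reservation can be modelled in userspace C. This is a sketch under stated assumptions, not kernel code: demo_region, demo_list, demo_reserve() and DEMO_PRESERVED are hypothetical stand-ins for struct memblock_region, the reserved list, __memblock_reserve() and MEMBLOCK_PRESERVED. The sketch assumes that adjacent regions are coalesced only when their flags match, which is what keeps flagged ranges separate from ordinary reservations; it also assumes callers add ranges in ascending, non-overlapping order, which the real memblock code does not require.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_REGIONS 16
#define DEMO_PRESERVED   0x8u	/* hypothetical mirror of MEMBLOCK_PRESERVED */

struct demo_region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

struct demo_list {
	struct demo_region r[DEMO_MAX_REGIONS];
	size_t cnt;
};

/*
 * Append [base, base + size) to the list. The new range is folded into
 * the previous region only if it is physically adjacent AND carries the
 * same flags; otherwise it becomes a region of its own.
 * Returns 0 on success, -1 if the list is full.
 */
int demo_reserve(struct demo_list *l, uint64_t base, uint64_t size,
		 unsigned int flags)
{
	if (l->cnt > 0) {
		struct demo_region *last = &l->r[l->cnt - 1];

		if (last->base + last->size == base && last->flags == flags) {
			last->size += size;
			return 0;
		}
	}

	if (l->cnt == DEMO_MAX_REGIONS)
		return -1;

	l->r[l->cnt].base = base;
	l->r[l->cnt].size = size;
	l->r[l->cnt].flags = flags;
	l->cnt++;
	return 0;
}
```

In this model, two adjacent unflagged pages collapse into one region, while a flagged page that directly follows them starts a new region, so a later walk can pick out the flagged ranges by looking at the flag alone.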
Signed-off-by: Anthony Yznaga --- include/linux/memblock.h | 7 +++++++ mm/memblock.c | 8 +++++++- mm/pkram.c | 4 ++-- 3 files changed, 16 insertions(+), 3 deletions(-) diff --git a/include/linux/memblock.h b/include/linux/memblock.h index 6bc37a731d27..27ab2b30ae1d 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -37,6 +37,7 @@ enum memblock_flags { MEMBLOCK_HOTPLUG = 0x1, /* hotpluggable region */ MEMBLOCK_MIRROR = 0x2, /* mirrored region */ MEMBLOCK_NOMAP = 0x4, /* don't add to kernel direct mapping */ + MEMBLOCK_PRESERVED = 0x8, /* preserved pages region */ }; /** @@ -111,6 +112,7 @@ void memblock_allow_resize(void); int memblock_add_node(phys_addr_t base, phys_addr_t size, int nid); int memblock_add(phys_addr_t base, phys_addr_t size); int memblock_remove(phys_addr_t base, phys_addr_t size); +int __memblock_reserve(phys_addr_t base, phys_addr_t size, enum memblock_flags flags); int memblock_free(phys_addr_t base, phys_addr_t size); int memblock_reserve(phys_addr_t base, phys_addr_t size); #ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP @@ -215,6 +217,11 @@ static inline bool memblock_is_nomap(struct memblock_region *m) return m->flags & MEMBLOCK_NOMAP; } +static inline bool memblock_is_preserved(struct memblock_region *m) +{ + return m->flags & MEMBLOCK_PRESERVED; +} + #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn, unsigned long *end_pfn); diff --git a/mm/memblock.c b/mm/memblock.c index 69ae883b8d21..1a9a2055ed11 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -831,6 +831,12 @@ int __init_memblock memblock_free(phys_addr_t base, phys_addr_t size) return memblock_remove_range(&memblock.reserved, base, size); } +int __init_memblock __memblock_reserve(phys_addr_t base, phys_addr_t size, + enum memblock_flags flags) +{ + return memblock_add_range(&memblock.reserved, base, size, MAX_NUMNODES, flags); +} + int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size) { 
phys_addr_t end = base + size - 1; @@ -838,7 +844,7 @@ int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size) memblock_dbg("%s: [%pa-%pa] %pS\n", __func__, &base, &end, (void *)_RET_IP_); - return memblock_add_range(&memblock.reserved, base, size, MAX_NUMNODES, 0); + return __memblock_reserve(base, size, 0); } #ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP diff --git a/mm/pkram.c b/mm/pkram.c index 97a7dd0a5b7d..b83d31740619 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -170,7 +170,7 @@ static int __init pkram_reserve_page(unsigned long pfn) size = PAGE_SIZE; if (memblock_is_region_reserved(base, size) || - memblock_reserve(base, size) < 0) + __memblock_reserve(base, size, MEMBLOCK_PRESERVED) < 0) err = -EBUSY; if (!err) @@ -1446,7 +1446,7 @@ static void pkram_remove_identity_map(struct page *page) static int __init pkram_reserve_range_cb(struct pkram_pg_state *st, unsigned long base, unsigned long size) { if (memblock_is_region_reserved(base, size) || - memblock_reserve(base, size) < 0) { + __memblock_reserve(base, size, MEMBLOCK_PRESERVED) < 0) { pr_warn("PKRAM: reservations exist in [0x%lx,0x%lx]\n", base, base + size - 1); /* * Set a lower bound so another walk can undo the earlier, From patchwork Thu May 7 00:41:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532143 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5472215AB for ; Thu, 7 May 2020 00:44:55 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 144652082E for ; Thu, 7 May 2020 00:44:55 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="C4+OVNvv" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 144652082E 
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com,
andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org
Subject: [RFC 30/43] memblock: add for_each_reserved_mem_range()
Date: Wed, 6 May 2020 17:41:56 -0700
Message-Id: <1588812129-8596-31-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

To support deferred initialization of page structs for preserved pages, add an iterator over the memblock reserved list that can select or exclude ranges based on memblock flags.
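For illustration only, the select/exclude semantics can be modelled in userspace C. This is a sketch, not kernel code: mock_region, mock_table, mock_next_reserved(), mock_reserved_bytes() and MOCK_PRESERVED are hypothetical stand-ins for struct memblock_region, the reserved list, __next_reserved_mem_range(), a caller of for_each_reserved_mem_range() and MEMBLOCK_PRESERVED. As in the iterator added by this patch, only the preserved flag is tested and the reported end address is inclusive.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_PRESERVED 0x8u		/* hypothetical mirror of MEMBLOCK_PRESERVED */
#define MOCK_END       UINT64_MAX	/* end-of-iteration marker, like ULLONG_MAX */

struct mock_region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
	int nid;
};

/* A small, made-up reserved list: two ordinary and two preserved ranges. */
const struct mock_region mock_table[4] = {
	{ 0x1000, 0x1000, 0,              0 },
	{ 0x2000, 0x2000, MOCK_PRESERVED, 0 },
	{ 0x8000, 0x1000, 0,              1 },
	{ 0x9000, 0x3000, MOCK_PRESERVED, 1 },
};

/*
 * Advance *idx to the next region that passes the filter and report it.
 * flags selects preserved regions only; exclflags skips preserved regions.
 * Sets *idx to MOCK_END when no further region matches.
 */
void mock_next_reserved(const struct mock_region *regions, size_t cnt,
			uint64_t *idx, unsigned int flags,
			unsigned int exclflags, uint64_t *out_start,
			uint64_t *out_end, int *out_nid)
{
	size_t i = (size_t)*idx;

	for (; i < cnt; i++) {
		const struct mock_region *r = &regions[i];
		int preserved = (r->flags & MOCK_PRESERVED) != 0;

		if ((exclflags & MOCK_PRESERVED) && preserved)
			continue;	/* caller asked to skip preserved ranges */
		if ((flags & MOCK_PRESERVED) && !preserved)
			continue;	/* caller asked for preserved ranges only */

		if (out_start)
			*out_start = r->base;
		if (out_end)
			*out_end = r->base + r->size - 1;	/* inclusive end */
		if (out_nid)
			*out_nid = r->nid;

		*idx = (uint64_t)i + 1;
		return;
	}

	*idx = MOCK_END;
}

/* Sum the sizes of all ranges that pass the filter. */
uint64_t mock_reserved_bytes(const struct mock_region *regions, size_t cnt,
			     unsigned int flags, unsigned int exclflags)
{
	uint64_t idx = 0, start = 0, end = 0, total = 0;

	for (mock_next_reserved(regions, cnt, &idx, flags, exclflags,
				&start, &end, NULL);
	     idx != MOCK_END;
	     mock_next_reserved(regions, cnt, &idx, flags, exclflags,
				&start, &end, NULL))
		total += end - start + 1;

	return total;
}
```

Passing the flag as flags visits only the preserved ranges, passing it as exclflags visits everything else, and passing 0 for both visits every reserved range.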
Signed-off-by: Anthony Yznaga --- include/linux/memblock.h | 10 ++++++++++ mm/memblock.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 60 insertions(+), 1 deletion(-) diff --git a/include/linux/memblock.h b/include/linux/memblock.h index 27ab2b30ae1d..f348ebb750c9 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -145,6 +145,11 @@ void __next_mem_range_rev(u64 *idx, int nid, enum memblock_flags flags, void __next_reserved_mem_region(u64 *idx, phys_addr_t *out_start, phys_addr_t *out_end); +void __next_reserved_mem_range(u64 *idx, enum memblock_flags flags, + enum memblock_flags exclflags, + phys_addr_t *out_start, phys_addr_t *out_end, + int *out_nid); + void __memblock_free_late(phys_addr_t base, phys_addr_t size); /** @@ -202,6 +207,11 @@ void __memblock_free_late(phys_addr_t base, phys_addr_t size); i != (u64)ULLONG_MAX; \ __next_reserved_mem_region(&i, p_start, p_end)) +#define for_each_reserved_mem_range(i, flags, exclflags, p_start, p_end, p_nid)\ + for (i = 0UL, __next_reserved_mem_range(&i, flags, exclflags, p_start, p_end, p_nid); \ + i != (u64)ULLONG_MAX; \ + __next_reserved_mem_range(&i, flags, exclflags, p_start, p_end, p_nid)) + static inline bool memblock_is_hotpluggable(struct memblock_region *m) { return m->flags & MEMBLOCK_HOTPLUG; diff --git a/mm/memblock.c b/mm/memblock.c index 1a9a2055ed11..33597f352dc0 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -987,6 +987,55 @@ void __init_memblock __next_reserved_mem_region(u64 *idx, *idx = ULLONG_MAX; } +/** + * __next_reserved_mem_range - next function for for_each_reserved_mem_range() + * @idx: pointer to u64 loop variable + * @flags: pick blocks based on memory attributes + * @exclflags: exclude blocks based on memory attributes + * @out_start: ptr to phys_addr_t for start address of the range, can be %NULL + * @out_end: ptr to phys_addr_t for end address of the range, can be %NULL + * @out_nid: ptr to int for nid of the range, can be %NULL + * + * Iterate 
over all reserved memory ranges. + */ +void __init_memblock __next_reserved_mem_range(u64 *idx, + enum memblock_flags flags, + enum memblock_flags exclflags, + phys_addr_t *out_start, + phys_addr_t *out_end, int *out_nid) +{ + struct memblock_type *type = &memblock.reserved; + int _idx = *idx; + + for (; _idx < type->cnt; _idx++) { + struct memblock_region *r = &type->regions[_idx]; + phys_addr_t base = r->base; + phys_addr_t size = r->size; + + /* skip preserved pages */ + if ((exclflags & MEMBLOCK_PRESERVED) && memblock_is_preserved(r)) + continue; + + /* skip non-preserved pages */ + if ((flags & MEMBLOCK_PRESERVED) && !memblock_is_preserved(r)) + continue; + + if (out_start) + *out_start = base; + if (out_end) + *out_end = base + size - 1; + if (out_nid) + *out_nid = r->nid; + + _idx++; + *idx = (u64)_idx; + return; + } + + /* signal end of iteration */ + *idx = ULLONG_MAX; +} + static bool should_skip_region(struct memblock_region *m, int nid, int flags) { int m_nid = memblock_get_region_node(m); @@ -1011,7 +1060,7 @@ static bool should_skip_region(struct memblock_region *m, int nid, int flags) } /** - * __next_mem_range - next function for for_each_free_mem_range() etc. + * __next__mem_range - next function for for_each_free_mem_range() etc. 
* @idx: pointer to u64 loop variable * @nid: node selector, %NUMA_NO_NODE for all nodes * @flags: pick from blocks based on memory attributes From patchwork Thu May 7 00:41:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532159 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C0F1E139F for ; Thu, 7 May 2020 00:45:15 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7FEA02064A for ; Thu, 7 May 2020 00:45:15 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="z4jXuicy" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7FEA02064A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 47150900018; Wed, 6 May 2020 20:45:12 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 3D573900003; Wed, 6 May 2020 20:45:12 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 29ECC900018; Wed, 6 May 2020 20:45:12 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0105.hostedemail.com [216.40.44.105]) by kanga.kvack.org (Postfix) with ESMTP id E87CB900003 for ; Wed, 6 May 2020 20:45:11 -0400 (EDT) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id A71D31191E for ; Thu, 7 May 2020 00:45:11 +0000 (UTC) X-FDA: 
76788078822.29.order04_2d10da1626427 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30034:30054:30056:30064,0,RBL:156.151.31.85:@oracle.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:1,LUA_SUMMARY:none X-HE-Tag: order04_2d10da1626427 X-Filterd-Recvd-Size: 10336 Received: from userp2120.oracle.com (userp2120.oracle.com [156.151.31.85]) by imf37.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:45:09 +0000 (UTC) Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470cY9j093219; Thu, 7 May 2020 00:44:27 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=VWqlosoXjG5R1RqBZE9eEylkgp/X8RtUPIDoZID/DCc=; b=z4jXuicy7aci4NvCa/KN1RcAMWuOtdDvS84SCGxFCkXjDUmdqqoAAetkRp9ypU3IKNXG beQ7/jLaSZpFpgtKD9BsG5beTlYHBthhaxk4ruV/y+DRxQV45ivD5lze5JWb+Z2lvzSL b2y5KiDoF075GZ7LW2TtpxWI+ujPwm+GPq77bGgdQrQkgfBBk/PdNcsFym6N0SAAlOMm a7zwsecz/Xmq2CoD19k1AbX+DaEbYTCRapdsWGuOQlJ++bw5S3yDeAxbOUxpsWMihbtT TsRWffWXM3MjG82INzJj35Q8il8LFhkfmv0OmwlnVJL7p3nQiLWT8+w/oxXnL6CtNtrv Ww== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by userp2120.oracle.com with ESMTP id 30s1gnd8qc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:44:27 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bm0c131683; Thu, 7 May 2020 00:44:27 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by userp3030.oracle.com with ESMTP id 30t1r95cbf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:44:27 +0000 Received: from abhmp0014.oracle.com 
(abhmp0014.oracle.com [141.146.116.20]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 0470iOBH026117; Thu, 7 May 2020 00:44:24 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:44:23 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 31/43] memblock, mm: defer initialization of preserved pages Date: Wed, 6 May 2020 17:41:57 -0700 Message-Id: <1588812129-8596-32-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 adultscore=0 suspectscore=2 spamscore=0 mlxlogscore=999 malwarescore=0 phishscore=0 mlxscore=0 
classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 suspectscore=2 mlxscore=0 spamscore=0 clxscore=1015 priorityscore=1501 bulkscore=0 phishscore=0 impostorscore=0 malwarescore=0 lowpriorityscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

Preserved pages are represented in the memblock reserved list, but page structs for pages in the reserved list are initialized early, while boot is still single-threaded, so a large number of preserved pages can increase boot time. To mitigate this, defer initialization of preserved pages by skipping them when other reserved pages are initialized and initializing them later in a separate kernel thread.
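The include/exclude filtering that makes the deferral possible can be illustrated in user space. The following is only a sketch under stated assumptions, not the kernel implementation: `struct region`, `REGION_PRESERVED`, `next_region()` and `count_regions()` are hypothetical stand-ins for `struct memblock_region`, `MEMBLOCK_PRESERVED`, `__next_reserved_mem_range()` and a `for_each_reserved_mem_range()` loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's memblock flags and region type. */
enum region_flags { REGION_NONE = 0x0, REGION_PRESERVED = 0x1 };

struct region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

#define ITER_DONE UINT64_MAX

/*
 * Advance *idx to the next region that passes the filter and report its
 * inclusive [start, end] range. *idx is set to ITER_DONE when no region
 * is left, which is the loop termination condition for the caller.
 */
static void next_region(const struct region *regions, size_t cnt,
			uint64_t *idx, unsigned int flags,
			unsigned int exclflags,
			uint64_t *out_start, uint64_t *out_end)
{
	assert(regions != NULL && idx != NULL);

	for (uint64_t i = *idx; i < cnt; i++) {
		const struct region *r = &regions[i];
		int preserved = (r->flags & REGION_PRESERVED) != 0;

		/* skip preserved regions when they are excluded */
		if ((exclflags & REGION_PRESERVED) && preserved)
			continue;
		/* skip non-preserved regions when only preserved are wanted */
		if ((flags & REGION_PRESERVED) && !preserved)
			continue;

		if (out_start)
			*out_start = r->base;
		if (out_end)
			*out_end = r->base + r->size - 1;
		*idx = i + 1;
		return;
	}

	*idx = ITER_DONE;
}

/* Count the regions a filtered walk would visit. */
static size_t count_regions(const struct region *regions, size_t cnt,
			    unsigned int flags, unsigned int exclflags)
{
	uint64_t idx = 0;
	size_t n = 0;

	for (next_region(regions, cnt, &idx, flags, exclflags, NULL, NULL);
	     idx != ITER_DONE;
	     next_region(regions, cnt, &idx, flags, exclflags, NULL, NULL))
		n++;

	return n;
}
```

In this sketch the early, single-threaded pass corresponds to a walk with `exclflags = REGION_PRESERVED`, and the deferred thread corresponds to a walk with `flags = REGION_PRESERVED`; together the two walks visit every region exactly once.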
Signed-off-by: Anthony Yznaga --- arch/x86/mm/init_64.c | 1 - include/linux/mm.h | 2 +- mm/memblock.c | 10 ++++++++-- mm/page_alloc.c | 52 +++++++++++++++++++++++++++++++++++++++++++-------- 4 files changed, 53 insertions(+), 12 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 72662615977b..ae569ef6bd7d 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1245,7 +1245,6 @@ void __init mem_init(void) after_bootmem = 1; x86_init.hyper.init_after_bootmem(); - pkram_free_pgt(); totalram_pages_add(pkram_reserved_pages); /* * Must be done after boot memory is put on freelist, because here we diff --git a/include/linux/mm.h b/include/linux/mm.h index 5a323422d783..69b9cd08c721 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2297,7 +2297,7 @@ extern void free_highmem_page(struct page *page); extern void adjust_managed_page_count(struct page *page, long count); extern void mem_init_print_info(const char *str); -extern void reserve_bootmem_region(phys_addr_t start, phys_addr_t end); +extern void reserve_bootmem_region(phys_addr_t start, phys_addr_t end, int nid); /* Free the reserved page into the buddy system, so it gets managed. 
*/ static inline void __free_reserved_page(struct page *page) diff --git a/mm/memblock.c b/mm/memblock.c index 33597f352dc0..5524edbaf691 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -2042,11 +2042,17 @@ static unsigned long __init free_low_memory_core_early(void) unsigned long count = 0; phys_addr_t start, end; u64 i; + enum memblock_flags exclude; memblock_clear_hotplug(0, -1); - for_each_reserved_mem_region(i, &start, &end) - reserve_bootmem_region(start, end); + if (IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT)) + exclude = MEMBLOCK_PRESERVED; + else + exclude = MEMBLOCK_NONE; + + for_each_reserved_mem_range(i, 0, exclude, &start, &end, NULL) + reserve_bootmem_region(start, end, -1); /* * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 69827d4fa052..afd97b31725e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -68,6 +68,7 @@ #include #include #include +#include #include #include @@ -1408,15 +1409,18 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn, } #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT -static void __meminit init_reserved_page(unsigned long pfn) +static void __meminit init_reserved_page(unsigned long pfn, int nid) { pg_data_t *pgdat; - int nid, zid; + int zid; - if (!early_page_uninitialised(pfn)) - return; + if (nid == -1) { + if (!early_page_uninitialised(pfn)) + return; + + nid = early_pfn_to_nid(pfn); + } - nid = early_pfn_to_nid(pfn); pgdat = NODE_DATA(nid); for (zid = 0; zid < MAX_NR_ZONES; zid++) { @@ -1428,7 +1432,7 @@ static void __meminit init_reserved_page(unsigned long pfn) __init_single_page(pfn_to_page(pfn), pfn, zid, nid); } #else -static inline void init_reserved_page(unsigned long pfn) +static inline void init_reserved_page(unsigned long pfn, int nid) { } #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ @@ -1439,7 +1443,7 @@ static inline void init_reserved_page(unsigned long pfn) * marks the pages PageReserved. 
The remaining valid pages are later * sent to the buddy page allocator. */ -void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end) +void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end, int nid) { unsigned long start_pfn = PFN_DOWN(start); unsigned long end_pfn = PFN_UP(end); @@ -1448,7 +1452,7 @@ void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end) if (pfn_valid(start_pfn)) { struct page *page = pfn_to_page(start_pfn); - init_reserved_page(start_pfn); + init_reserved_page(start_pfn, nid); /* Avoid false-positive PageTail() */ INIT_LIST_HEAD(&page->lru); @@ -1876,6 +1880,34 @@ static int __init deferred_init_memmap(void *data) return 0; } +#ifdef CONFIG_PKRAM +static int __init deferred_init_preserved(void *dummy) +{ + unsigned long start = jiffies; + unsigned long nr_pages = 0; + phys_addr_t spa, epa; + int nid; + u64 i; + + for_each_reserved_mem_range(i, MEMBLOCK_PRESERVED, 0, &spa, &epa, &nid) { + reserve_bootmem_region(spa, epa, nid); + nr_pages += ((epa - spa) >> PAGE_SHIFT); + } + + pr_info("initialised %lu preserved pages in %ums\n", nr_pages, + jiffies_to_msecs(jiffies - start)); + + /* + * Free the preserved pages pagetable now that page structs are + * initialized. 
+ */ + pkram_free_pgt(); + + pgdat_init_report_one_done(); + return 0; +} +#endif /* CONFIG_PKRAM */ + /* * If this zone has deferred pages, try to grow it by initializing enough * deferred pages to satisfy the allocation specified by order, rounded up to @@ -1985,6 +2017,10 @@ void __init page_alloc_init_late(void) /* There will be num_node_state(N_MEMORY) threads */ atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY)); +#ifdef CONFIG_PKRAM + atomic_inc(&pgdat_init_n_undone); + kthread_run(deferred_init_preserved, NULL, "pgdatainit_preserved"); +#endif for_each_node_state(nid, N_MEMORY) { kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid); } From patchwork Thu May 7 00:41:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532155 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3A7DA81 for ; Thu, 7 May 2020 00:45:13 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 04C112082E for ; Thu, 7 May 2020 00:45:13 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="yXBvQi9H" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 04C112082E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 951B5900017; Wed, 6 May 2020 20:45:10 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 8DAE2900003; Wed, 6 May 2020 20:45:10 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from 
userid 63042) id 5A655900018; Wed, 6 May 2020 20:45:10 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0148.hostedemail.com [216.40.44.148]) by kanga.kvack.org (Postfix) with ESMTP id 34985900017 for ; Wed, 6 May 2020 20:45:10 -0400 (EDT) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 03DD4BEE7 for ; Thu, 7 May 2020 00:45:10 +0000 (UTC) X-FDA: 76788078780.12.color54_2d0da5ef9fc5c X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:4423:30029:30054:30064:30070,0,RBL:156.151.31.86:@oracle.com:.lbl8.mailshell.net-64.10.201.10 62.18.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: color54_2d0da5ef9fc5c X-Filterd-Recvd-Size: 10046 Received: from userp2130.oracle.com (userp2130.oracle.com [156.151.31.86]) by imf19.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:45:09 +0000 (UTC) Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bgk7064570; Thu, 7 May 2020 00:44:35 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=St3JE32B8DDqwlwkgwkk0bfEnH6+lOQGeM29niLEd0A=; b=yXBvQi9HTMMMU1huyex4wrc1y7Z1DfDKRz/ZcxKXikIq7nPKFWfaw5uEmk/o/LH5PrWe MVh3hXvABR/ob5om6dPne2Ky4euGj/96rbDwepjk12StUklMh99Ia5xQXIhli7yquWVQ P39NA/tI15qHc7DzL1B2hPEXLFUo9ElstoXA47WOI1/juGLon4+gubwDga7chJcNu7fR LXSPQmDngLKwSkiIHHO/0M5D9sNFUqEjD+6NhXn026uKfb0WnOgSw+zbKmaOgZW8ulFE uoxdZ+J512LEpmrb5916loU5Ajdub6qACvGh4J+vfg7cgjuov/sgoNmfM7vJib7Qmn7K mg== Received: from aserp3030.oracle.com (aserp3030.oracle.com [141.146.126.71]) by userp2130.oracle.com with ESMTP id 30s09rdfc0-1 (version=TLSv1.2 
cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:44:35 +0000 Received: from pps.filterd (aserp3030.oracle.com [127.0.0.1]) by aserp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470aoW0136163; Thu, 7 May 2020 00:44:34 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by aserp3030.oracle.com with ESMTP id 30sjdwrsn0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:44:34 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 0470iUY9026137; Thu, 7 May 2020 00:44:31 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:44:30 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 32/43] shmem: PKRAM: preserve shmem files a chunk at a time Date: Wed, 6 May 2020 17:41:58 -0700 
Message-Id: <1588812129-8596-33-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 suspectscore=2 mlxscore=0 bulkscore=0 adultscore=0 phishscore=0 mlxlogscore=999 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 mlxscore=0 lowpriorityscore=0 spamscore=0 adultscore=0 clxscore=1015 suspectscore=2 priorityscore=1501 malwarescore=0 mlxlogscore=999 phishscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

To prepare for multithreading the work done to preserve a file, divide the work into subranges of the total index range of the file. The chunk size is a rather arbitrary 256k indices.

A new API call, pkram_prepare_save_chunk(), is added. It is called after calling pkram_prepare_save_obj(), and it initializes pkram_stream with the index range of the next available range of pages to save. find_get_pages_range() can then be used to get the pages in the range. When no more index ranges are available, pkram_prepare_save_chunk() returns -ENODATA.
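The chunk hand-out described above can be tried out in user space with C11 atomics. This is a minimal sketch, assuming a hypothetical `struct chunk_stream` that carries only the index fields of `pkram_stream`, with `atomic_fetch_add()` standing in for the kernel's `atomic64_fetch_add()`; `prepare_save_chunk()` and `count_chunks()` are illustrative names, not part of the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* 512 * 512 = 262144 indices, the "256k" chunk size from the description. */
#define CHUNK_PAGES (512UL * 512UL)

/* Hypothetical reduction of pkram_stream to its index-range fields. */
struct chunk_stream {
	unsigned long start_idx;	/* first index in range to save */
	unsigned long end_idx;		/* last index in range to save */
	unsigned long max_idx;		/* maximum index to save */
	atomic_ulong *next_idx;		/* first index of next range, shared */
};

/*
 * Claim the next index range. The fetch-add hands each caller a distinct
 * starting index, so concurrent callers sharing next_idx never overlap.
 * Returns 0 on success, -ENODATA when the file is exhausted.
 */
static int prepare_save_chunk(struct chunk_stream *cs)
{
	assert(cs != NULL && cs->next_idx != NULL);

	cs->start_idx = atomic_fetch_add(cs->next_idx, CHUNK_PAGES);
	if (cs->start_idx >= cs->max_idx)
		return -ENODATA;

	cs->end_idx = cs->start_idx + CHUNK_PAGES - 1;
	return 0;
}

/* Count how many chunks a single caller gets for a file of max_idx pages. */
static unsigned long count_chunks(unsigned long max_idx)
{
	atomic_ulong next;
	struct chunk_stream cs = { .max_idx = max_idx, .next_idx = &next };
	unsigned long n = 0;

	atomic_init(&next, 0);
	while (prepare_save_chunk(&cs) == 0)
		n++;

	return n;
}
```

Note that, as in the patch, `end_idx` of the final chunk may lie beyond `max_idx`; the sketch assumes the page lookup simply finds nothing past the end of the file.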
Signed-off-by: Anthony Yznaga --- include/linux/pkram.h | 6 +++++ mm/pkram.c | 26 +++++++++++++++++++++ mm/shmem_pkram.c | 63 +++++++++++++++++++++++++++++++++++---------------- 3 files changed, 75 insertions(+), 20 deletions(-) diff --git a/include/linux/pkram.h b/include/linux/pkram.h index cbb79d2803c0..e71ccb91d6a6 100644 --- a/include/linux/pkram.h +++ b/include/linux/pkram.h @@ -20,6 +20,11 @@ struct pkram_stream { struct address_space *mapping; struct mm_struct *mm; + unsigned long start_idx; /* first index in range to save */ + unsigned long end_idx; /* last index in range to save */ + unsigned long max_idx; /* maximum index to save */ + atomic64_t *next_idx; /* first index of next range to save */ + /* byte data */ struct page *data_page; unsigned int data_offset; @@ -46,6 +51,7 @@ void pkram_free_pgt_walk_pgd(pgd_t *pgd); int pkram_prepare_save(struct pkram_stream *ps, const char *name, gfp_t gfp_mask); int pkram_prepare_save_obj(struct pkram_stream *ps); +int pkram_prepare_save_chunk(struct pkram_stream *ps); void pkram_finish_save(struct pkram_stream *ps); void pkram_finish_save_obj(struct pkram_stream *ps); void pkram_discard_save(struct pkram_stream *ps); diff --git a/mm/pkram.c b/mm/pkram.c index b83d31740619..5f4e4d12865f 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -638,6 +638,25 @@ int pkram_prepare_save(struct pkram_stream *ps, const char *name, gfp_t gfp_mask return 0; } +unsigned long max_pages_per_chunk = 512 * 512; + +/* + * Initialize the stream @ps for the next index range to save. + * + * Returns 0 on success, -ENODATA if no index range is available + * + */ +int pkram_prepare_save_chunk(struct pkram_stream *ps) +{ + ps->start_idx = atomic64_fetch_add(max_pages_per_chunk, ps->next_idx); + if (ps->start_idx >= ps->max_idx) + return -ENODATA; + + ps->end_idx = ps->start_idx + max_pages_per_chunk - 1; + + return 0; +} + /** * Create a preserved memory object and initialize stream @ps for saving data * to it. 
@@ -667,6 +686,11 @@ int pkram_prepare_save_obj(struct pkram_stream *ps) obj->obj_pfn = node->obj_pfn; node->obj_pfn = page_to_pfn(page); + ps->next_idx = kmalloc(sizeof(atomic64_t), GFP_KERNEL); + if (!ps->next_idx) + return -ENOMEM; + atomic64_set(ps->next_idx, 0); + pkram_stream_init_obj(ps, obj); return 0; } @@ -679,6 +703,8 @@ void pkram_finish_save_obj(struct pkram_stream *ps) struct pkram_node *node = ps->node; BUG_ON((node->flags & PKRAM_ACCMODE_MASK) != PKRAM_SAVE); + + kfree(ps->next_idx); } /** diff --git a/mm/shmem_pkram.c b/mm/shmem_pkram.c index c97d64393822..2f4d0bdf3e05 100644 --- a/mm/shmem_pkram.c +++ b/mm/shmem_pkram.c @@ -74,58 +74,81 @@ static int save_page(struct page *page, struct pkram_stream *ps) return err; } -static int save_file_content(struct pkram_stream *ps) +static int save_file_content_range(struct address_space *mapping, + struct pkram_stream *ps) { + unsigned long index, end; struct pagevec pvec; - pgoff_t indices[PAGEVEC_SIZE]; - pgoff_t index = 0; struct page *page; - int i, err = 0; + int err = 0; + int i; + + index = ps->start_idx; + end = ps->end_idx; pagevec_init(&pvec); for ( ; ; ) { - pvec.nr = find_get_entries(ps->mapping, index, PAGEVEC_SIZE, - pvec.pages, indices); + pvec.nr = find_get_pages_range(mapping, &index, end, + PAGEVEC_SIZE, pvec.pages); if (!pvec.nr) break; - for (i = 0; i < pagevec_count(&pvec); i++) { + for (i = 0; i < pagevec_count(&pvec); ) { page = pvec.pages[i]; - index = indices[i]; - - if (WARN_ON_ONCE(xa_is_value(page))) { - err = -EINVAL; - break; - } - lock_page(page); if (PageTransTail(page)) { WARN_ONCE(1, "PageTransTail returned true\n"); unlock_page(page); + i++; continue; } - BUG_ON(page->mapping != ps->mapping); + BUG_ON(page->mapping != mapping); err = save_page(page, ps); - i += compound_nr(page) - 1; - index += compound_nr(page) - 1; + if (PageCompound(page)) { + index = page->index + compound_nr(page); + i += compound_nr(page); + } else { + i++; + } unlock_page(page); if (err) break; } - 
pagevec_remove_exceptionals(&pvec); pagevec_release(&pvec); - if (err) + if (err || (index > end)) break; cond_resched(); - index++; } return err; } +static int do_save_file_content(struct pkram_stream *ps) +{ + int ret; + + do { + ret = pkram_prepare_save_chunk(ps); + if (!ret) + ret = save_file_content_range(ps->mapping, ps); + } while (!ret); + + if (ret == -ENODATA) + ret = 0; + + return ret; +} + +static int save_file_content(struct pkram_stream *ps) +{ + ps->max_idx = DIV_ROUND_UP(i_size_read(ps->mapping->host), PAGE_SIZE); + + return do_save_file_content(ps); +} + static int save_file(struct dentry *dentry, struct pkram_stream *ps) { struct inode *inode = dentry->d_inode; From patchwork Thu May 7 00:41:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532151 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 62AC881 for ; Thu, 7 May 2020 00:45:11 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2F6E521973 for ; Thu, 7 May 2020 00:45:11 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="TW6ZzReP" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2F6E521973 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5A913900019; Wed, 6 May 2020 20:45:10 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 53190900003; Wed, 6 May 2020 20:45:10 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org 
(Postfix, from userid 63042) id 4205D900018; Wed, 6 May 2020 20:45:10 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0019.hostedemail.com [216.40.44.19]) by kanga.kvack.org (Postfix) with ESMTP id 2AB58900003 for ; Wed, 6 May 2020 20:45:10 -0400 (EDT) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id DE11CAF73 for ; Thu, 7 May 2020 00:45:09 +0000 (UTC) X-FDA: 76788078738.09.desk68_2d08cfee8d406 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30054:30064,0,RBL:156.151.31.86:@oracle.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:1,LUA_SUMMARY:none X-HE-Tag: desk68_2d08cfee8d406 X-Filterd-Recvd-Size: 5834 Received: from userp2130.oracle.com (userp2130.oracle.com [156.151.31.86]) by imf35.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:45:09 +0000 (UTC) Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470c16Z064684; Thu, 7 May 2020 00:44:37 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=9f3UmDwL/aqxnXqEfybd2gNvyfWYGqkBvYGTXbV7Xmo=; b=TW6ZzRePDXfEU0ecjZeUmFlqbW9LoXtCpaOl0HlyOXm1Me6fsnBFHZrpCPNhhq7ttnTD t0TBEzgUoxr/3lONrETFdIlI09RlNdX6CuSHn+9A4e8QgDsUeJm5++2u9tADLBJOux+l fgkHDqoJN12o0sMaicYlau4KG5phGGI5rWDkH8NacH8Y8SoN93PvuLYdTLSasJpFpBPb H9tdopuB1vR10FMfJJfkGY8Jg0pg2/Rby/r4PfyL1DVWfdh0qJhobQRNlO71jjzfD6a4 bSYCi7CY/aShuBoGMuZ3DeCkUgAJQB4pNmxUE9lUJ6bQQjGuF4GDadaKs6PgPQoarjlj BA== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by userp2130.oracle.com with ESMTP id 30s09rdfc6-1 (version=TLSv1.2 
cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:44:37 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bm9X131707; Thu, 7 May 2020 00:44:37 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by userp3030.oracle.com with ESMTP id 30t1r95cpc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:44:37 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 0470iX0S026145; Thu, 7 May 2020 00:44:34 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:44:33 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 33/43] PKRAM: atomically add and remove link pages Date: Wed, 6 May 2020 17:41:59 -0700 Message-Id: 
<1588812129-8596-34-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 adultscore=0 suspectscore=0 spamscore=0 mlxlogscore=999 malwarescore=0 phishscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 mlxscore=0 lowpriorityscore=0 spamscore=0 adultscore=0 clxscore=1015 suspectscore=0 priorityscore=1501 malwarescore=0 mlxlogscore=999 phishscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add and remove pkram_link pages from a pkram_obj atomically to prepare for multithreading. 
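The general technique, a compare-and-swap loop on the list head, can be sketched in user space with C11 atomics. This is an illustration under assumptions rather than the patch's code: links are identified by small array indices instead of page frame numbers, 0 means "no link", and `add_link()` / `remove_link()` are hypothetical names mirroring `pkram_add_link()` / `pkram_remove_link()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NLINKS 8

/* Hypothetical link node: "next" holds the id of the next link, 0 = none. */
struct link {
	uint64_t next;
};

/* Slot 0 is unused so that id 0 can serve as the end-of-list marker. */
static struct link links[NLINKS];

/* Push link "id" onto the list, retrying if the head changes underneath. */
static void add_link(_Atomic uint64_t *head, uint64_t id)
{
	uint64_t old;

	assert(id > 0 && id < NLINKS);

	old = atomic_load(head);
	do {
		links[id].next = old;
		/* on failure, "old" is refreshed with the current head */
	} while (!atomic_compare_exchange_weak(head, &old, id));
}

/* Pop the first link and return its id, or 0 if the list is empty. */
static uint64_t remove_link(_Atomic uint64_t *head)
{
	uint64_t old = atomic_load(head);

	while (old) {
		if (atomic_compare_exchange_weak(head, &old, links[old].next)) {
			links[old].next = 0;
			return old;
		}
		/* CAS failed: "old" now holds the new head, try again */
	}

	return 0;
}
```

A compare-and-swap pop of this shape is in general exposed to the ABA problem when a node can be removed and pushed back while another thread is between reading the head and swapping it. The sketch does not address that; whether it matters depends on how callers interleave adds and removes.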
Signed-off-by: Anthony Yznaga --- mm/pkram.c | 27 ++++++++++++++++++--------- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/mm/pkram.c b/mm/pkram.c index 5f4e4d12865f..042c14dedc25 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -551,22 +551,31 @@ static void pkram_truncate(void) static void pkram_add_link(struct pkram_link *link, struct pkram_obj *obj) { - link->link_pfn = obj->link_pfn; - obj->link_pfn = page_to_pfn(virt_to_page(link)); + __u64 link_pfn = page_to_pfn(virt_to_page(link)); + __u64 *head = &obj->link_pfn; + + do { + link->link_pfn = *head; + } while (cmpxchg64(head, link->link_pfn, link_pfn) != link->link_pfn); } static struct pkram_link *pkram_remove_link(struct pkram_obj *obj) { struct pkram_link *current_link; + __u64 *head = &obj->link_pfn; + __u64 head_pfn = *head; + + while (head_pfn) { + current_link = pfn_to_kaddr(head_pfn); + if (cmpxchg64(head, head_pfn, current_link->link_pfn) == head_pfn) { + current_link->link_pfn = 0; + return current_link; + } - if (!obj->link_pfn) - return NULL; - - current_link = pfn_to_kaddr(obj->link_pfn); - obj->link_pfn = current_link->link_pfn; - current_link->link_pfn = 0; + head_pfn = *head; + } - return current_link; + return NULL; } static void pkram_stream_init(struct pkram_stream *ps, From patchwork Thu May 7 00:42:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532167 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BF01281 for ; Thu, 7 May 2020 00:45:20 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7DA1821655 for ; Thu, 7 May 2020 00:45:20 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="NaEwX/P0" DMARC-Filter: OpenDMARC 
Filter v1.3.2 mail.kernel.org 7DA1821655 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0EF5490001C; Wed, 6 May 2020 20:45:18 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 07A4390001B; Wed, 6 May 2020 20:45:17 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DC0D090001C; Wed, 6 May 2020 20:45:17 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0019.hostedemail.com [216.40.44.19]) by kanga.kvack.org (Postfix) with ESMTP id B3135900003 for ; Wed, 6 May 2020 20:45:17 -0400 (EDT) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 83680181AEF30 for ; Thu, 7 May 2020 00:45:17 +0000 (UTC) X-FDA: 76788079074.19.cable31_2e257af74ff2f X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:4423:30034:30054:30064:30069:30079,0,RBL:156.151.31.86:@oracle.com:.lbl8.mailshell.net-64.10.201.10 62.18.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: cable31_2e257af74ff2f X-Filterd-Recvd-Size: 9191 Received: from userp2130.oracle.com (userp2130.oracle.com [156.151.31.86]) by imf14.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:45:16 +0000 (UTC) Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470dTvg076311; Thu, 7 May 2020 00:44:43 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : 
in-reply-to : references; s=corp-2020-01-29; bh=J/IiCj+3kST83E/VcM8iyZaL98ROPYAwK19raHrmTZw=; b=NaEwX/P0ZZI39WJCV8xbxOOGMpPaIk3KdKPqmf8xqw4W7aDYRJW8HFEbpNiMORAFOvdg Pw0JMz+eZIw1ILS1MUToUdL98fMRvPTPn5RInXZTH3CXNINmo4dd871gbwXTMtprIi5c OY/w/M9UUV003jvnhTMeGaLUnH/GW8q+W3Ps1wOZYgvxGqADYIXsF1ml9k2aP35nIE65 AJqv83Hztpww5XQcAIl/iTQk4+71FS6FE5+fB/xoAxqaagg+rcsnPUwprV29UckCzZ3Z 29bXAE5twy3CU1lwfYpHznoWgzK+Pd7y9lXfIJChI/rgDlTz1AoTZKiuLeSxgkiiZcSB Hw== Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70]) by userp2130.oracle.com with ESMTP id 30s09rdfcg-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:44:42 +0000 Received: from pps.filterd (aserp3020.oracle.com [127.0.0.1]) by aserp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bTQT098629; Thu, 7 May 2020 00:44:41 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by aserp3020.oracle.com with ESMTP id 30sjnma5t8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:44:41 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 0470ibRQ026192; Thu, 7 May 2020 00:44:37 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:44:36 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, 
Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 34/43] shmem: PKRAM: multithread preserving and restoring shmem pages Date: Wed, 6 May 2020 17:42:00 -0700 Message-Id: <1588812129-8596-35-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 adultscore=0 phishscore=0 mlxlogscore=999 bulkscore=0 malwarescore=0 spamscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 mlxscore=0 lowpriorityscore=0 spamscore=0 adultscore=0 clxscore=1015 suspectscore=0 priorityscore=1501 malwarescore=0 mlxlogscore=999 phishscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Improve performance by multithreading the work to preserve and restore shmem pages. 
Add a 'pkram_max_threads=' kernel option to specify the maximum number of threads to use to preserve or restore the pages of a shmem file. The default is 16. When preserving pages, each thread saves chunks of a file to a pkram_obj until no more chunks are available. When restoring pages, each thread loads pages using a copy of a pkram_stream initialized by pkram_prepare_load_obj(). Under the hood each thread ends up fetching and operating on pkram_link pages. Signed-off-by: Anthony Yznaga --- include/linux/pkram.h | 2 + mm/shmem_pkram.c | 101 +++++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 101 insertions(+), 2 deletions(-) diff --git a/include/linux/pkram.h b/include/linux/pkram.h index e71ccb91d6a6..bf2e138b044e 100644 --- a/include/linux/pkram.h +++ b/include/linux/pkram.h @@ -13,6 +13,8 @@ struct pkram_stream { struct pkram_node *node; struct pkram_obj *obj; + int error; + struct pkram_link *link; /* current link */ unsigned int entry_idx; /* next entry in link */ diff --git a/mm/shmem_pkram.c b/mm/shmem_pkram.c index 2f4d0bdf3e05..4992b6c3e54e 100644 --- a/mm/shmem_pkram.c +++ b/mm/shmem_pkram.c @@ -126,6 +126,16 @@ static int save_file_content_range(struct address_space *mapping, return err; } +/* Completion tracking for do_save_file_content_thr() threads */ +static atomic_t pkram_save_n_undone; +static DECLARE_COMPLETION(pkram_save_all_done_comp); + +static inline void pkram_save_report_one_done(void) +{ + if (atomic_dec_and_test(&pkram_save_n_undone)) + complete(&pkram_save_all_done_comp); +} + static int do_save_file_content(struct pkram_stream *ps) { int ret; @@ -142,11 +152,55 @@ static int do_save_file_content(struct pkram_stream *ps) return ret; } +static int do_save_file_content_thr(void *data) +{ + struct pkram_stream *ps = data; + struct pkram_stream pslocal = *ps; + int ret; + + ret = do_save_file_content(&pslocal); + if (ret && !ps->error) + ps->error = ret; + + pkram_save_report_one_done(); + return 0; +} +#define
PKRAM_DEFAULT_MAX_THREADS 16 + +static int pkram_max_threads = PKRAM_DEFAULT_MAX_THREADS; + +static int __init set_pkram_max_threads(char *str) +{ + unsigned int val; + + if (kstrtouint(str, 0, &val)) + return 1; + + pkram_max_threads = val; + + return 1; +} +__setup("pkram_max_threads=", set_pkram_max_threads); + static int save_file_content(struct pkram_stream *ps) { + unsigned int thr, nr_threads; + + nr_threads = num_online_cpus() - 1; + nr_threads = clamp_val(pkram_max_threads, 1, nr_threads); + ps->max_idx = DIV_ROUND_UP(i_size_read(ps->mapping->host), PAGE_SIZE); - return do_save_file_content(ps); + if (nr_threads == 1) + return do_save_file_content(ps); + + atomic_set(&pkram_save_n_undone, nr_threads); + for (thr = 0; thr < nr_threads; thr++) + kthread_run(do_save_file_content_thr, ps, "pkram_save%d", thr); + + wait_for_completion(&pkram_save_all_done_comp); + + return ps->error; } static int save_file(struct dentry *dentry, struct pkram_stream *ps) @@ -248,7 +302,17 @@ int shmem_save_pkram(struct super_block *sb) return err; } -static int load_file_content(struct pkram_stream *ps) +/* Completion tracking for do_load_file_content_thr() threads */ +static atomic_t pkram_load_n_undone; +static DECLARE_COMPLETION(pkram_load_all_done_comp); + +static inline void pkram_load_report_one_done(void) +{ + if (atomic_dec_and_test(&pkram_load_n_undone)) + complete(&pkram_load_all_done_comp); +} + +static int do_load_file_content(struct pkram_stream *ps) { unsigned long index; struct page *page; @@ -266,6 +330,39 @@ static int load_file_content(struct pkram_stream *ps) return err; } +static int do_load_file_content_thr(void *data) +{ + struct pkram_stream *ps = data; + struct pkram_stream pslocal = *ps; + int ret; + + ret = do_load_file_content(&pslocal); + if (ret && !ps->error) + ps->error = ret; + + pkram_load_report_one_done(); + return 0; +} + +static int load_file_content(struct pkram_stream *ps) +{ + unsigned int thr, nr_threads; + + nr_threads = 
num_online_cpus() - 1; + nr_threads = clamp_val(pkram_max_threads, 1, nr_threads); + + if (nr_threads == 1) + return do_load_file_content(ps); + + atomic_set(&pkram_load_n_undone, nr_threads); + for (thr = 0; thr < nr_threads; thr++) + kthread_run(do_load_file_content_thr, ps, "pkram_load%d", thr); + + wait_for_completion(&pkram_load_all_done_comp); + + return ps->error; +} + static int load_file(struct dentry *parent, struct pkram_stream *ps, char *buf, size_t bufsize) { From patchwork Thu May 7 00:42:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532171 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 04561139F for ; Thu, 7 May 2020 00:45:26 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B89A521655 for ; Thu, 7 May 2020 00:45:25 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="HgABZr+I" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B89A521655 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 70E0990001D; Wed, 6 May 2020 20:45:19 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 59EE190001B; Wed, 6 May 2020 20:45:19 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3D57890001D; Wed, 6 May 2020 20:45:19 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com 
(smtprelay0131.hostedemail.com [216.40.44.131]) by kanga.kvack.org (Postfix) with ESMTP id 1F3B890001B for ; Wed, 6 May 2020 20:45:19 -0400 (EDT) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id BB5FF824559C for ; Thu, 7 May 2020 00:45:18 +0000 (UTC) X-FDA: 76788079116.17.man71_2e4faeee51a22 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30003:30036:30054:30064:30070:30075,0,RBL:156.151.31.86:@oracle.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: man71_2e4faeee51a22 X-Filterd-Recvd-Size: 8642 Received: from userp2130.oracle.com (userp2130.oracle.com [156.151.31.86]) by imf33.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:45:17 +0000 (UTC) Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bf8n064546; Thu, 7 May 2020 00:44:44 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=oFT1yBFW8AHAZV3UNuKLHKSGogjlAC220NrwLhQ0UYw=; b=HgABZr+IuW0bizjZsOhBocfRcnf0KRefLf3gt3ntgcQxVC5z5midvGSiL3u5c+0IKq7G i+qh6CeUH01sNbh0H+SiqbbQHo5PUodu9RD4K1ErRgDicA6zKoxWLq/FzV/uk94KsDrG C5MsL+g0FI9qhmduZAhHk8ILBazNIMUO2KlXNvNtH+DxPveiamYpfTPrTvrHR/D5jFYe aIdLYMI/qzYHDhGnOAs5CwZbJSCYQhQAucJw58JnfFcRJH9yFZFFMhAjJh6HvnD2FuEU OGod21YuTTgQQddM/74eIduNCal5IlR81Ox8j3lxVdhgNkNw8Czy13yyzXW376U8IV+W rA== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by userp2130.oracle.com with ESMTP id 30s09rdfcn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:44:43 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com 
(8.16.0.42/8.16.0.42) with SMTP id 0470allu170700; Thu, 7 May 2020 00:44:43 GMT Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by userp3020.oracle.com with ESMTP id 30us7p2pfa-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:44:43 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 0470ieAl024907; Thu, 7 May 2020 00:44:40 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:44:39 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 35/43] shmem: introduce shmem_insert_pages() Date: Wed, 6 May 2020 17:42:01 -0700 Message-Id: <1588812129-8596-36-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: 
<1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 adultscore=0 suspectscore=2 mlxlogscore=999 malwarescore=0 phishscore=0 mlxscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 mlxscore=0 lowpriorityscore=0 spamscore=0 adultscore=0 clxscore=1015 suspectscore=2 priorityscore=1501 malwarescore=0 mlxlogscore=999 phishscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Calling shmem_insert_page() to insert one page at a time does not scale well when multiple threads are inserting pages into the same shmem segment. This is primarily due to the locking needed when adding to the pagecache and LRU but also due to contention on the shmem_inode_info lock. To address the shmem_inode_info lock and prepare for future optimizations, introduce shmem_insert_pages() which allows a caller to pass an array of pages to be inserted into a shmem segment. 
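The locking argument above can be illustrated with a small userspace sketch. This is not the kernel code: toy_info, toy_insert_page() and toy_insert_pages() are hypothetical stand-ins for shmem_inode_info and the two insert paths, and the spinlock is a plain C11 atomic_flag. The lock_acquisitions counter exists only to make the difference visible.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for shmem_inode_info: a lock and the counter it protects. */
struct toy_info {
	atomic_flag lock;
	unsigned long alloced;
	unsigned long lock_acquisitions;	/* instrumentation only */
};

static void toy_lock(struct toy_info *info)
{
	while (atomic_flag_test_and_set_explicit(&info->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
	info->lock_acquisitions++;
}

static void toy_unlock(struct toy_info *info)
{
	atomic_flag_clear_explicit(&info->lock, memory_order_release);
}

/* One page per call: one lock round trip per page. */
static void toy_insert_page(struct toy_info *info)
{
	toy_lock(info);
	info->alloced += 1;
	toy_unlock(info);
}

/* Many pages per call: one lock round trip for the whole batch. */
static void toy_insert_pages(struct toy_info *info, int npages)
{
	assert(npages > 0);

	toy_lock(info);
	info->alloced += npages;
	toy_unlock(info);
}
```

Inserting eight pages with toy_insert_page() takes the toy lock eight times, while a single toy_insert_pages(&info, 8) call takes it once. With several threads inserting into the same segment, fewer acquisitions means fewer opportunities to contend.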
Signed-off-by: Anthony Yznaga --- include/linux/shmem_fs.h | 3 +- mm/shmem.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 116 insertions(+), 1 deletion(-) diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index f2ce9937a8f2..d308a6a154b6 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -105,7 +105,8 @@ extern int shmem_getpage(struct inode *inode, pgoff_t index, extern int shmem_insert_page(struct mm_struct *mm, struct inode *inode, pgoff_t index, struct page *page); - +extern int shmem_insert_pages(struct mm_struct *mm, struct inode *inode, + pgoff_t index, struct page *pages[], int npages); #ifdef CONFIG_PKRAM extern int shmem_parse_pkram(const char *str, struct shmem_pkram_info **pkram); extern void shmem_show_pkram(struct seq_file *seq, struct shmem_pkram_info *pkram, diff --git a/mm/shmem.c b/mm/shmem.c index 1f3b43b8fa34..ca5edf580f24 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -781,6 +781,120 @@ int shmem_insert_page(struct mm_struct *mm, struct inode *inode, pgoff_t index, return err; } +int shmem_insert_pages(struct mm_struct *mm, struct inode *inode, pgoff_t index, + struct page *pages[], int npages) +{ + struct address_space *mapping = inode->i_mapping; + struct shmem_inode_info *info = SHMEM_I(inode); + struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb); + gfp_t gfp = mapping_gfp_mask(mapping); + struct mem_cgroup *memcg; + int i, err; + int nr = 0; + + for (i = 0; i < npages; i++) + nr += compound_nr(pages[i]); + + if (index + nr - 1 > (MAX_LFS_FILESIZE >> PAGE_SHIFT)) + return -EFBIG; + +retry: + err = 0; + if (!shmem_inode_acct_block(inode, nr)) + err = -ENOSPC; + if (err) { + int retry = 5; + + /* + * Try to reclaim some space by splitting a huge page + * beyond i_size on the filesystem. 
+ */ + while (retry--) { + int ret; + + ret = shmem_unused_huge_shrink(sbinfo, NULL, 1); + if (ret == SHRINK_STOP) + break; + if (ret) + goto retry; + } + goto failed; + } + + for (i = 0; i < npages; i++) { + if (!PageLRU(pages[i])) { + __SetPageLocked(pages[i]); + __SetPageSwapBacked(pages[i]); + } else { + lock_page(pages[i]); + } + + __SetPageReferenced(pages[i]); + + if (pages[i]->mem_cgroup) + continue; + + err = mem_cgroup_try_charge_delay(pages[i], mm, gfp, + &memcg, PageTransHuge(pages[i])); + if (err) + goto out_unlock; + + } + + for (i = 0; i < npages; i++) { + err = shmem_add_to_page_cache(pages[i], mapping, index, + NULL, gfp & GFP_RECLAIM_MASK); + if (err) + goto out_truncate; + + if (PageTransHuge(pages[i])) + index += HPAGE_PMD_NR; + else + index++; + } + + spin_lock(&info->lock); + info->alloced += nr; + inode->i_blocks += BLOCKS_PER_PAGE * nr; + shmem_recalc_inode(inode); + spin_unlock(&info->lock); + + for (i = 0; i < npages; i++) { + if (!pages[i]->mem_cgroup) { + mem_cgroup_commit_charge(pages[i], memcg, + PageLRU(pages[i]), PageTransHuge(pages[i])); + } + + if (!PageLRU(pages[i])) + lru_cache_add_anon(pages[i]); + + flush_dcache_page(pages[i]); + SetPageUptodate(pages[i]); + set_page_dirty(pages[i]); + + unlock_page(pages[i]); + } + + return 0; + +out_truncate: + while (--i >= 0) + truncate_inode_page(mapping, pages[i]); + i = npages; +out_unlock: + while (--i >= 0) { + if (!pages[i]->mem_cgroup) { + mem_cgroup_cancel_charge(pages[i], memcg, + PageTransHuge(pages[i])); + } + + unlock_page(pages[i]); + } + shmem_inode_unacct_blocks(inode, nr); +failed: + return err; +} + /* * Remove swap entry from page cache, free the swap and its page cache. 
*/ From patchwork Thu May 7 00:42:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532169 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 414B5139F for ; Thu, 7 May 2020 00:45:23 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 007492064A for ; Thu, 7 May 2020 00:45:22 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="nR0OSEw/" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 007492064A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 36046900003; Wed, 6 May 2020 20:45:18 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 1B0D490001D; Wed, 6 May 2020 20:45:18 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E820D900003; Wed, 6 May 2020 20:45:17 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0168.hostedemail.com [216.40.44.168]) by kanga.kvack.org (Postfix) with ESMTP id C74F290001B for ; Wed, 6 May 2020 20:45:17 -0400 (EDT) Received: from smtpin04.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 8D442AF73 for ; Thu, 7 May 2020 00:45:17 +0000 (UTC) X-FDA: 76788079074.04.hole17_2e24557689052 X-Spam-Summary: 
1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:4423:30029:30034:30054:30064:30079,0,RBL:141.146.126.78:@oracle.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:1,LUA_SUMMARY:none X-HE-Tag: hole17_2e24557689052 X-Filterd-Recvd-Size: 9734 Received: from aserp2120.oracle.com (aserp2120.oracle.com [141.146.126.78]) by imf32.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:45:16 +0000 (UTC) Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470dEUI097500; Thu, 7 May 2020 00:44:47 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=ERMI4X4SzsQD4gAyYFAitYboRjs4H+C7qGOMU4P5U5w=; b=nR0OSEw/DvPNl9VYaj/Ak5biC59GnYGNIl5WQf5hnttYGpCE+hCzH04GgFFOJHLe9304 l2dCwTZ2mUP8uX8uE6MIkQ5nz1AtnTIop8y59PSxkfzRYXKpyzttiW96qTCEEqXI5pPg nHnl4Ayfbaxsf8xyp2mvb0z544E6ealj8rS+aBok46UoclSSA7PuVpeeic97ivg2PFpZ 5+3rdAEyLoyYz/tMD1ikokxl+2tXKO9md3r3n/Jp8WjctISXJqa3VJGNot17BlcADz4W ZUI1CXRETSuC63ayWY7FC9nD+BrycaicYQ2Lp8Nuo4lrwYimTkkods9kTa2WACEx7uIt LQ== Received: from aserp3030.oracle.com (aserp3030.oracle.com [141.146.126.71]) by aserp2120.oracle.com with ESMTP id 30usgq4h2b-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:44:47 +0000 Received: from pps.filterd (aserp3030.oracle.com [127.0.0.1]) by aserp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470at2I136306; Thu, 7 May 2020 00:44:47 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by aserp3030.oracle.com with ESMTP id 30sjdwrt4e-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:44:47 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by 
userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 0470ih14026275; Thu, 7 May 2020 00:44:43 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:44:43 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 36/43] PKRAM: add support for loading pages in bulk Date: Wed, 6 May 2020 17:42:02 -0700 Message-Id: <1588812129-8596-37-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 suspectscore=2 mlxscore=0 bulkscore=0 adultscore=0 phishscore=0 mlxlogscore=999 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 
engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 impostorscore=0 mlxscore=0 priorityscore=1501 lowpriorityscore=0 malwarescore=0 clxscore=1015 mlxlogscore=999 spamscore=0 adultscore=0 bulkscore=0 phishscore=0 suspectscore=2 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

This patch adds three functions:

pkram_prepare_load_pages()
	Called after calling pkram_prepare_load_obj()

pkram_load_pages()
	Loads some number of pages that are contiguous by their original
	file index values. The index of the first page, an array of the
	page pointers, and the number of pages in the array are provided
	to the caller.

pkram_finish_load_pages()
	Called when no more pages will be loaded from the pkram_obj.
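The intended calling sequence can be sketched in userspace as follows. Everything here is a toy model, not the PKRAM implementation: toy_link, toy_stream, TOY_LINK_ENTRIES_MAX and the toy_*() functions are hypothetical names, entries are plain integers rather than page pointers, and memory comes from calloc() instead of the kernel allocator. The shape of the loop (prepare once, load until -ENODATA, finish once) is the part that mirrors the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define TOY_LINK_ENTRIES_MAX 4	/* hypothetical; stands in for PKRAM_LINK_ENTRIES_MAX */

/* Toy stand-in for a pkram_link: a run of entries starting at file index 'index'. */
struct toy_link {
	struct toy_link *next;
	unsigned long index;
	unsigned long entry[TOY_LINK_ENTRIES_MAX];	/* a 0 entry ends the run */
};

/* Toy stand-in for the stream fields used by the bulk-load calls. */
struct toy_stream {
	struct toy_link *link;
	unsigned long *pages;
	unsigned int nr_pages;
};

/* Allocate the buffer that each load call fills. */
static int toy_prepare_load_pages(struct toy_stream *ps)
{
	ps->pages = calloc(TOY_LINK_ENTRIES_MAX, sizeof(*ps->pages));
	return ps->pages ? 0 : -ENOMEM;
}

static void toy_finish_load_pages(struct toy_stream *ps)
{
	free(ps->pages);
	ps->pages = NULL;
}

/* Hand back the entries of one link; -ENODATA once the links run out. */
static int toy_load_pages(struct toy_stream *ps, unsigned long *index)
{
	struct toy_link *link = ps->link;
	unsigned int n = 0;

	if (!link)
		return -ENODATA;

	*index = link->index;
	while (n < TOY_LINK_ENTRIES_MAX && link->entry[n]) {
		ps->pages[n] = link->entry[n];
		n++;
	}
	ps->nr_pages = n;
	ps->link = link->next;
	return 0;
}

/* Caller loop: drain the stream and return how many entries were seen. */
static unsigned long toy_load_all(struct toy_stream *ps)
{
	unsigned long index, total = 0;

	if (toy_prepare_load_pages(ps))
		return 0;
	while (toy_load_pages(ps, &index) == 0)
		total += ps->nr_pages;
	toy_finish_load_pages(ps);
	return total;
}
```

Each successful toy_load_pages() call yields one contiguous run: the starting index through the out-parameter, and the entries plus their count through the stream.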
Signed-off-by: Anthony Yznaga --- include/linux/pkram.h | 6 +++ mm/pkram.c | 106 ++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 112 insertions(+) diff --git a/include/linux/pkram.h b/include/linux/pkram.h index bf2e138b044e..3f059791f88c 100644 --- a/include/linux/pkram.h +++ b/include/linux/pkram.h @@ -18,6 +18,9 @@ struct pkram_stream { struct pkram_link *link; /* current link */ unsigned int entry_idx; /* next entry in link */ + struct page **pages; + unsigned int nr_pages; + unsigned long next_index; struct address_space *mapping; struct mm_struct *mm; @@ -60,14 +63,17 @@ void pkram_discard_save(struct pkram_stream *ps); int pkram_prepare_load(struct pkram_stream *ps, const char *name); int pkram_prepare_load_obj(struct pkram_stream *ps); +int pkram_prepare_load_pages(struct pkram_stream *ps); void pkram_finish_load(struct pkram_stream *ps); void pkram_finish_load_obj(struct pkram_stream *ps); +void pkram_finish_load_pages(struct pkram_stream *ps); #define PKRAM_PAGE_TRANS_HUGE 0x1 /* page is a transparent hugepage */ int pkram_save_page(struct pkram_stream *ps, struct page *page, short flags); struct page *pkram_load_page(struct pkram_stream *ps, unsigned long *index, short *flags); +int pkram_load_pages(struct pkram_stream *ps, unsigned long *index); ssize_t pkram_write(struct pkram_stream *ps, const void *buf, size_t count); size_t pkram_read(struct pkram_stream *ps, void *buf, size_t count); diff --git a/mm/pkram.c b/mm/pkram.c index 042c14dedc25..ef092aa5ce7a 100644 --- a/mm/pkram.c +++ b/mm/pkram.c @@ -820,6 +820,37 @@ int pkram_prepare_load_obj(struct pkram_stream *ps) } /** + * Initialize stream @ps for loading preserved pages from it. + * + * Returns 0 on success, -errno on failure. + * + * Error values: + * %ENOMEM: insufficient memory available + * + * After the load has finished, pkram_finish_load_pages() is to be called. 
+ */ +int pkram_prepare_load_pages(struct pkram_stream *ps) +{ + BUG_ON((ps->node->flags & PKRAM_ACCMODE_MASK) != PKRAM_LOAD); + + ps->pages = kzalloc(PAGE_SIZE, GFP_KERNEL); + if (!ps->pages) + return -ENOMEM; + + return 0; +} + +/** + * Finish the load of preserved pages started with pkram_prepare_load_pages() + */ +void pkram_finish_load_pages(struct pkram_stream *ps) +{ + BUG_ON((ps->node->flags & PKRAM_ACCMODE_MASK) != PKRAM_LOAD); + + kfree(ps->pages); +} + +/** * Finish the load of a preserved memory object started with * pkram_prepare_load_obj() freeing the object and any data that has not * been loaded from it. @@ -1066,6 +1097,81 @@ struct page *pkram_load_page(struct pkram_stream *ps, unsigned long *index, shor } /** + * Load pages from the preserved memory object associated with stream + * @ps. The stream must have been initialized with pkram_prepare_load(), + * pkram_prepare_load_obj(), and pkram_prepare_load_pages(). + * The page entries of a single pkram_link are processed, and the stream + * 'pages' buffer is populated with the page pointers. + * + * Returns 0 if one or more pages are loaded or -ENODATA if there are no + * pages to load. + * + * The pages loaded have an incremented refcount either because the page + * was initialized with a refcount of 1 at boot or because the page was + * subsequently preserved which increased the refcount. 
+ */ +int pkram_load_pages(struct pkram_stream *ps, unsigned long *index) +{ + struct pkram_link *link = ps->link; + int nr_entries = 0; + int i; + + if (!link) { + link = pkram_remove_link(ps->obj); + if (!link) + return -ENODATA; + } + + *index = link->index; + + for (i = 0; i < PKRAM_LINK_ENTRIES_MAX; i++) { + unsigned long p = link->entry[i]; + struct page *page; + short flags; + + if (!p) + break; + + flags = (p >> PKRAM_ENTRY_FLAGS_SHIFT) & PKRAM_ENTRY_FLAGS_MASK; + nr_entries++; + + page = pfn_to_page(PHYS_PFN(p)); + ps->pages[i] = page; + + if (flags & PKRAM_PAGE_TRANS_HUGE) { + int order = p & PKRAM_ENTRY_ORDER_MASK; + int nr_pages = 1 << order; + int j; + + for (j = 0; j < nr_pages; j++) { + struct page *p = page + j; + + ClearPageReserved(p); + } + + prep_compound_page(page, order); + prep_transhuge_page(page); + } else { + ClearPageReserved(page); + } + + pkram_remove_identity_map(page); + } + + ps->nr_pages = nr_entries; + + /* Advance to next pkram_link page and free this one */ + if (link->link_pfn) + ps->link = pfn_to_kaddr(link->link_pfn); + else + ps->link = NULL; + + pkram_free_page(link); + + return 0; +} + +/** * Copy @count bytes from @buf to the preserved memory node and object * associated with stream @ps. The stream must have been initialized with * pkram_prepare_save() and pkram_prepare_save_obj(). 
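pkram_load_pages() obtains each link through pkram_remove_link(), which an earlier patch in this series converted to a compare-and-exchange loop so that several threads can share one list. A userspace sketch of that push/pop pattern follows. It is an illustration under assumptions, not the kernel code: it uses C11 atomics instead of cmpxchg64(), array slot numbers instead of PFNs, and the toy_*() names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_NR_LINKS 8

/* Slot 0 is reserved so that 0 can mean "no link", like a zero link_pfn. */
struct toy_link {
	uint64_t next;	/* slot number of the next link, 0 at the tail */
};

static struct toy_link toy_links[TOY_NR_LINKS];
static _Atomic uint64_t toy_head;	/* plays the role of obj->link_pfn */

/* Push: retry until the head we linked to is still the head. */
static void toy_add_link(uint64_t slot)
{
	uint64_t old = atomic_load(&toy_head);

	do {
		toy_links[slot].next = old;
	} while (!atomic_compare_exchange_weak(&toy_head, &old, slot));
}

/* Pop: returns the removed slot, or 0 when the list is empty. */
static uint64_t toy_remove_link(void)
{
	uint64_t old = atomic_load(&toy_head);

	while (old) {
		if (atomic_compare_exchange_weak(&toy_head, &old,
						 toy_links[old].next)) {
			toy_links[old].next = 0;
			return old;
		}
		/* The failed exchange refreshed 'old'; try again. */
	}
	return 0;
}
```

A pop of this kind is in general exposed to the ABA problem if a removed link can be pushed back while another thread is mid-pop. Whether that can happen in PKRAM is not stated in these patches, so the sketch makes no claim either way.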

From patchwork Thu May 7 00:42:03 2020
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 11532173
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com,
 ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org,
 ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com,
 nivedita@alum.mit.edu, rafael.j.wysocki@intel.com,
 dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de,
 bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com,
 andriy.shevchenko@linux.intel.com, keescook@chromium.org,
 hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org,
 ying.huang@intel.com, yang.shi@linux.alibaba.com,
 gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com,
 jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com,
 lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 kexec@lists.infradead.org
Subject: [RFC 37/43] shmem: PKRAM: enable bulk loading of preserved pages into shmem
Date: Wed, 6 May 2020 17:42:03 -0700
Message-Id: <1588812129-8596-38-git-send-email-anthony.yznaga@oracle.com>
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

Make use of new interfaces for loading and inserting preserved pages
into a shmem file in bulk.

Signed-off-by: Anthony Yznaga
---
 mm/shmem_pkram.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/mm/shmem_pkram.c b/mm/shmem_pkram.c
index 4992b6c3e54e..435488368104 100644
--- a/mm/shmem_pkram.c
+++ b/mm/shmem_pkram.c
@@ -315,18 +315,29 @@ static inline void pkram_load_report_one_done(void)
 static int do_load_file_content(struct pkram_stream *ps)
 {
 	unsigned long index;
-	struct page *page;
-	int err = 0;
+	int i, err;
+
+	err = pkram_prepare_load_pages(ps);
+	if (err)
+		return err;
 
 	do {
-		page = pkram_load_page(ps, &index, NULL);
-		if (!page)
+		err = pkram_load_pages(ps, &index);
+		if (err) {
+			if (err == -ENODATA)
+				err = 0;
 			break;
+		}
 
-		err = shmem_insert_page(ps->mm, ps->mapping->host, index, page);
-		put_page(page);
+		err = shmem_insert_pages(ps->mm, ps->mapping->host, index,
+					 ps->pages, ps->nr_pages);
+
+		for (i = 0; i < ps->nr_pages; i++)
+			put_page(ps->pages[i]);
 	} while (!err);
 
+	pkram_finish_load_pages(ps);
+
 	return err;
 }
 

From patchwork Thu May 7 00:42:04 2020
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 11532247
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com,
 ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org,
 ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com,
 nivedita@alum.mit.edu, rafael.j.wysocki@intel.com,
 dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de,
 bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com,
 andriy.shevchenko@linux.intel.com, keescook@chromium.org,
 hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org,
 ying.huang@intel.com, yang.shi@linux.alibaba.com,
 gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com,
 jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com,
 lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 kexec@lists.infradead.org
Subject: [RFC 38/43] mm: implement splicing a list of pages to the LRU
Date: Wed, 6 May 2020 17:42:04 -0700
Message-Id: <1588812129-8596-39-git-send-email-anthony.yznaga@oracle.com>
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

Considerable contention on the LRU lock happens when multiple threads
are used to insert pages into a shmem file in parallel. To alleviate this
provide a way for pages to be added to the same LRU to be staged so that
they can be added by splicing lists and updating stats once with the lock
held. For now only unevictable pages are supported.

Signed-off-by: Anthony Yznaga
---
 include/linux/swap.h |  11 ++++++
 mm/swap.c            | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 112 insertions(+)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index e1bbf7a16b27..462045f536a8 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -346,6 +346,17 @@ extern void swap_setup(void);
 
 extern void lru_cache_add_active_or_unevictable(struct page *page,
 						struct vm_area_struct *vma);
+struct lru_splice {
+	struct list_head splice;
+	struct list_head *lru_head;
+	struct pglist_data *pgdat;
+	struct lruvec *lruvec;
+	enum lru_list lru;
+	unsigned long nr_pages[MAX_NR_ZONES];
+	unsigned long pgculled;
+};
+extern void lru_splice_add_anon(struct page *page, struct lru_splice *splice);
+extern void add_splice_to_lru_list(struct lru_splice *splice);
 
 /* linux/mm/vmscan.c */
 extern unsigned long zone_reclaimable_pages(struct zone *zone);
diff --git a/mm/swap.c b/mm/swap.c
index bf9a79fed62d..848f8b516471 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -187,6 +187,107 @@ int get_kernel_page(unsigned long start, int write, struct page **pages)
 }
 EXPORT_SYMBOL_GPL(get_kernel_page);
 
+/*
+ * Update stats and move accumulated pages from an lru_splice to the lru.
+ */
+void add_splice_to_lru_list(struct lru_splice *splice)
+{
+	struct pglist_data *pgdat = splice->pgdat;
+	struct lruvec *lruvec = splice->lruvec;
+	enum lru_list lru = splice->lru;
+	unsigned long flags = 0;
+	int zid;
+
+	spin_lock_irqsave(&pgdat->lru_lock, flags);
+	for (zid = 0; zid < MAX_NR_ZONES; zid++) {
+		if (splice->nr_pages[zid])
+			update_lru_size(lruvec, lru, zid, splice->nr_pages[zid]);
+	}
+	count_vm_events(UNEVICTABLE_PGCULLED, splice->pgculled);
+	list_splice(&splice->splice, splice->lru_head);
+	spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+
+	splice->lru_head = NULL;
+	splice->pgculled = 0;
+}
+
+static void add_page_to_lru_splice(struct page *page,
+				   struct lru_splice *splice,
+				   struct lruvec *lruvec, enum lru_list lru)
+{
+	int zid;
+
+	if (splice->lru_head == &lruvec->lists[lru]) {
+		list_add(&page->lru, &splice->splice);
+		splice->nr_pages[page_zonenum(page)] += hpage_nr_pages(page);
+		return;
+	}
+
+	INIT_LIST_HEAD(&splice->splice);
+	splice->lruvec = lruvec;
+	splice->lru_head = &lruvec->lists[lru];
+	splice->lru = lru;
+	list_add(&page->lru, &splice->splice);
+	for (zid = 0; zid < MAX_NR_ZONES; zid++)
+		splice->nr_pages[zid] = 0;
+	splice->nr_pages[page_zonenum(page)] = hpage_nr_pages(page);
+
+}
+
+/*
+ * Similar in functionality to __pagevec_lru_add_fn() but here the page is
+ * being added to an lru_splice and the LRU lock is not held.
+ */
+static void page_lru_splice_add(struct page *page, struct lruvec *lruvec, struct lru_splice *splice)
+{
+	enum lru_list lru;
+	int was_unevictable = TestClearPageUnevictable(page);
+
+	VM_BUG_ON_PAGE(PageLRU(page), page);
+	/* XXX only supports unevictable pages at the moment */
+	VM_BUG_ON_PAGE(was_unevictable, page);
+
+	SetPageLRU(page);
+	smp_mb();
+
+	lru = LRU_UNEVICTABLE;
+	ClearPageActive(page);
+	SetPageUnevictable(page);
+	if (!was_unevictable)
+		splice->pgculled++;
+
+	add_page_to_lru_splice(page, splice, lruvec, lru);
+	trace_mm_lru_insertion(page, lru);
+}
+
+static void lru_splice_add(struct page *page, struct lru_splice *splice)
+{
+	struct pglist_data *pagepgdat, *pgdat = splice->pgdat;
+	struct lruvec *lruvec;
+
+	pagepgdat = page_pgdat(page);
+
+	if (pagepgdat != pgdat) {
+		if (pgdat)
+			add_splice_to_lru_list(splice);
+		splice->pgdat = pagepgdat;
+	}
+
+	lruvec = mem_cgroup_page_lruvec(page, pagepgdat);
+	page_lru_splice_add(page, lruvec, splice);
+}
+
+void lru_splice_add_anon(struct page *page, struct lru_splice *splice)
+{
+	if (PageActive(page))
+		ClearPageActive(page);
+	get_page(page);
+
+	lru_splice_add(page, splice);
+
+	put_page(page);
+}
+
 static void pagevec_lru_move_fn(struct pagevec *pvec,
 	void (*move_fn)(struct page *page, struct lruvec *lruvec, void *arg),
 	void *arg)

From patchwork Thu May 7 00:42:05 2020
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 11532175
From: Anthony Yznaga
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com,
 ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org,
 ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com,
 nivedita@alum.mit.edu, rafael.j.wysocki@intel.com,
 dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de,
 bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com,
 andriy.shevchenko@linux.intel.com, keescook@chromium.org,
 hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org,
 ying.huang@intel.com, yang.shi@linux.alibaba.com,
 gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com,
 jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com,
 lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 kexec@lists.infradead.org
Subject: [RFC 39/43] shmem: optimize adding pages to the LRU in shmem_insert_pages()
Date: Wed, 6 May 2020 17:42:05 -0700
Message-Id: <1588812129-8596-40-git-send-email-anthony.yznaga@oracle.com>
In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>
References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com>

Reduce LRU lock contention when inserting shmem pages by staging pages
to be added to the same LRU and adding them en masse.

Signed-off-by: Anthony Yznaga
---
 mm/shmem.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index ca5edf580f24..678a396ba8d3 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -789,9 +789,12 @@ int shmem_insert_pages(struct mm_struct *mm, struct inode *inode, pgoff_t index,
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 	gfp_t gfp = mapping_gfp_mask(mapping);
 	struct mem_cgroup *memcg;
+	struct lru_splice splice;
 	int i, err;
 	int nr = 0;
 
+	memset(&splice, 0, sizeof(splice));
+
 	for (i = 0; i < npages; i++)
 		nr += compound_nr(pages[i]);
 
@@ -866,7 +869,7 @@ int shmem_insert_pages(struct mm_struct *mm, struct inode *inode, pgoff_t index,
 		}
 
 		if (!PageLRU(pages[i]))
-			lru_cache_add_anon(pages[i]);
+			lru_splice_add_anon(pages[i], &splice);
 
 		flush_dcache_page(pages[i]);
 		SetPageUptodate(pages[i]);
@@ -875,6 +878,9 @@ int shmem_insert_pages(struct mm_struct *mm, struct inode *inode, pgoff_t index,
 		unlock_page(pages[i]);
 	}
 
+	if (splice.pgdat)
+		add_splice_to_lru_list(&splice);
+
 	return 0;
 
 out_truncate:

From patchwork Thu May 7 00:42:06 2020
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 11532185
Received: by kanga.kvack.org (Postfix) id 693E990001B; Wed, 6 May 2020 20:45:31 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 6424B900002; Wed, 6 May 2020 20:45:31 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 50C9290001B; Wed, 6 May 2020 20:45:31 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0141.hostedemail.com [216.40.44.141]) by kanga.kvack.org (Postfix) with ESMTP id 35687900002 for ; Wed, 6 May 2020 20:45:31 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id E8A84181AEF30 for ; Thu, 7 May 2020 00:45:30 +0000 (UTC) X-FDA: 76788079620.22.drug46_3018d2ac21819 X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30054:30064:30070:30075,0,RBL:141.146.126.78:@oracle.com:.lbl8.mailshell.net-64.10.201.10 62.18.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: drug46_3018d2ac21819 X-Filterd-Recvd-Size: 8081 Received: from aserp2120.oracle.com (aserp2120.oracle.com [141.146.126.78]) by imf21.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:45:30 +0000 (UTC) Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470dDks097472; Thu, 7 May 2020 00:45:00 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=7K77j/6mtxQ9j5vDJ+M7D2bl3GjRuKy3se3PRu5nsHg=; b=fknRxePLvSCA2rgFxv17tFilEvUcB8Sm1qixpcq8htEL5+jMu9HgoK4AmSWmU5S+ZkSA nDAhgsc9si0wjbVnFvNv/8X2KeIYxyingZ+wv8jvbeRn+nVmTFly4CU0yNcKBvY2+jVu 
c2D85RuZlAHnVFY0HdVTQEdHGa7aiJPbG99wIg4XJyPEW+jT33EouxaTBIY1nMaIv0lV CbQYJXCcHIp4hPBwSJR28UisseHnsGqGQwpfd6W5LjvXz7KtxddPjhifE9TJ6cNYG2Ae zzaUH0VA/so04hfFmVi7z0tTREp0xKaYemyw2Yl7K60g3PKDQvkRdYChSgUY6vCl9+rG 4Q== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by aserp2120.oracle.com with ESMTP id 30usgq4h3c-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:45:00 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470alm0170700; Thu, 7 May 2020 00:44:59 GMT Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userp3020.oracle.com with ESMTP id 30us7p2prs-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:44:59 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 0470iv2E020654; Thu, 7 May 2020 00:44:57 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:44:57 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, 
ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 40/43] shmem: initial support for adding multiple pages to pagecache Date: Wed, 6 May 2020 17:42:06 -0700 Message-Id: <1588812129-8596-41-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 adultscore=0 suspectscore=0 mlxlogscore=999 malwarescore=0 phishscore=0 mlxscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 impostorscore=0 mlxscore=0 priorityscore=1501 lowpriorityscore=0 malwarescore=0 clxscore=1015 mlxlogscore=999 spamscore=0 adultscore=0 bulkscore=0 phishscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: shmem_insert_pages() currently loops over the array of pages passed to it and calls shmem_add_to_page_cache() for each one. Prepare for adding pages to the pagecache in bulk by adding and using a shmem_add_pages_to_cache() call. 
For now it still iterates over the array and adds pages one at a time, but it already improves performance when multiple threads are adding to the same pagecache, because it calls a new shmem_add_to_page_cache_fast() function that does not check for conflicts and drops the xarray lock before updating stats. Signed-off-by: Anthony Yznaga --- mm/shmem.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 84 insertions(+), 11 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index 678a396ba8d3..f621d863e362 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -660,6 +660,57 @@ static int shmem_add_to_page_cache(struct page *page, return 0; } +static int shmem_add_to_page_cache_fast(struct page *page, + struct address_space *mapping, + pgoff_t index, gfp_t gfp) +{ + XA_STATE_ORDER(xas, &mapping->i_pages, index, compound_order(page)); + unsigned long nr = compound_nr(page); + unsigned long i = 0; + + VM_BUG_ON_PAGE(PageTail(page), page); + VM_BUG_ON_PAGE(index != round_down(index, nr), page); + VM_BUG_ON_PAGE(!PageLocked(page), page); + VM_BUG_ON_PAGE(!PageSwapBacked(page), page); + + page_ref_add(page, nr); + page->mapping = mapping; + page->index = index; + + do { + xas_lock_irq(&xas); + xas_create_range(&xas); + if (xas_error(&xas)) + goto unlock; +next: + xas_store(&xas, page); + if (++i < nr) { + xas_next(&xas); + goto next; + } + mapping->nrpages += nr; + xas_unlock(&xas); + if (PageTransHuge(page)) { + count_vm_event(THP_FILE_ALLOC); + __inc_node_page_state(page, NR_SHMEM_THPS); + } + __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr); + __mod_node_page_state(page_pgdat(page), NR_SHMEM, nr); + local_irq_enable(); + break; +unlock: + xas_unlock_irq(&xas); + } while (xas_nomem(&xas, gfp)); + + if (xas_error(&xas)) { + page->mapping = NULL; + page_ref_sub(page, nr); + return xas_error(&xas); + } + + return 0; +} + /* * Like delete_from_page_cache, but substitutes swap for page.
*/ @@ -681,6 +732,35 @@ static void shmem_delete_from_page_cache(struct page *page, void *radswap) BUG_ON(error); } +static int shmem_add_pages_to_cache(struct page *pages[], int npages, + struct address_space *mapping, + pgoff_t start, gfp_t gfp) +{ + pgoff_t index = start; + int err = 0; + int i; + + i = 0; + while (i < npages) { + if (PageTransHuge(pages[i])) { + err = shmem_add_to_page_cache_fast(pages[i], mapping, index, gfp); + if (err) + break; + index += HPAGE_PMD_NR; + i++; + continue; + } + + err = shmem_add_to_page_cache_fast(pages[i], mapping, index, gfp); + if (err) + break; + index++; + i++; + } + + return err; +} + int shmem_insert_page(struct mm_struct *mm, struct inode *inode, pgoff_t index, struct page *page) { @@ -844,17 +924,10 @@ int shmem_insert_pages(struct mm_struct *mm, struct inode *inode, pgoff_t index, } - for (i = 0; i < npages; i++) { - err = shmem_add_to_page_cache(pages[i], mapping, index, - NULL, gfp & GFP_RECLAIM_MASK); - if (err) - goto out_truncate; - - if (PageTransHuge(pages[i])) - index += HPAGE_PMD_NR; - else - index++; - } + err = shmem_add_pages_to_cache(pages, npages, mapping, index, + gfp & GFP_RECLAIM_MASK); + if (err) + goto out_truncate; spin_lock(&info->lock); info->alloced += nr; From patchwork Thu May 7 00:42:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532253 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B0132139F for ; Thu, 7 May 2020 00:47:48 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 626782082E for ; Thu, 7 May 2020 00:47:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="M8E0M0Ww" DMARC-Filter: OpenDMARC Filter v1.3.2 
mail.kernel.org 626782082E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 927C390002A; Wed, 6 May 2020 20:47:47 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 8FFF0900023; Wed, 6 May 2020 20:47:47 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7EEAC90002A; Wed, 6 May 2020 20:47:47 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0048.hostedemail.com [216.40.44.48]) by kanga.kvack.org (Postfix) with ESMTP id 67D66900023 for ; Wed, 6 May 2020 20:47:47 -0400 (EDT) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 20D5B824999B for ; Thu, 7 May 2020 00:47:47 +0000 (UTC) X-FDA: 76788085374.23.scale49_43eb5920ce10b X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30012:30045:30054:30064,0,RBL:156.151.31.86:@oracle.com:.lbl8.mailshell.net-64.10.201.10 62.18.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:1,LUA_SUMMARY:none X-HE-Tag: scale49_43eb5920ce10b X-Filterd-Recvd-Size: 11175 Received: from userp2130.oracle.com (userp2130.oracle.com [156.151.31.86]) by imf41.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:47:46 +0000 (UTC) Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470cd03065150; Thu, 7 May 2020 00:47:06 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; 
s=corp-2020-01-29; bh=l1jFeQ++L0S4ASpfp6U4S++udUwtZnOa7K5HcmPbm40=; b=M8E0M0WwtI3cedOWrkJtCN5Seer3lVIlIgeTRqf/ZAbULwP7hbR/DUd10TGvyh+zKJFu Yqbahbnf+BnBlvlzLj9ffvGcE1WmKLs37LsglI7zWYQUZZrkUmRw/Zu8eeB7ow8I3OWq ZnEKp3idtHZBAgPqGrPKlom6mEbYfFZVKEpCLi3J5c8FvIhOmbHf7UrgPDpMlYQukFYI UOjs92DL79AGbXOmlETANSHaur/mHJLJ5oQw8ozc0QzXCKZVnDa7yXv+0jI31db7oLuM EfeVSQ99bXniBIIktzx5ynqu48FqKIKVE81AYG3H9Vh2uM6N+VttsdlqlJSGwds2Vsrl 1g== Received: from aserp3030.oracle.com (aserp3030.oracle.com [141.146.126.71]) by userp2130.oracle.com with ESMTP id 30s09rdfk9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:47:06 +0000 Received: from pps.filterd (aserp3030.oracle.com [127.0.0.1]) by aserp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470aoJd136102; Thu, 7 May 2020 00:45:05 GMT Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by aserp3030.oracle.com with ESMTP id 30sjdwrtfs-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:45:05 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 0470j141025060; Thu, 7 May 2020 00:45:01 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:45:01 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, 
andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 41/43] XArray: add xas_export_node() and xas_import_node() Date: Wed, 6 May 2020 17:42:07 -0700 Message-Id: <1588812129-8596-42-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 suspectscore=2 mlxscore=0 bulkscore=0 adultscore=0 phishscore=0 mlxlogscore=999 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 mlxscore=0 lowpriorityscore=0 spamscore=0 adultscore=0 clxscore=1015 suspectscore=2 priorityscore=1501 malwarescore=0 mlxlogscore=999 phishscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Contention on the xarray lock when multiple threads are adding to the same xarray can be mitigated by providing a way to add entries in bulk. 
Allow a caller to allocate and populate an xarray node outside of the target xarray and then only take the xarray lock long enough to import the node into it. Signed-off-by: Anthony Yznaga --- Documentation/core-api/xarray.rst | 8 +++ include/linux/xarray.h | 2 + lib/test_xarray.c | 45 +++++++++++++++++ lib/xarray.c | 100 ++++++++++++++++++++++++++++++++++++++ 4 files changed, 155 insertions(+) diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst index 640934b6f7b4..659e10df8901 100644 --- a/Documentation/core-api/xarray.rst +++ b/Documentation/core-api/xarray.rst @@ -444,6 +444,14 @@ called each time the XArray updates a node. This is used by the page cache workingset code to maintain its list of nodes which contain only shadow entries. +xas_export_node() is used to remove and return a node from an XArray +while xas_import_node() is used to add a node to an XArray. Together +these can be used, for example, to reduce lock contention when multiple +threads are updating an XArray by allowing a caller to allocate and +populate a node outside of the target XArray in a local XArray, export +the node, and then take the target XArray lock just long enough to import +the node. + Multi-Index Entries ------------------- diff --git a/include/linux/xarray.h b/include/linux/xarray.h index 14c893433139..73bd8ccc4424 100644 --- a/include/linux/xarray.h +++ b/include/linux/xarray.h @@ -1504,6 +1504,8 @@ bool xas_nomem(struct xa_state *, gfp_t); void xas_pause(struct xa_state *); void xas_create_range(struct xa_state *); +struct xa_node *xas_export_node(struct xa_state *xas); +void xas_import_node(struct xa_state *xas, struct xa_node *node); /** * xas_reload() - Refetch an entry from the xarray. 
diff --git a/lib/test_xarray.c b/lib/test_xarray.c index d4f97925dbd8..5cfaa1720cc1 100644 --- a/lib/test_xarray.c +++ b/lib/test_xarray.c @@ -1682,6 +1682,50 @@ static noinline void check_destroy(struct xarray *xa) #endif } +static noinline void check_export_import_1(struct xarray *xa, + unsigned long index, unsigned int order) +{ + int xa_shift = order + XA_CHUNK_SHIFT - (order % XA_CHUNK_SHIFT); + XA_STATE(xas, xa, index); + struct xa_node *node; + unsigned long i; + + xa_store_many_order(xa, index, xa_shift); + + xas_lock(&xas); + xas_set_order(&xas, index, xa_shift); + node = xas_export_node(&xas); + xas_unlock(&xas); + + XA_BUG_ON(xa, !xa_empty(xa)); + + do { + xas_lock(&xas); + xas_set_order(&xas, index, xa_shift); + xas_import_node(&xas, node); + xas_unlock(&xas); + } while (xas_nomem(&xas, GFP_KERNEL)); + + for (i = index; i < index + (1UL << xa_shift); i++) + xa_erase_index(xa, i); + + XA_BUG_ON(xa, !xa_empty(xa)); +} + +static noinline void check_export_import(struct xarray *xa) +{ + unsigned int order; + unsigned int max_order = IS_ENABLED(CONFIG_XARRAY_MULTI) ? 12 : 1; + + for (order = 0; order < max_order; order += XA_CHUNK_SHIFT) { + int xa_shift = order + XA_CHUNK_SHIFT; + unsigned long j; + + for (j = 0; j < XA_CHUNK_SIZE; j++) + check_export_import_1(xa, j << xa_shift, order); + } +} + static DEFINE_XARRAY(array); static int xarray_checks(void) @@ -1712,6 +1756,7 @@ static int xarray_checks(void) check_workingset(&array, 0); check_workingset(&array, 64); check_workingset(&array, 4096); + check_export_import(&array); printk("XArray: %u of %u tests passed\n", tests_passed, tests_run); return (tests_run == tests_passed) ? 
0 : -EINVAL; diff --git a/lib/xarray.c b/lib/xarray.c index e9e641d3c0c3..478925780e87 100644 --- a/lib/xarray.c +++ b/lib/xarray.c @@ -507,6 +507,30 @@ static void xas_delete_node(struct xa_state *xas) xas_shrink(xas); } +static void xas_unlink_node(struct xa_state *xas) +{ + struct xa_node *node = xas->xa_node; + struct xa_node *parent; + + parent = xa_parent_locked(xas->xa, node); + xas->xa_node = parent; + xas->xa_offset = node->offset; + + if (!parent) { + xas->xa->xa_head = NULL; + xas->xa_node = XAS_BOUNDS; + return; + } + + parent->slots[xas->xa_offset] = NULL; + parent->count--; + XA_NODE_BUG_ON(parent, parent->count > XA_CHUNK_SIZE); + + xas_update(xas, parent); + + xas_delete_node(xas); +} + /** * xas_free_nodes() - Free this node and all nodes that it references * @xas: Array operation state. @@ -1540,6 +1564,82 @@ static void xas_set_range(struct xa_state *xas, unsigned long first, } /** + * xas_export_node() - remove and return a node from an XArray + * @xas: XArray operation state + * + * The range covered by @xas must be aligned to and cover a single node + * at any level of the tree. + * + * Return: On success, returns the removed node. If the range is invalid, + * returns %NULL and sets -EINVAL in @xas. Otherwise returns %NULL if the + * node does not exist. + */ +struct xa_node *xas_export_node(struct xa_state *xas) +{ + struct xa_node *node; + + if (!xas->xa_shift || xas->xa_sibs) { + xas_set_err(xas, -EINVAL); + return NULL; + } + + xas->xa_shift -= XA_CHUNK_SHIFT; + + if (!xas_find(xas, xas->xa_index)) + return NULL; + node = xas->xa_node; + xas_unlink_node(xas); + node->parent = NULL; + + return node; +} + +/** + * xas_import_node() - add a node to an XArray + * @xas: XArray operation state + * @node: The node to add + * + * The range covered by @xas must be aligned to and cover a single node + * at any level of the tree. No nodes should already exist within the + * range. 
+ * Sets an error in @xas if the range is invalid or xas_create() fails + */ +void xas_import_node(struct xa_state *xas, struct xa_node *node) +{ + struct xa_node *parent = NULL; + void __rcu **slot = &xas->xa->xa_head; + int count = 0; + + if (!xas->xa_shift || xas->xa_sibs) { + xas_set_err(xas, -EINVAL); + return; + } + + if (xas->xa_index || xa_head_locked(xas->xa)) { + xas_set_order(xas, xas->xa_index, node->shift + XA_CHUNK_SHIFT); + xas_create(xas, true); + + if (xas_invalid(xas)) + return; + + parent = xas->xa_node; + } + + if (parent) { + slot = &parent->slots[xas->xa_offset]; + node->offset = xas->xa_offset; + count++; + } + + RCU_INIT_POINTER(node->parent, parent); + node->array = xas->xa; + + rcu_assign_pointer(*slot, xa_mk_node(node)); + + update_node(xas, parent, count, 0); +} + +/** * xa_store_range() - Store this entry at a range of indices in the XArray. * @xa: XArray. * @first: First index to affect. From patchwork Thu May 7 00:42:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532191 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C926281 for ; Thu, 7 May 2020 00:45:41 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8928120736 for ; Thu, 7 May 2020 00:45:41 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="e6tBMfZv" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8928120736 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id AB124900022; Wed, 6 May 2020 20:45:38 -0400 (EDT) Delivered-To: 
linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id A899D900002; Wed, 6 May 2020 20:45:38 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 977D0900022; Wed, 6 May 2020 20:45:38 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0211.hostedemail.com [216.40.44.211]) by kanga.kvack.org (Postfix) with ESMTP id 7B384900002 for ; Wed, 6 May 2020 20:45:38 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 2AEDB181AEF30 for ; Thu, 7 May 2020 00:45:38 +0000 (UTC) X-FDA: 76788079956.26.goose11_31267993c244e X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30054:30056:30064:30070,0,RBL:156.151.31.85:@oracle.com:.lbl8.mailshell.net-64.10.201.10 62.18.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:2,LUA_SUMMARY:none X-HE-Tag: goose11_31267993c244e X-Filterd-Recvd-Size: 9472 Received: from userp2120.oracle.com (userp2120.oracle.com [156.151.31.85]) by imf38.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:45:37 +0000 (UTC) Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470d0Ej105240; Thu, 7 May 2020 00:45:07 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=tO+BBoL6QA83N8AKybR2dW6y4zRDAJKByikm9WLIvJA=; b=e6tBMfZvQU1I7oBrLvbLNbk8ehB/ncx827a2VLGsvklCAyzd8qo2KC8U01bEFq8Xq+yC ezx4eY6eLsFamhw4qwLpItroIYvmiLNZ9TyGONKuxkX/mUMrWXX156AnYAG0PpSS/JpV clkpPQPh2XNuUiPA89c3tqYVFYgYm6zMZ88YPkBCDH1RGsWrLkHoinDtzVy/mNMC6ZoM 
Eip8SgOqH/NcYWiu7gMFjQfImEuPvQ1iD7Wi1tnf2s81oJbUBGo9TJV65x3YgNuWVZe6 /Z0j4hQIVRZP3FbQbHfeRuL9D25iCA1SP6a7ALImGzCsdL6cCKd65RYuG3KEJHhIYTc9 TA== Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70]) by userp2120.oracle.com with ESMTP id 30s1gnd8sb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:45:07 +0000 Received: from pps.filterd (aserp3020.oracle.com [127.0.0.1]) by aserp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bT2v098614; Thu, 7 May 2020 00:45:06 GMT Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by aserp3020.oracle.com with ESMTP id 30sjnma6ts-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:45:06 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 0470j5lH020685; Thu, 7 May 2020 00:45:05 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:45:04 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, 
kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 42/43] shmem: reduce time holding xa_lock when inserting pages Date: Wed, 6 May 2020 17:42:08 -0700 Message-Id: <1588812129-8596-43-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 adultscore=0 phishscore=0 mlxlogscore=999 bulkscore=0 malwarescore=0 spamscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 suspectscore=0 mlxscore=0 spamscore=0 clxscore=1015 priorityscore=1501 bulkscore=0 phishscore=0 impostorscore=0 malwarescore=0 lowpriorityscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2005070001 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Rather than adding one page at a time to the page cache and taking the page cache xarray lock each time, where possible add pages in bulk by first populating an xarray node outside of the page cache before taking the lock to insert it. When a group of pages to be inserted will fill an xarray node, add them to a local xarray, export the xarray node, and then take the lock on the page cache xarray and insert the node. 
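As a rough illustration of the batching described above (not code from this patch), the userspace sketch below counts how a run of order-0 pages would be split between whole-node imports and page-at-a-time inserts. It assumes an xarray node holds 64 entries, which is the usual XA_CHUNK_SIZE. The names plan_small_pages() and struct batch_plan are hypothetical and exist only for this example.

```c
#include <assert.h>

/* Assumption: one xarray node holds 64 slots (XA_CHUNK_SHIFT == 6). */
#define CHUNK_SIZE 64UL

/* Hypothetical summary of how a run of small pages would be inserted. */
struct batch_plan {
	unsigned long bulk;	/* whole-node imports, one lock hold each */
	unsigned long single;	/* page-at-a-time inserts, one lock hold each */
};

/*
 * Walk a run of npages order-0 pages starting at index. The bulk path is
 * chosen only when the index is node aligned and at least a full node's
 * worth of pages remains; otherwise one page is inserted on its own.
 */
static struct batch_plan plan_small_pages(unsigned long index,
					  unsigned long npages)
{
	struct batch_plan plan = { 0, 0 };

	/* The run must not wrap around the index space. */
	assert(index + npages >= index);

	while (npages > 0) {
		if (index % CHUNK_SIZE == 0 && npages >= CHUNK_SIZE) {
			plan.bulk++;
			index += CHUNK_SIZE;
			npages -= CHUNK_SIZE;
		} else {
			plan.single++;
			index++;
			npages--;
		}
	}
	return plan;
}
```

For a run of 200 pages starting at index 60, this yields 3 bulk imports and 8 single inserts, so in the common case the page cache lock would be taken 11 times rather than 200. An allocation retry on the import path can add further lock holds.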
Signed-off-by: Anthony Yznaga --- mm/shmem.c | 145 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 138 insertions(+), 7 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index f621d863e362..9d3c4e1f2dc1 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -732,17 +732,130 @@ static void shmem_delete_from_page_cache(struct page *page, void *radswap) BUG_ON(error); } +static int shmem_add_aligned_to_page_cache(struct page *pages[], int npages, + struct address_space *mapping, + pgoff_t index, gfp_t gfp, int order) +{ + int xa_shift = order + XA_CHUNK_SHIFT - (order % XA_CHUNK_SHIFT); + XA_STATE_ORDER(xas, &mapping->i_pages, index, xa_shift); + struct xa_state *xas_ptr = &xas; + struct xarray xa_tmp; + /* + * Specify order so xas_create_range() only needs to be called once + * to allocate the entire range. This guarantees that xas_store() + * will not fail due to lack of memory. + * Specify index == 0 so the minimum necessary nodes are allocated. + */ + XA_STATE_ORDER(xas_tmp, &xa_tmp, 0, xa_shift); + unsigned long nr = 1UL << order; + struct xa_node *node; + int i; + + if (npages * nr != 1 << xa_shift) { + WARN_ONCE(1, "npages (%d) not aligned to xa_shift\n", npages); + return -EINVAL; + } + if (!IS_ALIGNED(index, 1 << xa_shift)) { + WARN_ONCE(1, "index (%lu) not aligned to xa_shift\n", index); + return -EINVAL; + } + + for (i = 0; i < npages; i++) { + VM_BUG_ON_PAGE(PageTail(pages[i]), pages[i]); + VM_BUG_ON_PAGE(!PageLocked(pages[i]), pages[i]); + VM_BUG_ON_PAGE(!PageSwapBacked(pages[i]), pages[i]); + + page_ref_add(pages[i], nr); + pages[i]->mapping = mapping; + pages[i]->index = index + (i * nr); + } + + xa_init(&xa_tmp); + do { + xas_lock(&xas_tmp); + xas_create_range(&xas_tmp); + if (xas_error(&xas_tmp)) + goto unlock; + for (i = 0; i < npages; i++) { + int j = 0; +next: + xas_store(&xas_tmp, pages[i]); + if (++j < nr) { + xas_next(&xas_tmp); + goto next; + } + if (i < npages - 1) + xas_next(&xas_tmp); + } + xas_set_order(&xas_tmp, 0, 
xa_shift); + node = xas_export_node(&xas_tmp); +unlock: + xas_unlock(&xas_tmp); + } while (xas_nomem(&xas_tmp, gfp)); + + if (xas_error(&xas_tmp)) { + xas_ptr = &xas_tmp; + goto error; + } + + do { + xas_lock_irq(&xas); + xas_import_node(&xas, node); + if (xas_error(&xas)) + goto unlock1; + mapping->nrpages += nr * npages; + xas_unlock(&xas); + for (i = 0; i < npages; i++) { + __mod_node_page_state(page_pgdat(pages[i]), NR_FILE_PAGES, nr); + __mod_node_page_state(page_pgdat(pages[i]), NR_SHMEM, nr); + if (PageTransHuge(pages[i])) { + count_vm_event(THP_FILE_ALLOC); + __inc_node_page_state(pages[i], NR_SHMEM_THPS); + } + } + local_irq_enable(); + break; +unlock1: + xas_unlock_irq(&xas); + } while (xas_nomem(&xas, gfp)); + + if (!xas_error(&xas)) + return 0; + +error: + for (i = 0; i < npages; i++) { + pages[i]->mapping = NULL; + page_ref_sub(pages[i], nr); + } + return xas_error(xas_ptr); +} + static int shmem_add_pages_to_cache(struct page *pages[], int npages, struct address_space *mapping, pgoff_t start, gfp_t gfp) { pgoff_t index = start; int err = 0; - int i; + int i, j; i = 0; while (i < npages) { if (PageTransHuge(pages[i])) { + if (IS_ALIGNED(index, 4096) && i+8 <= npages) { + for (j = 1; j < 8; j++) { + if (!PageTransHuge(pages[i+j])) + break; + } + if (j == 8) { + err = shmem_add_aligned_to_page_cache(&pages[i], 8, mapping, index, gfp, HPAGE_PMD_ORDER); + if (err) + goto done; + index += HPAGE_PMD_NR * 8; + i += 8; + continue; + } + } + err = shmem_add_to_page_cache_fast(pages[i], mapping, index, gfp); if (err) break; @@ -751,13 +864,31 @@ static int shmem_add_pages_to_cache(struct page *pages[], int npages, continue; } - err = shmem_add_to_page_cache_fast(pages[i], mapping, index, gfp); - if (err) - break; - index++; - i++; - } + for (j = 1; i + j < npages; j++) { + if (PageTransHuge(pages[i + j])) + break; + } + + while (j > 0) { + if (IS_ALIGNED(index, 64) && j >= 64) { + err = shmem_add_aligned_to_page_cache(&pages[i], 64, mapping, index, gfp, 0); + if 
(err) + goto done; + index += 64; + i += 64; + j -= 64; + continue; + } + err = shmem_add_to_page_cache_fast(pages[i], mapping, index, gfp); + if (err) + goto done; + index++; + i++; + j--; + } + } +done: return err; } From patchwork Thu May 7 00:42:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anthony Yznaga X-Patchwork-Id: 11532199 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 731EA81 for ; Thu, 7 May 2020 00:45:46 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 40D9E20736 for ; Thu, 7 May 2020 00:45:46 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="B2ieybBw" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 40D9E20736 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 42791900002; Wed, 6 May 2020 20:45:42 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 362A0900023; Wed, 6 May 2020 20:45:42 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 188AF900024; Wed, 6 May 2020 20:45:42 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0084.hostedemail.com [216.40.44.84]) by kanga.kvack.org (Postfix) with ESMTP id E16D6900023 for ; Wed, 6 May 2020 20:45:41 -0400 (EDT) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 
AABFC180AD82F for ; Thu, 7 May 2020 00:45:41 +0000 (UTC) X-FDA: 76788080082.13.coal52_31aa0ee3b1a5b X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,anthony.yznaga@oracle.com,,RULES_HIT:30054:30064,0,RBL:141.146.126.78:@oracle.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:0,LUA_SUMMARY:none X-HE-Tag: coal52_31aa0ee3b1a5b X-Filterd-Recvd-Size: 5789 Received: from aserp2120.oracle.com (aserp2120.oracle.com [141.146.126.78]) by imf03.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 00:45:41 +0000 (UTC) Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470ep5K098400; Thu, 7 May 2020 00:45:12 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=GHyTm1dF4tuIvRr/Sme8g0LfjNUADURL5kv1ja2d5gg=; b=B2ieybBw6yGUeCUHVlAsRBnlc2/4Xd/KOmp96lhPgAt8aY+OY0nx5m0HV49zY4rmmwNJ C2s+DocZP+0TkwaFub5pxJI6VLiZDTcntuJuwRiK6BhA4pqOyBsR5hSUgs8TBMEtVRwq OElN/qCU2uFHMkJBBKXftNAEdERNSddgfIfiOzBTs8P4ylt3KV1I3R+rr5nplsc4zmhK ndh51F39UD6VOjw/FzRcndPUY74810gTKBZWOztHuZiGWlEMz8N9DZFuEnfShhYFDYJm 6QwLRCsTOBx7pn2Mq4SU4ShEr7RZ2iKufkGsu48ZeUFaikGwL8cPgZL4TT7Ejf7MNy1O 3A== Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70]) by aserp2120.oracle.com with ESMTP id 30usgq4h41-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:45:12 +0000 Received: from pps.filterd (aserp3020.oracle.com [127.0.0.1]) by aserp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0470bUBT098659; Thu, 7 May 2020 00:45:12 GMT Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by aserp3020.oracle.com with ESMTP id 30sjnma71q-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 07 May 2020 00:45:11 +0000 
Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 0470j8I8025069; Thu, 7 May 2020 00:45:08 GMT Received: from ayz-linux.localdomain (/68.7.158.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 06 May 2020 17:45:08 -0700 From: Anthony Yznaga To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, rppt@linux.ibm.com, akpm@linux-foundation.org, hughd@google.com, ebiederm@xmission.com, masahiroy@kernel.org, ardb@kernel.org, ndesaulniers@google.com, dima@golovin.in, daniel.kiper@oracle.com, nivedita@alum.mit.edu, rafael.j.wysocki@intel.com, dan.j.williams@intel.com, zhenzhong.duan@oracle.com, jroedel@suse.de, bhe@redhat.com, guro@fb.com, Thomas.Lendacky@amd.com, andriy.shevchenko@linux.intel.com, keescook@chromium.org, hannes@cmpxchg.org, minchan@kernel.org, mhocko@kernel.org, ying.huang@intel.com, yang.shi@linux.alibaba.com, gustavo@embeddedor.com, ziqian.lzq@antfin.com, vdavydov.dev@gmail.com, jason.zeng@intel.com, kevin.tian@intel.com, zhiyuan.lv@intel.com, lei.l.li@intel.com, paul.c.lai@intel.com, ashok.raj@intel.com, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, kexec@lists.infradead.org Subject: [RFC 43/43] PKRAM: improve index alignment of pkram_link entries Date: Wed, 6 May 2020 17:42:09 -0700 Message-Id: <1588812129-8596-44-git-send-email-anthony.yznaga@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> References: <1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 adultscore=0 phishscore=0 mlxlogscore=999 bulkscore=0 malwarescore=0 
 spamscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.12.0-2003020000 definitions=main-2005070001
X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9613 signatures=668687
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 impostorscore=0
 mlxscore=0 priorityscore=1501 lowpriorityscore=0 malwarescore=0 clxscore=1015
 mlxlogscore=999 spamscore=0 adultscore=0 bulkscore=0 phishscore=0
 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.12.0-2003020000 definitions=main-2005070001
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

To take advantage of optimizations when adding pages to the page cache
via shmem_insert_pages(), improve the likelihood that the pages array
passed to shmem_insert_pages() starts on an aligned index. Do this when
preserving pages by starting a new pkram_link page when the current
page's index is aligned and the entries for the aligned range starting
at that index will not fit in the space remaining on the current
pkram_link page.
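
As a rough illustration of the arithmetic behind the new check (not part
of the patch), the userspace sketch below computes align and align_cnt
and evaluates the added condition. The constant values are assumptions
chosen for illustration only: XA_CHUNK_SHIFT of 6, HPAGE_PMD_ORDER of 9,
and a PKRAM_LINK_ENTRIES_MAX of 510. The helpers link_align(),
link_align_cnt() and start_new_link() are hypothetical names and do not
exist in mm/pkram.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the real ones come from kernel headers/config. */
#define XA_CHUNK_SHIFT          6                        /* assumed */
#define XA_CHUNK_SIZE           (1UL << XA_CHUNK_SHIFT)
#define HPAGE_PMD_ORDER         9                        /* assumed: 2 MiB huge page, 4 KiB base page */
#define PKRAM_LINK_ENTRIES_MAX  510UL                    /* hypothetical link page capacity */

/* Alignment, in page-cache index units, used for the split decision. */
static unsigned long link_align(bool huge)
{
	if (huge)
		return 1UL << (HPAGE_PMD_ORDER + XA_CHUNK_SHIFT -
			       (HPAGE_PMD_ORDER % XA_CHUNK_SHIFT));
	return XA_CHUNK_SIZE;
}

/* Number of link entries needed to cover one aligned range. */
static unsigned long link_align_cnt(bool huge)
{
	return huge ? link_align(huge) >> HPAGE_PMD_ORDER : XA_CHUNK_SIZE;
}

/* Mirrors only the condition added by the patch, not the full test. */
static bool start_new_link(unsigned long index, unsigned long entry_idx,
			   bool huge)
{
	unsigned long align = link_align(huge);

	/* The mask test below is only valid for power-of-two alignments. */
	assert((align & (align - 1)) == 0);

	return (index & (align - 1)) == 0 &&
	       entry_idx + link_align_cnt(huge) > PKRAM_LINK_ENTRIES_MAX;
}
```

With these assumed values a huge page gives align = 4096 and
align_cnt = 8, so an aligned huge-page index starts a new pkram_link
page once fewer than 8 entries remain on the current one. For small
pages both values are 64.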

Signed-off-by: Anthony Yznaga
---
 mm/pkram.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/mm/pkram.c b/mm/pkram.c
index ef092aa5ce7a..416c3ca4411b 100644
--- a/mm/pkram.c
+++ b/mm/pkram.c
@@ -913,11 +913,21 @@ static int __pkram_save_page(struct pkram_stream *ps,
 {
 	struct pkram_link *link = ps->link;
 	struct pkram_obj *obj = ps->obj;
+	int order, align, align_cnt;
 	pkram_entry_t p;
-	int order;
+
+	if (PageTransHuge(page)) {
+		align = 1 << (HPAGE_PMD_ORDER + XA_CHUNK_SHIFT - (HPAGE_PMD_ORDER % XA_CHUNK_SHIFT));
+		align_cnt = align >> HPAGE_PMD_ORDER;
+	} else {
+		align = XA_CHUNK_SIZE;
+		align_cnt = XA_CHUNK_SIZE;
+	}
 
 	if (!link || ps->entry_idx >= PKRAM_LINK_ENTRIES_MAX ||
-	    index != ps->next_index) {
+	    index != ps->next_index ||
+	    (IS_ALIGNED(index, align) &&
+	    (ps->entry_idx + align_cnt > PKRAM_LINK_ENTRIES_MAX))) {
 		struct page *link_page;
 
 		link_page = pkram_alloc_page((ps->gfp_mask & GFP_RECLAIM_MASK) |