From patchwork Mon Dec 7 11:30:54 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: yulei zhang
X-Patchwork-Id: 11955405
From: yulei.kernel@gmail.com
X-Google-Original-From: yuleixzhang@tencent.com
To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com
Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang, Xiao Guangrong
Subject: [RFC V2 01/37] fs: introduce dmemfs module
Date: Mon, 7 Dec 2020 19:30:54 +0800
X-Mailer: git-send-email 2.28.0
Sender: owner-linux-mm@kvack.org

From: Yulei Zhang

dmemfs (Direct Memory filesystem) is a filesystem backed by device memory or reserved memory. This kind of memory is special: it is not managed by the kernel and has no associated 'struct page'. The original purpose of dmemfs is to drop the use of 'struct page' and so save system memory in public cloud environments.

This patch introduces the basic framework of dmemfs; only mkdir and creation of regular files are supported.

Signed-off-by: Xiao Guangrong Signed-off-by: Yulei Zhang --- fs/Kconfig | 1 + fs/Makefile | 1 + fs/dmemfs/Kconfig | 13 +++ fs/dmemfs/Makefile | 7 ++ fs/dmemfs/inode.c | 266 +++++++++++++++++++++++++++++++++++++++++++++ include/uapi/linux/magic.h | 1 + 6 files changed, 289 insertions(+) create mode 100644 fs/dmemfs/Kconfig create mode 100644 fs/dmemfs/Makefile create mode 100644 fs/dmemfs/inode.c diff --git a/fs/Kconfig b/fs/Kconfig index aa4c122..18e7208 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -41,6 +41,7 @@ source "fs/btrfs/Kconfig" source "fs/nilfs2/Kconfig" source "fs/f2fs/Kconfig" source "fs/zonefs/Kconfig" +source "fs/dmemfs/Kconfig" config FS_DAX bool "Direct Access (DAX) support" diff --git a/fs/Makefile b/fs/Makefile index 999d1a2..34747ec 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -136,3 +136,4 @@ obj-$(CONFIG_EFIVAR_FS) += efivarfs/ obj-$(CONFIG_EROFS_FS) += erofs/ obj-$(CONFIG_VBOXSF_FS) += vboxsf/ obj-$(CONFIG_ZONEFS_FS) += zonefs/ +obj-$(CONFIG_DMEM_FS) += dmemfs/ diff --git a/fs/dmemfs/Kconfig b/fs/dmemfs/Kconfig new file mode 100644 index 00000000..d2894a5 --- /dev/null +++ b/fs/dmemfs/Kconfig @@ -0,0 +1,13 @@ +config DMEM_FS + tristate "Direct Memory filesystem support" + help + dmemfs (Direct Memory filesystem) is device memory or reserved + memory based filesystem. This kind of memory is special as it + is not managed by kernel and it is without 'struct page'. + + The original purpose of dmemfs is saving extra memory of + 'struct page' that reduces the total cost of ownership (TCO) + for cloud providers. + + To compile this file system support as a module, choose M here: the + module will be called dmemfs. diff --git a/fs/dmemfs/Makefile b/fs/dmemfs/Makefile new file mode 100644 index 00000000..73bdc9c --- /dev/null +++ b/fs/dmemfs/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Makefile for the linux dmem-filesystem routines. 
+# +obj-$(CONFIG_DMEM_FS) += dmemfs.o + +dmemfs-y += inode.o diff --git a/fs/dmemfs/inode.c b/fs/dmemfs/inode.c new file mode 100644 index 00000000..0aa3d3b --- /dev/null +++ b/fs/dmemfs/inode.c @@ -0,0 +1,266 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * linux/fs/dmemfs/inode.c + * + * Authors: + * Xiao Guangrong + * Chen Zhuo + * Haiwei Li + * Yulei Zhang + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +MODULE_AUTHOR("Tencent Corporation"); +MODULE_LICENSE("GPL v2"); + +struct dmemfs_mount_opts { + unsigned long dpage_size; +}; + +struct dmemfs_fs_info { + struct dmemfs_mount_opts mount_opts; +}; + +enum dmemfs_param { + Opt_dpagesize, +}; + +const struct fs_parameter_spec dmemfs_fs_parameters[] = { + fsparam_string("pagesize", Opt_dpagesize), + {} +}; + +static int check_dpage_size(unsigned long dpage_size) +{ + if (dpage_size != PAGE_SIZE && dpage_size != PMD_SIZE && + dpage_size != PUD_SIZE) + return -EINVAL; + + return 0; +} + +static struct inode * +dmemfs_get_inode(struct super_block *sb, const struct inode *dir, umode_t mode); + +static int +__create_file(struct inode *dir, struct dentry *dentry, umode_t mode) +{ + struct inode *inode = dmemfs_get_inode(dir->i_sb, dir, mode); + int error = -ENOSPC; + + if (inode) { + d_instantiate(dentry, inode); + dget(dentry); /* Extra count - pin the dentry in core */ + error = 0; + dir->i_mtime = dir->i_ctime = current_time(inode); + if (mode & S_IFDIR) + inc_nlink(dir); + } + return error; +} + +static int dmemfs_create(struct inode *dir, struct dentry *dentry, + umode_t mode, bool excl) +{ + return __create_file(dir, dentry, mode | S_IFREG); +} + +static int dmemfs_mkdir(struct inode *dir, struct dentry *dentry, + umode_t mode) +{ + return __create_file(dir, dentry, mode | S_IFDIR); +} + +static const struct inode_operations dmemfs_dir_inode_operations = { + .create = 
dmemfs_create, + .lookup = simple_lookup, + .unlink = simple_unlink, + .mkdir = dmemfs_mkdir, + .rmdir = simple_rmdir, + .rename = simple_rename, +}; + +static const struct inode_operations dmemfs_file_inode_operations = { + .setattr = simple_setattr, + .getattr = simple_getattr, +}; + +static const struct file_operations dmemfs_file_operations = { +}; + +static int dmemfs_parse_param(struct fs_context *fc, struct fs_parameter *param) +{ + struct dmemfs_fs_info *fsi = fc->s_fs_info; + struct fs_parse_result result; + int opt, ret; + + opt = fs_parse(fc, dmemfs_fs_parameters, param, &result); + if (opt < 0) + return opt; + + switch (opt) { + case Opt_dpagesize: + fsi->mount_opts.dpage_size = memparse(param->string, NULL); + ret = check_dpage_size(fsi->mount_opts.dpage_size); + if (ret) { + pr_warn("dmemfs: unknown pagesize %#lx.\n", + fsi->mount_opts.dpage_size); + return ret; + } + break; + default: + pr_warn("dmemfs: unknown mount option [%x].\n", + opt); + return -EINVAL; + } + + return 0; +} + +struct inode *dmemfs_get_inode(struct super_block *sb, + const struct inode *dir, umode_t mode) +{ + struct inode *inode = new_inode(sb); + + if (inode) { + inode->i_ino = get_next_ino(); + inode_init_owner(inode, dir, mode); + inode->i_mapping->a_ops = &empty_aops; + mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER); + mapping_set_unevictable(inode->i_mapping); + inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode); + switch (mode & S_IFMT) { + default: + init_special_inode(inode, mode, 0); + break; + case S_IFREG: + inode->i_op = &dmemfs_file_inode_operations; + inode->i_fop = &dmemfs_file_operations; + break; + case S_IFDIR: + inode->i_op = &dmemfs_dir_inode_operations; + inode->i_fop = &simple_dir_operations; + + /* + * directory inodes start off with i_nlink == 2 + * (for "."
entry) + */ + inc_nlink(inode); + break; + case S_IFLNK: + inode->i_op = &page_symlink_inode_operations; + break; + } + } + return inode; +} + +static int dmemfs_statfs(struct dentry *dentry, struct kstatfs *buf) +{ + simple_statfs(dentry, buf); + buf->f_bsize = dentry->d_sb->s_blocksize; + + return 0; +} + +static const struct super_operations dmemfs_ops = { + .statfs = dmemfs_statfs, + .drop_inode = generic_delete_inode, +}; + +static int +dmemfs_fill_super(struct super_block *sb, struct fs_context *fc) +{ + struct inode *inode; + struct dmemfs_fs_info *fsi = sb->s_fs_info; + + sb->s_maxbytes = MAX_LFS_FILESIZE; + sb->s_blocksize = fsi->mount_opts.dpage_size; + sb->s_blocksize_bits = ilog2(fsi->mount_opts.dpage_size); + sb->s_magic = DMEMFS_MAGIC; + sb->s_op = &dmemfs_ops; + sb->s_time_gran = 1; + + inode = dmemfs_get_inode(sb, NULL, S_IFDIR); + sb->s_root = d_make_root(inode); + if (!sb->s_root) + return -ENOMEM; + + return 0; +} + +static int dmemfs_get_tree(struct fs_context *fc) +{ + return get_tree_nodev(fc, dmemfs_fill_super); +} + +static void dmemfs_free_fc(struct fs_context *fc) +{ + kfree(fc->s_fs_info); +} + +static const struct fs_context_operations dmemfs_context_ops = { + .free = dmemfs_free_fc, + .parse_param = dmemfs_parse_param, + .get_tree = dmemfs_get_tree, +}; + +int dmemfs_init_fs_context(struct fs_context *fc) +{ + struct dmemfs_fs_info *fsi; + + fsi = kzalloc(sizeof(*fsi), GFP_KERNEL); + if (!fsi) + return -ENOMEM; + + fsi->mount_opts.dpage_size = PAGE_SIZE; + fc->s_fs_info = fsi; + fc->ops = &dmemfs_context_ops; + return 0; +} + +static void dmemfs_kill_sb(struct super_block *sb) +{ + kill_litter_super(sb); +} + +static struct file_system_type dmemfs_fs_type = { + .owner = THIS_MODULE, + .name = "dmemfs", + .init_fs_context = dmemfs_init_fs_context, + .kill_sb = dmemfs_kill_sb, +}; + +static int __init dmemfs_init(void) +{ + int ret; + + ret = register_filesystem(&dmemfs_fs_type); + + return ret; +} + +static void __exit 
dmemfs_uninit(void) +{ + unregister_filesystem(&dmemfs_fs_type); +} + +module_init(dmemfs_init) +module_exit(dmemfs_uninit) diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h index f3956fc..3fbd066 100644 --- a/include/uapi/linux/magic.h +++ b/include/uapi/linux/magic.h @@ -97,5 +97,6 @@ #define DEVMEM_MAGIC 0x454d444d /* "DMEM" */ #define Z3FOLD_MAGIC 0x33 #define PPC_CMM_MAGIC 0xc7571590 +#define DMEMFS_MAGIC 0x2ace90c6 #endif /* __LINUX_MAGIC_H__ */

From patchwork Mon Dec 7 11:30:55 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: yulei zhang
X-Patchwork-Id: 11955407
From: yulei.kernel@gmail.com
X-Google-Original-From: yuleixzhang@tencent.com
To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com
Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang, Xiao Guangrong
Subject: [RFC V2 02/37] mm: support direct memory reservation
Date: Mon, 7 Dec 2020 19:30:55 +0800
X-Mailer: git-send-email 2.28.0
Sender: owner-linux-mm@kvack.org

From: Yulei Zhang

Introduce 'dmem=' to reserve system memory for DMEM (direct memory). Compared with 'mem=' and 'memmap', it reserves memory according to the NUMA topology. For details, please refer to kernel-parameters.txt.

Signed-off-by: Xiao Guangrong
Signed-off-by: Yulei Zhang
---
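The checks that a 'dmem=' value has to pass can be sketched in userspace C as follows. This is only an illustrative sketch under stated assumptions, not the kernel code: sketch_check_dmem() and SKETCH_SECTION_SIZE_BITS are hypothetical names, and the 128M default alignment is assumed from the x86_64 memory section size mentioned in the kernel-parameters.txt hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 128M memory sections, the x86_64 default cited in the docs. */
#define SKETCH_SECTION_SIZE_BITS 27

/*
 * Hypothetical userspace mirror of the 'dmem=' validation rules.
 * Returns 0 when the combination is acceptable, -1 otherwise.
 * An 'align' of 0 means that ':align' was not given.
 */
int sketch_check_dmem(bool resv_kernel, uint64_t base, uint64_t size,
		      uint64_t align)
{
	const uint64_t min_align = UINT64_C(1) << SKETCH_SECTION_SIZE_BITS;

	/* A 'size' of 0 is rejected while parsing, before any other check. */
	if (!size)
		return -1;

	if (!align)
		align = min_align;

	/* 'align' must cover at least one memory section ... */
	if (align < min_align)
		return -1;

	/* ... and must be a power of two. */
	if (align & (align - 1))
		return -1;

	/* Both 'addr' and 'size' must be multiples of 'align'. */
	if ((base & (align - 1)) || (size & (align - 1)))
		return -1;

	/* 'addr + size' must not wrap around. */
	if (base >= base + size)
		return -1;

	/* '!' (reserve for the kernel) cannot be combined with '@addr'. */
	if (resv_kernel && base)
		return -1;

	return 0;
}
```

With these rules, 'dmem=4G', 'dmem=1G@5G' and 'dmem=1G:1G@5G' are accepted, while 'dmem=0x40200000:1G' is rejected because 'size' is not a multiple of 'align'.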
Documentation/admin-guide/kernel-parameters.txt | 38 +++ arch/x86/kernel/setup.c | 3 + include/linux/dmem.h | 16 ++ mm/Kconfig | 8 + mm/Makefile | 1 + mm/dmem.c | 137 +++++++++++ mm/dmem_reserve.c | 303 ++++++++++++++++++++++++ 7 files changed, 506 insertions(+) create mode 100644 include/linux/dmem.h create mode 100644 mm/dmem.c create mode 100644 mm/dmem_reserve.c diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 526d65d..78caf11 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -991,6 +991,44 @@ The filter can be disabled or changed to another driver later using sysfs. + dmem=[!]size[KMG] + [KNL, NUMA] When CONFIG_DMEM is set, this means + the size of memory reserved for dmemfs on each NUMA + memory node and 'size' must be aligned to the default + alignment that is the size of memory section which is + 128M by default on x86_64. If '!' is set, such amount of + memory on each node will be owned by kernel and dmemfs + owns the rest of memory on each node. + Example: Reserve 4G memory on each node for dmemfs + dmem=4G + + dmem=[!]size[KMG]:align[KMG] + [KNL, NUMA] Ditto. 'align' should be power of two and + not smaller than the default alignment. Also 'size' + must be aligned to 'align'. + Example: Bad dmem parameter because 'size' misaligned + dmem=0x40200000:1G + + dmem=size[KMG]@addr[KMG] + [KNL] When CONFIG_DMEM is set, this marks specific + memory as reserved for dmemfs. Region of memory will be + used by dmemfs, from addr to addr + size. Reserving a + certain memory region for kernel is illegal so '!' is + forbidden. Should not assign 'addr' to 0 because kernel + will occupy fixed memory region beginning at 0 address. + Ditto, 'size' and 'addr' must be aligned to default + alignment. + Example: Exclude memory from 5G-6G for dmemfs. + dmem=1G@5G + + dmem=size[KMG]@addr[KMG]:align[KMG] + [KNL] Ditto.
'align' should be power of two and + not smaller than the default alignment. Also 'size' + and 'addr' must be aligned to 'align'. Specially, + '@addr' and ':align' could occur in any order. + Example: Exclude memory from 5G-6G for dmemfs. + dmem=1G:1G@5G + driver_async_probe= [KNL] List of driver names to be probed asynchronously. Format: ,... diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 84f581c..9d05e1b 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -48,6 +48,7 @@ #include #include #include +#include /* * max_low_pfn_mapped: highest directly mapped pfn < 4 GB @@ -1149,6 +1150,8 @@ void __init setup_arch(char **cmdline_p) if (boot_cpu_has(X86_FEATURE_GBPAGES)) hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT); + dmem_reserve_init(); + /* * Reserve memory for crash kernel after SRAT is parsed so that it * won't consume hotpluggable memory. diff --git a/include/linux/dmem.h b/include/linux/dmem.h new file mode 100644 index 00000000..5049322 --- /dev/null +++ b/include/linux/dmem.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef _LINUX_DMEM_H +#define _LINUX_DMEM_H + +#ifdef CONFIG_DMEM +int dmem_reserve_init(void); +void dmem_init(void); +int dmem_region_register(int node, phys_addr_t start, phys_addr_t end); + +#else +static inline int dmem_reserve_init(void) +{ + return 0; +} +#endif +#endif /* _LINUX_DMEM_H */ diff --git a/mm/Kconfig b/mm/Kconfig index d42423f..3a6d408 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -226,6 +226,14 @@ config BALLOON_COMPACTION scenario aforementioned and helps improving memory defragmentation. # +# support for direct memory basics +config DMEM + bool "Direct Memory Reservation" + depends on SPARSEMEM + help + Allow reservation of memory which could be for the dedicated use of dmem. + It's the basis of dmemfs. 
+ # support for memory compaction config COMPACTION bool "Allow for memory compaction" diff --git a/mm/Makefile b/mm/Makefile index d73aed0..775c8518 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -120,3 +120,4 @@ obj-$(CONFIG_MEMFD_CREATE) += memfd.o obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o obj-$(CONFIG_PTDUMP_CORE) += ptdump.o obj-$(CONFIG_PAGE_REPORTING) += page_reporting.o +obj-$(CONFIG_DMEM) += dmem.o dmem_reserve.o diff --git a/mm/dmem.c b/mm/dmem.c new file mode 100644 index 00000000..b5fb4f1 --- /dev/null +++ b/mm/dmem.c @@ -0,0 +1,137 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * memory management for dmemfs + * + * Authors: + * Xiao Guangrong + * Chen Zhuo + * Haiwei Li + * Yulei Zhang + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * There are two kinds of page in dmem management: + * - nature page, it's the CPU's page size, i.e, 4K on x86 + * + * - dmem page, it's the unit size used by dmem itself to manage all + * registered memory. 
It's set by dmem_alloc_init() + */ +struct dmem_region { + /* original registered memory region */ + phys_addr_t reserved_start_addr; + phys_addr_t reserved_end_addr; + + /* memory region aligned to dmem page */ + phys_addr_t dpage_start_pfn; + phys_addr_t dpage_end_pfn; + + /* + * avoid memory allocation if the dmem region is small enough + */ + unsigned long static_bitmap; + unsigned long *bitmap; + u64 next_free_pos; + struct list_head node; + + unsigned long static_error_bitmap; + unsigned long *error_bitmap; +}; + +/* + * statically define number of regions to avoid allocating memory + * dynamically from memblock as slab is not available at that time + */ +#define DMEM_REGION_PAGES 2 +#define INIT_REGION_NUM \ + ((DMEM_REGION_PAGES << PAGE_SHIFT) / sizeof(struct dmem_region)) + +static struct dmem_region static_regions[INIT_REGION_NUM]; + +struct dmem_node { + unsigned long total_dpages; + unsigned long free_dpages; + + /* fallback list for allocation */ + int nodelist[MAX_NUMNODES]; + struct list_head regions; +}; + +struct dmem_pool { + struct mutex lock; + + unsigned long region_num; + unsigned long registered_pages; + unsigned long unaligned_pages; + + /* shift bits of dmem page */ + unsigned long dpage_shift; + + unsigned long total_dpages; + unsigned long free_dpages; + + /* + * increased when allocator is initialized, + * stop it being destroyed when someone is + * still using it + */ + u64 user_count; + struct dmem_node nodes[MAX_NUMNODES]; +}; + +static struct dmem_pool dmem_pool = { + .lock = __MUTEX_INITIALIZER(dmem_pool.lock), +}; + +#define for_each_dmem_node(_dnode) \ + for (_dnode = dmem_pool.nodes; \ + _dnode < dmem_pool.nodes + ARRAY_SIZE(dmem_pool.nodes); \ + _dnode++) + +void __init dmem_init(void) +{ + struct dmem_node *dnode; + + pr_info("dmem: pre-defined region: %ld\n", INIT_REGION_NUM); + + for_each_dmem_node(dnode) + INIT_LIST_HEAD(&dnode->regions); +} + +/* + * register the memory region to dmem pool as freed memory, the region + * 
should be properly aligned to PAGE_SIZE at least + * + * it's safe to be out of dmem_pool's lock as it's used at the very + * beginning of system boot + */ +int dmem_region_register(int node, phys_addr_t start, phys_addr_t end) +{ + struct dmem_region *dregion; + + pr_info("dmem: register region [%#llx - %#llx] on node %d.\n", + (unsigned long long)start, (unsigned long long)end, node); + + if (unlikely(dmem_pool.region_num >= INIT_REGION_NUM)) { + pr_err("dmem: region is not sufficient.\n"); + return -ENOMEM; + } + + dregion = &static_regions[dmem_pool.region_num++]; + dregion->reserved_start_addr = start; + dregion->reserved_end_addr = end; + + list_add_tail(&dregion->node, &dmem_pool.nodes[node].regions); + dmem_pool.registered_pages += __phys_to_pfn(end) - + __phys_to_pfn(start); + return 0; +} + diff --git a/mm/dmem_reserve.c b/mm/dmem_reserve.c new file mode 100644 index 00000000..567ee9f --- /dev/null +++ b/mm/dmem_reserve.c @@ -0,0 +1,303 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Support reserved memory for dmem. + * As dmem_reserve_init will adjust memblock to reserve memory + * for dmem, we could save a vast amount of memory for 'struct page'. + * + * Authors: + * Xiao Guangrong + */ +#include +#include +#include +#include +#include + +struct dmem_param { + phys_addr_t base; + phys_addr_t size; + phys_addr_t align; + /* + * If set to 1, dmem_param specified requested memory for kernel, + * otherwise for dmem. 
+ */ + bool resv_kernel; +}; + +static struct dmem_param dmem_param __initdata; + +/* Check dmem param defined by user to match dmem align */ +static int __init check_dmem_param(bool resv_kernel, phys_addr_t base, + phys_addr_t size, phys_addr_t align) +{ + phys_addr_t min_align = 1UL << SECTION_SIZE_BITS; + + if (!align) + align = min_align; + + /* + * the reserved region should be aligned to memory section + * at least + */ + if (align < min_align) { + pr_warn("dmem: 'align' should be %#llx at least to be aligned to memory section.\n", + min_align); + return -EINVAL; + } + + if (!is_power_of_2(align)) { + pr_warn("dmem: 'align' should be power of 2.\n"); + return -EINVAL; + } + + if (base & (align - 1)) { + pr_warn("dmem: 'addr' is unaligned to 'align' in dmem=\n"); + return -EINVAL; + } + + if (size & (align - 1)) { + pr_warn("dmem: 'size' is unaligned to 'align' in dmem=\n"); + return -EINVAL; + } + + if (base >= base + size) { + pr_warn("dmem: 'addr + size' overflow in dmem=\n"); + return -EINVAL; + } + + if (resv_kernel && base) { + pr_warn("dmem: take a certain base address for kernel is illegal\n"); + return -EINVAL; + } + + dmem_param.base = base; + dmem_param.size = size; + dmem_param.align = align; + dmem_param.resv_kernel = resv_kernel; + + pr_info("dmem: parameter: base address %#llx size %#llx align %#llx resv_kernel %d\n", + (unsigned long long)base, (unsigned long long)size, + (unsigned long long)align, resv_kernel); + return 0; +} + +static int __init parse_dmem(char *p) +{ + phys_addr_t base, size, align; + char *oldp; + bool resv_kernel = false; + + if (!p) + return -EINVAL; + + base = align = 0; + + if (*p == '!') { + resv_kernel = true; + p++; + } + + oldp = p; + size = memparse(p, &p); + if (oldp == p) + return -EINVAL; + + if (!size) { + pr_warn("dmem: 'size' of 0 defined in dmem=, or {invalid} param\n"); + return -EINVAL; + } + + while (*p) { + phys_addr_t *pvalue; + + switch (*p) { + case '@': + pvalue = &base; + break; + case ':': + pvalue 
= &align; + break; + default: + pr_warn("dmem: unknown indicator: %c in dmem=\n", *p); + return -EINVAL; + } + + /* + * Some attribute had been specified multiple times. + * This is not allowed. + */ + if (*pvalue) + return -EINVAL; + + oldp = ++p; + *pvalue = memparse(p, &p); + if (oldp == p) + return -EINVAL; + + if (*pvalue == 0) { + pr_warn("dmem: 'addr' or 'align' should not be set to 0\n"); + return -EINVAL; + } + } + + return check_dmem_param(resv_kernel, base, size, align); +} + +early_param("dmem", parse_dmem); + +/* + * We wanna remove a memory range from memblock.memory thoroughly. + * As isolating memblock.memory in memblock_remove needs to double + * the array of memblock_region, allocated memory for new array maybe + * locate in the memory range which we wanna to remove. + * So, conflict. + * To resolve this conflict, here reserve this memory range firstly. + * While reserving this memory range, isolating memory.reserved will allocate + * memory excluded from memory range which to be removed. So following + * double array in memblock_remove can't observe this reserved range. 
+ */ +static void __init dmem_remove_memblock(phys_addr_t base, phys_addr_t size) +{ + memblock_reserve(base, size); + memblock_remove(base, size); + memblock_free(base, size); +} + +static u64 node_req_mem[MAX_NUMNODES] __initdata; + +/* Reserve certain size of memory for dmem in each numa node */ +static void __init dmem_reserve_size(phys_addr_t size, phys_addr_t align, + bool resv_kernel) +{ + phys_addr_t start, end; + u64 i; + int nid; + + /* Calculate available free memory on each node */ + for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, + &end, &nid) + node_req_mem[nid] += end - start; + + /* Calculate memory size needed to reserve on each node for dmem */ + for (i = 0; i < MAX_NUMNODES; i++) { + node_req_mem[i] = ALIGN(node_req_mem[i], align); + + if (!resv_kernel) { + node_req_mem[i] = min(size, node_req_mem[i]); + continue; + } + + /* leave dmem_param.size memory for kernel */ + if (node_req_mem[i] > size) + node_req_mem[i] = node_req_mem[i] - size; + else + node_req_mem[i] = 0; + } + +retry: + for_each_free_mem_range_reverse(i, NUMA_NO_NODE, MEMBLOCK_NONE, + &start, &end, &nid) { + /* Well, we have got enough memory for this node. */ + if (!node_req_mem[nid]) + continue; + + start = round_up(start, align); + end = round_down(end, align); + /* Skip memblock_region which is too small */ + if (start >= end) + continue; + + /* Towards memory block at higher address */ + start = end - min((end - start), node_req_mem[nid]); + + /* + * do not have enough resource to save the region, skip it + * from now on + */ + if (dmem_region_register(nid, start, end) < 0) + break; + + dmem_remove_memblock(start, end - start); + + node_req_mem[nid] -= end - start; + + /* We have dropped a memblock, so re-walk it. 
*/ + goto retry; + } + + for (i = 0; i < MAX_NUMNODES; i++) { + if (!node_req_mem[i]) + continue; + + pr_info("dmem: %#llx size of memory is not reserved on node %lld due to misaligned regions.\n", + (unsigned long long)size, i); + } + +} + +/* Reserve [base, base + size) for dmem. */ +static void __init +dmem_reserve_region(phys_addr_t base, phys_addr_t size, phys_addr_t align) +{ + phys_addr_t start, end; + phys_addr_t p_start, p_end; + u64 i; + int nid; + + p_start = base; + p_end = base + size; + +retry: + for_each_free_mem_range_reverse(i, NUMA_NO_NODE, MEMBLOCK_NONE, + &start, &end, &nid) { + /* Find region located in user defined range. */ + if (start >= p_end || end <= p_start) + continue; + + start = round_up(max(start, p_start), align); + end = round_down(min(end, p_end), align); + if (start >= end) + continue; + + if (dmem_region_register(nid, start, end) < 0) + break; + + dmem_remove_memblock(start, end - start); + + size -= end - start; + if (!size) + return; + + /* We have dropped a memblock, so re-walk it. */ + goto retry; + } + + pr_info("dmem: %#llx size of memory is not reserved for dmem due to holes and misaligned regions in [%#llx, %#llx].\n", + (unsigned long long)size, (unsigned long long)base, + (unsigned long long)(base + size)); +} + +/* Reserve memory for dmem */ +int __init dmem_reserve_init(void) +{ + phys_addr_t base, size, align; + bool resv_kernel; + + dmem_init(); + + base = dmem_param.base; + size = dmem_param.size; + align = dmem_param.align; + resv_kernel = dmem_param.resv_kernel; + + /* Dmem param had not been enabled. 
*/ + if (size == 0) + return 0; + + if (base) + dmem_reserve_region(base, size, align); + else + dmem_reserve_size(size, align, resv_kernel); + + return 0; +}

From patchwork Mon Dec 7 11:30:56 2020
X-Patchwork-Submitter: yulei zhang
X-Patchwork-Id: 11955409
From: yulei.kernel@gmail.com
X-Google-Original-From: yuleixzhang@tencent.com
To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com
Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang, Xiao Guangrong
Subject: [RFC V2 03/37] dmem: implement dmem memory management
Date: Mon, 7 Dec 2020 19:30:56 +0800

From: Yulei Zhang

The following figure shows the topology of dmem memory management. It
reserves a few memory regions from the NUMA nodes, and in each region it
uses a bitmap to track the actual memory usage.

+------+-+-------+---------+
| Node0| | ...   | NodeN   |
+--/-\-+-+-------+---------+
  /   \
+---v----+v-----+----------+
|region 0| ...  | region n |
+--/---\-+------+----------+
  /     \
+-+v+------v-------+-+-+-+
| | |   bitmap     | | | |
+-+-+--------------+-+-+-+

It introduces the interfaces to manage dmem pages, which include:
- dmem_region_register(): registers the reserved memory with the dmem
  management system
- dmem_alloc_init(): initializes the dmem allocator; note that the page
  size used by the allocator is not the same thing as the alignment used
  to reserve dmem memory
- dmem_alloc_pages_vma() and dmem_free_pages(): the interfaces for
  allocating and freeing dmem memory; multiple pages can be allocated at
  one time, but the count should be a power of two

Signed-off-by: Xiao Guangrong
Signed-off-by: Yulei Zhang
---
 include/linux/dmem.h |   3 +
 mm/dmem.c            | 674 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 677 insertions(+)

diff --git a/include/linux/dmem.h b/include/linux/dmem.h index 5049322..476a82e 100644 --- a/include/linux/dmem.h +++ b/include/linux/dmem.h @@ -7,6 +7,9 @@ void dmem_init(void); int dmem_region_register(int node, phys_addr_t start, phys_addr_t end); +int dmem_alloc_init(unsigned long dpage_shift); +void dmem_alloc_uinit(void); + #else static inline int dmem_reserve_init(void) {
diff --git a/mm/dmem.c b/mm/dmem.c index b5fb4f1..a77a064 100644 --- a/mm/dmem.c +++ b/mm/dmem.c @@ -91,11 +91,38 @@ struct dmem_pool { .lock = __MUTEX_INITIALIZER(dmem_pool.lock), }; +#define DMEM_PAGE_SIZE (1UL << dmem_pool.dpage_shift) +#define DMEM_PAGE_UP(x) phys_to_dpage(((x) + DMEM_PAGE_SIZE - 1)) +#define DMEM_PAGE_DOWN(x) phys_to_dpage(x) + +#define dpage_to_phys(_dpage) \ + ((_dpage) << dmem_pool.dpage_shift) +#define phys_to_dpage(_addr) \ + ((_addr) >> dmem_pool.dpage_shift) + +#define dpage_to_pfn(_dpage) \ + (__phys_to_pfn(dpage_to_phys(_dpage))) +#define pfn_to_dpage(_pfn) \ + (phys_to_dpage(__pfn_to_phys(_pfn))) + +#define dnode_to_nid(_dnode) \ + ((_dnode) - dmem_pool.nodes) +#define nid_to_dnode(nid) \ + (&dmem_pool.nodes[nid]) + #define for_each_dmem_node(_dnode) \ for (_dnode = 
dmem_pool.nodes; \ _dnode < dmem_pool.nodes + ARRAY_SIZE(dmem_pool.nodes); \ _dnode++) +#define for_each_dmem_region(_dnode, _dregion) \ + list_for_each_entry(_dregion, &(_dnode)->regions, node) + +static inline int *dmem_nodelist(int nid) +{ + return nid_to_dnode(nid)->nodelist; +} + void __init dmem_init(void) { struct dmem_node *dnode; @@ -135,3 +162,650 @@ int dmem_region_register(int node, phys_addr_t start, phys_addr_t end) return 0; } +#define PENALTY_FOR_DMEM_SHARED_NODE (1) + +static int dmem_nodeload[MAX_NUMNODES] __initdata; + +/* Evaluate penalty for each dmem node */ +static int __init dmem_evaluate_node(int local, int node) +{ + int penalty; + + /* Use the distance array to find the distance */ + penalty = node_distance(local, node); + + /* Penalize nodes under us ("prefer the next node") */ + penalty += (node < local); + + /* Give preference to headless and unused nodes */ + if (!cpumask_empty(cpumask_of_node(node))) + penalty += PENALTY_FOR_NODE_WITH_CPUS; + + /* Penalize dmem-node shared with kernel */ + if (node_state(node, N_MEMORY)) + penalty += PENALTY_FOR_DMEM_SHARED_NODE; + + /* Slight preference for less loaded node */ + penalty *= (nr_online_nodes * MAX_NUMNODES); + + penalty += dmem_nodeload[node]; + + return penalty; +} + +static int __init find_next_dmem_node(int local, nodemask_t *used_nodes) +{ + struct dmem_node *dnode; + int node, best_node = NUMA_NO_NODE; + int penalty, min_penalty = INT_MAX; + + /* Invalid node is not suitable to call node_distance */ + if (!node_state(local, N_POSSIBLE)) + return NUMA_NO_NODE; + + /* Use the local node if we haven't already */ + if (!node_isset(local, *used_nodes)) { + node_set(local, *used_nodes); + return local; + } + + for_each_dmem_node(dnode) { + if (list_empty(&dnode->regions)) + continue; + + node = dnode_to_nid(dnode); + + /* Don't want a node to appear more than once */ + if (node_isset(node, *used_nodes)) + continue; + + penalty = dmem_evaluate_node(local, node); + + if (penalty < 
min_penalty) { + min_penalty = penalty; + best_node = node; + } + } + + if (best_node >= 0) + node_set(best_node, *used_nodes); + + return best_node; +} + +static int __init dmem_node_init(struct dmem_node *dnode) +{ + int *nodelist; + nodemask_t used_nodes; + int local, node, prev; + int load; + int i = 0; + + nodelist = dnode->nodelist; + nodes_clear(used_nodes); + local = dnode_to_nid(dnode); + prev = local; + load = nr_online_nodes; + + while ((node = find_next_dmem_node(local, &used_nodes)) >= 0) { + /* + * We don't want to pressure a particular node. + * So adding penalty to the first node in same + * distance group to make it round-robin. + */ + if (node_distance(local, node) != node_distance(local, prev)) + dmem_nodeload[node] = load; + + nodelist[i++] = prev = node; + load--; + } + + return 0; +} + +static void __init dmem_region_uinit(struct dmem_region *dregion) +{ + unsigned long nr_pages, size, *bitmap = dregion->error_bitmap; + + if (!bitmap) + return; + + nr_pages = __phys_to_pfn(dregion->reserved_end_addr) + - __phys_to_pfn(dregion->reserved_start_addr); + + WARN_ON(!nr_pages); + + size = BITS_TO_LONGS(nr_pages) * sizeof(long); + if (size > sizeof(dregion->static_bitmap)) + kfree(bitmap); + dregion->error_bitmap = NULL; +} + +/* + * we only stop allocator to use the reserved page and do not + * reture pages back if anything goes wrong + */ +static void __init dmem_uinit(void) +{ + struct dmem_region *dregion, *dr; + struct dmem_node *dnode; + + for_each_dmem_node(dnode) { + dnode->nodelist[0] = NUMA_NO_NODE; + list_for_each_entry_safe(dregion, dr, &dnode->regions, node) { + dmem_region_uinit(dregion); + dregion->reserved_start_addr = + dregion->reserved_end_addr = 0; + list_del(&dregion->node); + } + } + + dmem_pool.region_num = 0; + dmem_pool.registered_pages = 0; +} + +static int __init dmem_region_init(struct dmem_region *dregion) +{ + unsigned long *bitmap, size, nr_pages; + + nr_pages = __phys_to_pfn(dregion->reserved_end_addr) + - 
__phys_to_pfn(dregion->reserved_start_addr); + + size = BITS_TO_LONGS(nr_pages) * sizeof(long); + if (size <= sizeof(dregion->static_error_bitmap)) { + bitmap = &dregion->static_error_bitmap; + } else { + bitmap = kzalloc(size, GFP_KERNEL); + if (!bitmap) + return -ENOMEM; + } + dregion->error_bitmap = bitmap; + return 0; +} + +/* + * dmem memory is not backed by 'struct page', i.e. the kernel treats + * its pfns as invalid + */ +static int __init dmem_check_region(struct dmem_region *dregion) +{ + unsigned long pfn; + + for (pfn = __phys_to_pfn(dregion->reserved_start_addr); + pfn < __phys_to_pfn(dregion->reserved_end_addr); pfn++) { + if (!WARN_ON(pfn_valid(pfn))) + continue; + + pr_err("dmem: check pfn %#lx failed, its memory was not properly reserved\n", + pfn); + return -EINVAL; + } + + return 0; +} + +static int __init dmem_late_init(void) +{ + struct dmem_region *dregion; + struct dmem_node *dnode; + int ret = 0; + + for_each_dmem_node(dnode) { + dmem_node_init(dnode); + + for_each_dmem_region(dnode, dregion) { + ret = dmem_region_init(dregion); + if (ret) + goto exit; + ret = dmem_check_region(dregion); + if (ret) + goto exit; + } + } + return ret; +exit: + dmem_uinit(); + return ret; +} +late_initcall(dmem_late_init); + +static int dmem_alloc_region_init(struct dmem_region *dregion, + unsigned long *dpages) +{ + unsigned long start, end, *bitmap, size; + + start = DMEM_PAGE_UP(dregion->reserved_start_addr); + end = DMEM_PAGE_DOWN(dregion->reserved_end_addr); + + *dpages = end - start; + if (!*dpages) + return 0; + + size = BITS_TO_LONGS(*dpages) * sizeof(long); + if (size <= sizeof(dregion->static_bitmap)) + bitmap = &dregion->static_bitmap; + else { + bitmap = kzalloc(size, GFP_KERNEL); + if (!bitmap) + return -ENOMEM; + } + + dregion->bitmap = bitmap; + dregion->next_free_pos = 0; + dregion->dpage_start_pfn = start; + dregion->dpage_end_pfn = end; + + dmem_pool.unaligned_pages += __phys_to_pfn((dpage_to_phys(start) + - dregion->reserved_start_addr)); + 
dmem_pool.unaligned_pages += __phys_to_pfn(dregion->reserved_end_addr + - dpage_to_phys(end)); + return 0; +} + +static bool dmem_dpage_is_error(struct dmem_region *dregion, phys_addr_t dpage) +{ + unsigned long valid_pages; + unsigned long pos_pfn, pos_offset; + unsigned long pages_per_dpage = DMEM_PAGE_SIZE >> PAGE_SHIFT; + phys_addr_t reserved_start_pfn; + + reserved_start_pfn = __phys_to_pfn(dregion->reserved_start_addr); + valid_pages = dpage_to_pfn(dregion->dpage_end_pfn) - reserved_start_pfn; + + pos_offset = dpage_to_pfn(dpage) - reserved_start_pfn; + pos_pfn = find_next_bit(dregion->error_bitmap, valid_pages, pos_offset); + if (pos_pfn < pos_offset + pages_per_dpage) + return true; + return false; +} + +static unsigned long +dmem_alloc_bitmap_clear(struct dmem_region *dregion, phys_addr_t dpage, + unsigned int dpages_nr) +{ + u64 pos = dpage - dregion->dpage_start_pfn; + unsigned int i; + unsigned long err_num = 0; + + for (i = 0; i < dpages_nr; i++) { + if (dmem_dpage_is_error(dregion, dpage + i)) { + WARN_ON(!test_bit(pos + i, dregion->bitmap)); + err_num++; + } else { + WARN_ON(!__test_and_clear_bit(pos + i, + dregion->bitmap)); + } + } + return err_num; +} + +/* set or clear corresponding bit on allocation bitmap based on error bitmap */ +static unsigned long dregion_alloc_bitmap_set_clear(struct dmem_region *dregion, + bool set) +{ + unsigned long pos_pfn, pos_offset; + unsigned long valid_pages, mce_dpages = 0; + phys_addr_t dpage, reserved_start_pfn; + + reserved_start_pfn = __phys_to_pfn(dregion->reserved_start_addr); + + valid_pages = dpage_to_pfn(dregion->dpage_end_pfn) - reserved_start_pfn; + pos_offset = dpage_to_pfn(dregion->dpage_start_pfn) + - reserved_start_pfn; +try_set: + pos_pfn = find_next_bit(dregion->error_bitmap, valid_pages, pos_offset); + + if (pos_pfn >= valid_pages) + return mce_dpages; + mce_dpages++; + dpage = pfn_to_dpage(pos_pfn + reserved_start_pfn); + if (set) + WARN_ON(__test_and_set_bit(dpage - dregion->dpage_start_pfn, + 
dregion->bitmap)); + else + WARN_ON(!__test_and_clear_bit(dpage - dregion->dpage_start_pfn, + dregion->bitmap)); + pos_offset = dpage_to_pfn(dpage + 1) - reserved_start_pfn; + goto try_set; +} + +static void dmem_uinit_check_alloc_bitmap(struct dmem_region *dregion) +{ + unsigned long dpages, size; + + dregion_alloc_bitmap_set_clear(dregion, false); + + dpages = dregion->dpage_end_pfn - dregion->dpage_start_pfn; + size = BITS_TO_LONGS(dpages) * sizeof(long); + WARN_ON(!bitmap_empty(dregion->bitmap, size * BITS_PER_BYTE)); +} + +static void dmem_alloc_region_uinit(struct dmem_region *dregion) +{ + unsigned long dpages, size, *bitmap = dregion->bitmap; + + if (!bitmap) + return; + + dpages = dregion->dpage_end_pfn - dregion->dpage_start_pfn; + WARN_ON(!dpages); + + dmem_uinit_check_alloc_bitmap(dregion); + + size = BITS_TO_LONGS(dpages) * sizeof(long); + if (size > sizeof(dregion->static_bitmap)) + kfree(bitmap); + dregion->bitmap = NULL; +} + +static void __dmem_alloc_uinit(void) +{ + struct dmem_node *dnode; + struct dmem_region *dregion; + + if (!dmem_pool.dpage_shift) + return; + + dmem_pool.unaligned_pages = 0; + + for_each_dmem_node(dnode) { + for_each_dmem_region(dnode, dregion) + dmem_alloc_region_uinit(dregion); + + dnode->total_dpages = dnode->free_dpages = 0; + } + + dmem_pool.dpage_shift = 0; + dmem_pool.total_dpages = dmem_pool.free_dpages = 0; +} + +static void dnode_count_free_dpages(struct dmem_node *dnode, long dpages) +{ + dnode->free_dpages += dpages; + dmem_pool.free_dpages += dpages; +} + +/* + * uninitialize dmem allocator + * + * all dpages should be freed before calling it + */ +void dmem_alloc_uinit(void) +{ + mutex_lock(&dmem_pool.lock); + if (!--dmem_pool.user_count) + __dmem_alloc_uinit(); + mutex_unlock(&dmem_pool.lock); +} +EXPORT_SYMBOL(dmem_alloc_uinit); + +/* + * initialize dmem allocator + * @dpage_shift: the shift bits of dmem page size used to manange + * dmem memory, it should be CPU's nature page size at least + * + * Note: the 
page size the allocator used isn't the same thing with + * the alignment used to reserve dmem memory + */ +int dmem_alloc_init(unsigned long dpage_shift) +{ + struct dmem_node *dnode; + struct dmem_region *dregion; + unsigned long dpages; + int ret = 0; + + if (dpage_shift < PAGE_SHIFT) + return -EINVAL; + + mutex_lock(&dmem_pool.lock); + + if (dmem_pool.dpage_shift) { + /* + * double init on the same page size is okay + * to make the unit tests happy + */ + if (dmem_pool.dpage_shift != dpage_shift) + ret = -EBUSY; + + goto exit; + } + + dmem_pool.dpage_shift = dpage_shift; + + for_each_dmem_node(dnode) { + for_each_dmem_region(dnode, dregion) { + ret = dmem_alloc_region_init(dregion, &dpages); + if (ret < 0) { + __dmem_alloc_uinit(); + goto exit; + } + + dnode_count_free_dpages(dnode, dpages); + } + dnode->total_dpages = dnode->free_dpages; + } + + dmem_pool.total_dpages = dmem_pool.free_dpages; + + if (dmem_pool.unaligned_pages && !ret) + pr_warn("dmem: %llu pages are wasted due to alignment\n", + (unsigned long long)dmem_pool.unaligned_pages); +exit: + if (!ret) + dmem_pool.user_count++; + + mutex_unlock(&dmem_pool.lock); + return ret; +} +EXPORT_SYMBOL(dmem_alloc_init); + +static phys_addr_t +dmem_alloc_region_page(struct dmem_region *dregion, unsigned int try_max, + unsigned int *result_nr) +{ + unsigned long pos, dpages; + unsigned int i; + + /* no dpage is available in this region */ + if (!dregion->bitmap) + return 0; + + dpages = dregion->dpage_end_pfn - dregion->dpage_start_pfn; + + /* no free page in this region */ + if (dregion->next_free_pos >= dpages) + return 0; + + pos = find_next_zero_bit(dregion->bitmap, dpages, + dregion->next_free_pos); + if (pos >= dpages) { + dregion->next_free_pos = pos; + return 0; + } + + __set_bit(pos, dregion->bitmap); + + /* do not go beyond the region */ + try_max = min(try_max, (unsigned int)(dpages - pos - 1)); + for (i = 1; i < try_max; i++) + if (__test_and_set_bit(pos + i, dregion->bitmap)) + break; + + *result_nr 
= i; + dregion->next_free_pos = pos + *result_nr; + return dpage_to_phys(dregion->dpage_start_pfn + pos); +} + +/* + * allocate dmem pages from the nodelist + * + * @nodelist: dmem_node's nodelist + * @nodemask: nodemask for filtering the dmem nodelist + * @try_max: try to allocate @try_max dpages if possible + * @result_nr: allocated dpage number returned to the caller + * + * return the physical address of the first dpage allocated from dmem + * pool, or 0 on failure. The allocated dpage number is filled into + * @result_nr + */ +static phys_addr_t +dmem_alloc_pages_from_nodelist(int *nodelist, nodemask_t *nodemask, + unsigned int try_max, unsigned int *result_nr) +{ + struct dmem_node *dnode; + struct dmem_region *dregion; + phys_addr_t addr = 0; + int node, i; + unsigned int local_result_nr; + + WARN_ON(try_max > 1 && !result_nr); + + if (!result_nr) + result_nr = &local_result_nr; + + *result_nr = 0; + + for (i = 0; !addr && i < ARRAY_SIZE(dnode->nodelist); i++) { + node = nodelist[i]; + + if (nodemask && !node_isset(node, *nodemask)) + continue; + + mutex_lock(&dmem_pool.lock); + + WARN_ON(!dmem_pool.dpage_shift); + + dnode = &dmem_pool.nodes[node]; + for_each_dmem_region(dnode, dregion) { + addr = dmem_alloc_region_page(dregion, try_max, + result_nr); + if (addr) { + dnode_count_free_dpages(dnode, + -(long)(*result_nr)); + break; + } + } + + mutex_unlock(&dmem_pool.lock); + } + return addr; +} + +/* + * allocate a dmem page from the dmem pool and try to allocate more + * continuous dpages if @try_max is not less than 1 + * + * @nid: the NUMA node the dmem page got from + * @nodemask: nodemask for filtering the dmem nodelist + * @try_max: try to allocate @try_max dpages if possible + * @result_nr: allocated dpage number returned to the caller + * + * return the physical address of the first dpage allocated from dmem + * pool, or 0 on failure. 
The allocated dpage number is filled into + * @result_nr + */ +phys_addr_t +dmem_alloc_pages_nodemask(int nid, nodemask_t *nodemask, unsigned int try_max, + unsigned int *result_nr) +{ + int *nodelist; + + if (nid < 0 || nid >= ARRAY_SIZE(dmem_pool.nodes)) + return 0; + + nodelist = dmem_nodelist(nid); + return dmem_alloc_pages_from_nodelist(nodelist, nodemask, + try_max, result_nr); +} +EXPORT_SYMBOL(dmem_alloc_pages_nodemask); + +/* + * dmem_alloc_pages_vma - Allocate pages for a VMA. + * + * @vma: Pointer to VMA or NULL if not available. + * @addr: Virtual Address of the allocation. Must be inside the VMA. + * @try_max: try to allocate @try_max dpages if possible + * @result_nr: allocated dpage number returned to the caller + * + * Return the physical address of the first dpage allocated from dmem + * pool, or 0 on failure. The allocated dpage number is filled into + * @result_nr + */ +phys_addr_t +dmem_alloc_pages_vma(struct vm_area_struct *vma, unsigned long addr, + unsigned int try_max, unsigned int *result_nr) +{ + phys_addr_t phys_addr; + int *nl; + unsigned int cpuset_mems_cookie; + +retry_cpuset: + cpuset_mems_cookie = read_mems_allowed_begin(); + nl = dmem_nodelist(numa_node_id()); + phys_addr = dmem_alloc_pages_from_nodelist(nl, NULL, try_max, + result_nr); + if (unlikely(!phys_addr && read_mems_allowed_retry(cpuset_mems_cookie))) + goto retry_cpuset; + + return phys_addr; +} +EXPORT_SYMBOL(dmem_alloc_pages_vma); + +/* + * No lock is needed to call it. + * This function only uses the reserved addresses, which are registered at + * init time and are not modified at run time. 
+ */ +static struct dmem_region *find_dmem_region(phys_addr_t phys_addr, + struct dmem_node **pdnode) +{ + struct dmem_node *dnode; + struct dmem_region *dregion; + + for_each_dmem_node(dnode) + for_each_dmem_region(dnode, dregion) { + if (dregion->reserved_start_addr > phys_addr) + continue; + if (dregion->reserved_end_addr <= phys_addr) + continue; + + *pdnode = dnode; + return dregion; + } + + return NULL; +} + +/* + * free dmem pages to the dmem pool + * @addr: the physical address to be freed + * @dpages_nr: the number of dpages to be freed + */ +void dmem_free_pages(phys_addr_t addr, unsigned int dpages_nr) +{ + struct dmem_region *dregion; + struct dmem_node *pdnode = NULL; + phys_addr_t dpage = phys_to_dpage(addr); + u64 pos; + unsigned long err_dpages; + + mutex_lock(&dmem_pool.lock); + + WARN_ON(!dmem_pool.dpage_shift); + + dregion = find_dmem_region(addr, &pdnode); + WARN_ON(!dregion || !dregion->bitmap || !pdnode); + + pos = dpage - dregion->dpage_start_pfn; + dregion->next_free_pos = min(dregion->next_free_pos, pos); + + /* it is not possible to span multiple regions */ + WARN_ON(dpage + dpages_nr - 1 >= dregion->dpage_end_pfn); + + err_dpages = dmem_alloc_bitmap_clear(dregion, dpage, dpages_nr); + + dnode_count_free_dpages(pdnode, dpages_nr - err_dpages); + mutex_unlock(&dmem_pool.lock); +} +EXPORT_SYMBOL(dmem_free_pages); +

From patchwork Mon Dec 7 11:30:57 2020
X-Patchwork-Submitter: yulei zhang
X-Patchwork-Id: 11955411
From: yulei.kernel@gmail.com
X-Google-Original-From: yuleixzhang@tencent.com
To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com
Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang, Xiao Guangrong
Subject: [RFC V2 04/37] dmem: let pat recognize dmem
Date: Mon, 7 Dec 2020 19:30:57 +0800
Message-Id: <805999e57d629348f813017e02a086e33e507d9e.1607332046.git.yuleixzhang@tencent.com>

From: Yulei Zhang

x86 PAT assumes a range is backed by 'struct page' merely by checking
that it is system RAM. That is not true when dmem is used, so teach PAT
to recognize this case: the range is RAM but !pfn_valid().

We always use WB for dmem; any attempt to change this behavior is
rejected and triggers WARN_ON.

Signed-off-by: Xiao Guangrong
Signed-off-by: Yulei Zhang
---
 arch/x86/mm/pat/memtype.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c index 8f665c3..fd8a298 100644 --- a/arch/x86/mm/pat/memtype.c +++ b/arch/x86/mm/pat/memtype.c @@ -511,6 +511,13 @@ static int reserve_ram_pages_type(u64 start, u64 end, for (pfn = (start >> PAGE_SHIFT); pfn < (end >> PAGE_SHIFT); ++pfn) { enum page_cache_mode type; + /* + * it's dmem if it's ram but not backed by 'struct page', + * we always use WB + */ + if (WARN_ON(!pfn_valid(pfn))) + return -EBUSY; + page = pfn_to_page(pfn); type = get_page_memtype(page); if (type != _PAGE_CACHE_MODE_WB) { @@ -539,6 +546,13 @@ static int free_ram_pages_type(u64 start, u64 end) u64 pfn; for (pfn = (start >> PAGE_SHIFT); pfn < (end >> PAGE_SHIFT); ++pfn) { + /* + * it's dmem, see the comments in + * reserve_ram_pages_type() + */ + if (WARN_ON(!pfn_valid(pfn))) + continue; + page = pfn_to_page(pfn); set_page_memtype(page, _PAGE_CACHE_MODE_WB); } @@ 
-714,6 +728,13 @@ static enum page_cache_mode lookup_memtype(u64 paddr) if (pat_pagerange_is_ram(paddr, paddr + PAGE_SIZE)) { struct page *page; + /* + * dmem always uses WB, see the comments in + * reserve_ram_pages_type() + */ + if (!pfn_valid(paddr >> PAGE_SHIFT)) + return rettype; + page = pfn_to_page(paddr >> PAGE_SHIFT); return get_page_memtype(page); } From patchwork Mon Dec 7 11:30:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955413 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AABBBC433FE for ; Mon, 7 Dec 2020 11:33:44 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1912223340 for ; Mon, 7 Dec 2020 11:33:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1912223340 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A20DE8D0008; Mon, 7 Dec 2020 06:33:43 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 9FBBF8D0001; Mon, 7 Dec 2020 06:33:43 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 875158D0008; Mon, 7 Dec 2020 06:33:43 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from 
forelay.hostedemail.com (smtprelay0071.hostedemail.com [216.40.44.71]) by kanga.kvack.org (Postfix) with ESMTP id 72C1C8D0001 for ; Mon, 7 Dec 2020 06:33:43 -0500 (EST) Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 34233181AEF1F for ; Mon, 7 Dec 2020 11:33:43 +0000 (UTC) X-FDA: 77566276326.10.cast28_2515f70273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin10.hostedemail.com (Postfix) with ESMTP id 1378A16A0B9 for ; Mon, 7 Dec 2020 11:33:43 +0000 (UTC) X-HE-Tag: cast28_2515f70273de X-Filterd-Recvd-Size: 14406 Received: from mail-pg1-f194.google.com (mail-pg1-f194.google.com [209.85.215.194]) by imf33.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:33:42 +0000 (UTC) Received: by mail-pg1-f194.google.com with SMTP id m9so8659432pgb.4 for ; Mon, 07 Dec 2020 03:33:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=/44otMrMxzCUiGk0V1t2L81SCVx385WHUCyFll3f9p0=; b=EiialdWRB3FHs1vDA+YkzfnQD6VJz6N4qmPV73L07sHuG/P4ErRRe6bm61f9g5hr9H +bV+1omLT2lXgkipsrER7t4rQ/QroKwlhp9ASMHS41/YxSwuudP0PtPd/urhOkA/EO9+ TiWzoGcwtXpte4GC/vTXZi503fNXbWeHFok7IbUc5PmkUX4gvlyg1b9Q+jbOSz3ZunLJ gdl947A+Eh0NIoZr03LVyIBhMu8U/PPJJg7YRgXc0swqDOVuRoWZ1LhJGdtkIFQaMPvG Dygv6S2XVQrEGFMtvhF+fVgdh1BjHZ9kopYhtI9kkXo7XYMujVSmvjq+i89HNmdDCg1L X81A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=/44otMrMxzCUiGk0V1t2L81SCVx385WHUCyFll3f9p0=; b=QW9fsoYCQIOd/Y5CxOFMX7D7bo4qthRnTola9+3pAouay+NBtIdRDpLZXo/+VEKUIK FSSic5ycPUhvoqn7BpU2JLkpzxmOwwhi6T145lmPKFVwu98J0bqX54EYM3amCGezj6sa 6qDXZS7s87r1iIjlzTGc3Lve5EVIz2ZzkiVKeplN+GbfhX1oNDVcCS9j7HmMJ6GcYwuf 
I1pauYldrO1lJiSoMAFwmRHtfhw9Kbz58igmezuZlaFhKT+J3RRxw+5xselOdGFgw4QK f6iMkNXlij6nZFHU/oli7Fkrhj3AhzVH1Ef0MNyv7s656Miu0o3eZAarOXtVyPfsRAtw mnXA== X-Gm-Message-State: AOAM533KhNslMSdS+qLopjEoDX+NDDkJiY/lPSTj9T7rO3Cc4GU1Nygj hjPW+veSPXJ9XPHyEw7QavVv/kBrIF4= X-Google-Smtp-Source: ABdhPJxsuCuuWYmFuc/z0htdn8lG7NF7yPaaVzD61TY34e7sLxSOK8H64w92t88zlRNSfCJLilKEiQ== X-Received: by 2002:aa7:8b15:0:b029:196:59ad:ab93 with SMTP id f21-20020aa78b150000b029019659adab93mr15263676pfd.16.1607340821419; Mon, 07 Dec 2020 03:33:41 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.33.38 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:33:40 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Xiao Guangrong Subject: [RFC V2 05/37] dmemfs: support mmap for dmemfs Date: Mon, 7 Dec 2020 19:30:58 +0800 Message-Id: <556903717e3d0b0fc0b9583b709f4b34be2154cb.1607332046.git.yuleixzhang@tencent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang This patch adds mmap support. Note that the file is extended when the mmap offset lies beyond its end, which removes the need for a write() operation; truncating the file down is not supported yet.
Signed-off-by: Xiao Guangrong Signed-off-by: Yulei Zhang --- fs/dmemfs/inode.c | 343 ++++++++++++++++++++++++++++++++++++++++++++++++++- include/linux/dmem.h | 10 ++ 2 files changed, 351 insertions(+), 2 deletions(-) diff --git a/fs/dmemfs/inode.c b/fs/dmemfs/inode.c index 0aa3d3b..7b6e51d 100644 --- a/fs/dmemfs/inode.c +++ b/fs/dmemfs/inode.c @@ -26,6 +26,7 @@ #include #include #include +#include MODULE_AUTHOR("Tencent Corporation"); MODULE_LICENSE("GPL v2"); @@ -102,7 +103,255 @@ static int dmemfs_mkdir(struct inode *dir, struct dentry *dentry, .getattr = simple_getattr, }; +static unsigned long dmem_pgoff_to_index(struct inode *inode, pgoff_t pgoff) +{ + struct super_block *sb = inode->i_sb; + + return pgoff >> (sb->s_blocksize_bits - PAGE_SHIFT); +} + +static void *dmem_addr_to_entry(struct inode *inode, phys_addr_t addr) +{ + struct super_block *sb = inode->i_sb; + + addr >>= sb->s_blocksize_bits; + return xa_mk_value(addr); +} + +static phys_addr_t dmem_entry_to_addr(struct inode *inode, void *entry) +{ + struct super_block *sb = inode->i_sb; + + WARN_ON(!xa_is_value(entry)); + return xa_to_value(entry) << sb->s_blocksize_bits; +} + +static unsigned long +dmem_addr_to_pfn(struct inode *inode, phys_addr_t addr, pgoff_t pgoff, + unsigned int fault_shift) +{ + struct super_block *sb = inode->i_sb; + unsigned long pfn = addr >> PAGE_SHIFT; + unsigned long mask; + + mask = (1UL << ((unsigned int)sb->s_blocksize_bits - fault_shift)) - 1; + mask <<= fault_shift - PAGE_SHIFT; + + return pfn + (pgoff & mask); +} + +static inline unsigned long dmem_page_size(struct inode *inode) +{ + return inode->i_sb->s_blocksize; +} + +static int check_inode_size(struct inode *inode, loff_t offset) +{ + WARN_ON_ONCE(!rcu_read_lock_held()); + + if (offset >= i_size_read(inode)) + return -EINVAL; + + return 0; +} + +static unsigned +dmemfs_find_get_entries(struct address_space *mapping, unsigned long start, + unsigned int nr_entries, void **entries, + unsigned long *indices) +{ + 
XA_STATE(xas, &mapping->i_pages, start); + + void *entry; + unsigned int ret = 0; + + if (!nr_entries) + return 0; + + rcu_read_lock(); + + xas_for_each(&xas, entry, ULONG_MAX) { + if (xas_retry(&xas, entry)) + continue; + + if (xa_is_value(entry)) + goto export; + + if (unlikely(entry != xas_reload(&xas))) + goto retry; + +export: + indices[ret] = xas.xa_index; + entries[ret] = entry; + if (++ret == nr_entries) + break; + continue; +retry: + xas_reset(&xas); + } + rcu_read_unlock(); + return ret; +} + +static void *find_radix_entry_or_next(struct address_space *mapping, + unsigned long start, + unsigned long *eindex) +{ + void *entry = NULL; + + dmemfs_find_get_entries(mapping, start, 1, &entry, eindex); + return entry; +} + +/* + * find the entry in radix tree based on @index, create it if + * it does not exist + * + * return the entry with rcu locked, otherwise ERR_PTR() + * is returned + */ +static void * +radix_get_create_entry(struct vm_area_struct *vma, unsigned long fault_addr, + struct inode *inode, pgoff_t pgoff) +{ + struct address_space *mapping = inode->i_mapping; + unsigned long eindex, index; + loff_t offset; + phys_addr_t addr; + gfp_t gfp_masks = mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM; + void *entry; + unsigned int try_dpages, dpages; + int ret; + +retry: + offset = ((loff_t)pgoff << PAGE_SHIFT); + index = dmem_pgoff_to_index(inode, pgoff); + rcu_read_lock(); + ret = check_inode_size(inode, offset); + if (ret) { + rcu_read_unlock(); + return ERR_PTR(ret); + } + + try_dpages = dmem_pgoff_to_index(inode, (i_size_read(inode) - offset) + >> PAGE_SHIFT); + entry = find_radix_entry_or_next(mapping, index, &eindex); + if (entry) { + WARN_ON(!xa_is_value(entry)); + if (eindex == index) + return entry; + + WARN_ON(eindex <= index); + try_dpages = eindex - index; + } + rcu_read_unlock(); + + /* entry does not exist, create it */ + addr = dmem_alloc_pages_vma(vma, fault_addr, try_dpages, &dpages); + if (!addr) { + /* + * do not return -ENOMEM as that will 
trigger OOM, + * it is useless for reclaiming dmem page + */ + ret = -EINVAL; + goto exit; + } + + try_dpages = dpages; + while (dpages) { + rcu_read_lock(); + ret = check_inode_size(inode, offset); + if (ret) + goto unlock_rcu; + + entry = dmem_addr_to_entry(inode, addr); + entry = xa_store(&mapping->i_pages, index, entry, gfp_masks); + if (!xa_is_err(entry)) { + addr += inode->i_sb->s_blocksize; + offset += inode->i_sb->s_blocksize; + dpages--; + mapping->nrexceptional++; + index++; + } + +unlock_rcu: + rcu_read_unlock(); + if (ret) + break; + } + + if (dpages) + dmem_free_pages(addr, dpages); + + /* we have created some entries, let's retry it */ + if (ret == -EEXIST || try_dpages != dpages) + goto retry; +exit: + return ERR_PTR(ret); +} + +static void radix_put_entry(void) +{ + rcu_read_unlock(); +} + +static vm_fault_t dmemfs_fault(struct vm_fault *vmf) +{ + struct vm_area_struct *vma = vmf->vma; + struct inode *inode = file_inode(vma->vm_file); + phys_addr_t addr; + void *entry; + int ret; + + if (vmf->pgoff > (MAX_LFS_FILESIZE >> PAGE_SHIFT)) + return VM_FAULT_SIGBUS; + + entry = radix_get_create_entry(vma, (unsigned long)vmf->address, + inode, vmf->pgoff); + if (IS_ERR(entry)) { + ret = PTR_ERR(entry); + goto exit; + } + + addr = dmem_entry_to_addr(inode, entry); + ret = vmf_insert_pfn(vma, (unsigned long)vmf->address, + dmem_addr_to_pfn(inode, addr, vmf->pgoff, + PAGE_SHIFT)); + radix_put_entry(); + +exit: + return ret; +} + +static unsigned long dmemfs_pagesize(struct vm_area_struct *vma) +{ + return dmem_page_size(file_inode(vma->vm_file)); +} + +static const struct vm_operations_struct dmemfs_vm_ops = { + .fault = dmemfs_fault, + .pagesize = dmemfs_pagesize, +}; + +int dmemfs_file_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct inode *inode = file_inode(file); + + if (vma->vm_pgoff & ((dmem_page_size(inode) - 1) >> PAGE_SHIFT)) + return -EINVAL; + + if (!(vma->vm_flags & VM_SHARED)) + return -EINVAL; + + vma->vm_flags |= VM_PFNMAP; + + 
file_accessed(file); + vma->vm_ops = &dmemfs_vm_ops; + return 0; +} + static const struct file_operations dmemfs_file_operations = { + .mmap = dmemfs_file_mmap, }; static int dmemfs_parse_param(struct fs_context *fc, struct fs_parameter *param) @@ -180,9 +429,86 @@ static int dmemfs_statfs(struct dentry *dentry, struct kstatfs *buf) return 0; } +/* + * should make sure the dmem page in the dropped region is not + * being mapped by any process + */ +static void inode_drop_dpages(struct inode *inode, loff_t start, loff_t end) +{ + struct address_space *mapping = inode->i_mapping; + struct pagevec pvec; + unsigned long istart, iend, indices[PAGEVEC_SIZE]; + int i; + + /* we never use normal page */ + WARN_ON(mapping->nrpages); + + /* if no dpage is allocated for the inode */ + if (!mapping->nrexceptional) + return; + + istart = dmem_pgoff_to_index(inode, start >> PAGE_SHIFT); + iend = dmem_pgoff_to_index(inode, end >> PAGE_SHIFT); + pagevec_init(&pvec); + while (istart < iend) { + pvec.nr = dmemfs_find_get_entries(mapping, istart, + min(iend - istart, + (unsigned long)PAGEVEC_SIZE), + (void **)pvec.pages, + indices); + if (!pvec.nr) + break; + + for (i = 0; i < pagevec_count(&pvec); i++) { + phys_addr_t addr; + + istart = indices[i]; + if (istart >= iend) + break; + + xa_erase(&mapping->i_pages, istart); + mapping->nrexceptional--; + + addr = dmem_entry_to_addr(inode, pvec.pages[i]); + dmem_free_page(addr); + } + + /* + * only exception entries in pagevec, it's safe to + * reinit it + */ + pagevec_reinit(&pvec); + cond_resched(); + istart++; + } +} + +static void dmemfs_evict_inode(struct inode *inode) +{ + /* no VMA works on it */ + WARN_ON(!RB_EMPTY_ROOT(&inode->i_data.i_mmap.rb_root)); + + inode_drop_dpages(inode, 0, LLONG_MAX); + clear_inode(inode); +} + +/* + * Display the mount options in /proc/mounts. 
+ */ +static int dmemfs_show_options(struct seq_file *m, struct dentry *root) +{ + struct dmemfs_fs_info *fsi = root->d_sb->s_fs_info; + + if (check_dpage_size(fsi->mount_opts.dpage_size)) + seq_printf(m, ",pagesize=%lx", fsi->mount_opts.dpage_size); + return 0; +} + static const struct super_operations dmemfs_ops = { .statfs = dmemfs_statfs, + .evict_inode = dmemfs_evict_inode, .drop_inode = generic_delete_inode, + .show_options = dmemfs_show_options, }; static int @@ -190,6 +516,7 @@ static int dmemfs_statfs(struct dentry *dentry, struct kstatfs *buf) { struct inode *inode; struct dmemfs_fs_info *fsi = sb->s_fs_info; + int ret; sb->s_maxbytes = MAX_LFS_FILESIZE; sb->s_blocksize = fsi->mount_opts.dpage_size; @@ -198,11 +525,17 @@ static int dmemfs_statfs(struct dentry *dentry, struct kstatfs *buf) sb->s_op = &dmemfs_ops; sb->s_time_gran = 1; + ret = dmem_alloc_init(sb->s_blocksize_bits); + if (ret) + return ret; + inode = dmemfs_get_inode(sb, NULL, S_IFDIR); sb->s_root = d_make_root(inode); - if (!sb->s_root) - return -ENOMEM; + if (!sb->s_root) { + dmem_alloc_uinit(); + return -ENOMEM; + } return 0; } @@ -238,7 +571,13 @@ int dmemfs_init_fs_context(struct fs_context *fc) static void dmemfs_kill_sb(struct super_block *sb) { + bool has_inode = !!sb->s_root; + kill_litter_super(sb); + + /* do not uninit dmem allocator if mount failed */ + if (has_inode) + dmem_alloc_uinit(); } static struct file_system_type dmemfs_fs_type = { diff --git a/include/linux/dmem.h b/include/linux/dmem.h index 476a82e..8682d63 100644 --- a/include/linux/dmem.h +++ b/include/linux/dmem.h @@ -10,6 +10,16 @@ int dmem_alloc_init(unsigned long dpage_shift); void dmem_alloc_uinit(void); +phys_addr_t +dmem_alloc_pages_nodemask(int nid, nodemask_t *nodemask, unsigned int try_max, + unsigned int *result_nr); + +phys_addr_t +dmem_alloc_pages_vma(struct vm_area_struct *vma, unsigned long addr, + unsigned int try_max, unsigned int *result_nr); + +void dmem_free_pages(phys_addr_t addr, unsigned int 
dpages_nr); +#define dmem_free_page(addr) dmem_free_pages(addr, 1) #else static inline int dmem_reserve_init(void) { From patchwork Mon Dec 7 11:30:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955415 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A6BF2C433FE for ; Mon, 7 Dec 2020 11:33:48 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 301E423340 for ; Mon, 7 Dec 2020 11:33:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 301E423340 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B5A208D0009; Mon, 7 Dec 2020 06:33:47 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B36E68D0001; Mon, 7 Dec 2020 06:33:47 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9D3858D0009; Mon, 7 Dec 2020 06:33:47 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0091.hostedemail.com [216.40.44.91]) by kanga.kvack.org (Postfix) with ESMTP id 87C9C8D0001 for ; Mon, 7 Dec 2020 06:33:47 -0500 (EST) Received: from smtpin28.hostedemail.com (10.5.19.251.rfc1918.com 
[10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 48A428249980 for ; Mon, 7 Dec 2020 11:33:47 +0000 (UTC) X-FDA: 77566276494.28.debt40_05002e3273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin28.hostedemail.com (Postfix) with ESMTP id 1C65F6D64 for ; Mon, 7 Dec 2020 11:33:47 +0000 (UTC) X-HE-Tag: debt40_05002e3273de X-Filterd-Recvd-Size: 6221 Received: from mail-pj1-f67.google.com (mail-pj1-f67.google.com [209.85.216.67]) by imf35.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:33:46 +0000 (UTC) Received: by mail-pj1-f67.google.com with SMTP id o7so7271957pjj.2 for ; Mon, 07 Dec 2020 03:33:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=eX1IR50aPFiEwdJQV5xGne8H/FATcvY6dG1G5D19yRg=; b=eXbX382qovy7COikmy0KYqgAQma7SSrycd5mwQ9J3QDSquh0VXtGUzblKKkJuAAuVI PHSVAom05MaCyxQFaVh4X0uD1ku5P8agDSw+IP8A/Cg85jWOn4rl+hXVuErT69iZK05Z AcBFV5C+p5lDBPuGmBkyAuL5TR4/MK5NGKvZAxWYLkqPklUmSTDRcAnTp3+KTbgnYNva U4uPsEDUiXFj3Bp/DPp8dqvmHNJAQNKovFWzqT1K4nBuHDOyV8p/E0E60YvnLz8urUmn JGkze+2OWmWXsKpBGQx87ndhmBmG+X4sOrdozDV9Gp1n+V+72T9SRvDWV3Jt7aMnyj1a mpgw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=eX1IR50aPFiEwdJQV5xGne8H/FATcvY6dG1G5D19yRg=; b=cjftd+pVVCsMpG/3PLZpjMBGVvOIfKZTS3CH74DC33CmLxS2J9YWxRMHYLxHb54LeA CkKyu9SAIwm/rFtQv6BTvnxC5TfeuVsPkVvFl9f0YKBfIaohrdg7MT2RBsOTzrJvCBKM NUchwm4auta1+azNl+HwtZDHBLS94wXRwrdg0JOiMIWPzyDsu26Dny7D1N980ga69FSF XTxyzrh8OuVE0Qk2bL42hiJXuXnPHc2oUd94ui2Bufd3Di4LZYsLaP4cxHQEoRO2Bsm8 ZIdbyAX/6gr9dJeCilzWjM6Vkqya8fq5FFM0NQKLMm2QlI1Q19JDphwZYX1vqeFdPG1O K7Eg== X-Gm-Message-State: AOAM531t24TstezgHk/EMkc0AESr7FrR+w3LaKF/JLzUjrwHO7jW/0Rb Kiquct6C9VqkaeVyyfrQUxjYiRuYlX0= 
X-Google-Smtp-Source: ABdhPJwWiUXd//0TI6YPXtELG5n3boi+qEdZaWxYAMBrmQrPfGNew3+nfNYMRr95hadFZhb2SRQXRw== X-Received: by 2002:a17:90a:b38d:: with SMTP id e13mr16560118pjr.214.1607340825515; Mon, 07 Dec 2020 03:33:45 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.33.42 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:33:45 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Xiao Guangrong Subject: [RFC V2 06/37] dmemfs: support truncating inode down Date: Mon, 7 Dec 2020 19:30:59 +0800 Message-Id: X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang Supporting truncating an inode down introduces a race between the page fault handler and the truncate handler, because the entry being deleted may concurrently be mapped into a process's VMA. To keep the page fault fast (it is the hot path), we use RCU to synchronize these two handlers.
When the inode's size is updated, the truncate handler makes sure the new size is visible to the page fault handler, which then no longer uses entries in the truncated region and does not create new entries there. Signed-off-by: Xiao Guangrong Signed-off-by: Yulei Zhang --- fs/dmemfs/inode.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 66 insertions(+), 1 deletion(-) diff --git a/fs/dmemfs/inode.c b/fs/dmemfs/inode.c index 7b6e51d..9ec62dc 100644 --- a/fs/dmemfs/inode.c +++ b/fs/dmemfs/inode.c @@ -98,8 +98,73 @@ static int dmemfs_mkdir(struct inode *dir, struct dentry *dentry, .rename = simple_rename, }; +static void inode_drop_dpages(struct inode *inode, loff_t start, loff_t end); + +static int dmemfs_truncate(struct inode *inode, loff_t newsize) +{ + struct super_block *sb = inode->i_sb; + loff_t current_size; + + if (newsize & ((1 << sb->s_blocksize_bits) - 1)) + return -EINVAL; + + current_size = i_size_read(inode); + i_size_write(inode, newsize); + + if (newsize >= current_size) + return 0; + + /* it cuts the inode down */ + + /* + * we should make sure inode->i_size has been updated before + * unmapping and dropping radix entries, so that other sides + * can not create new i_mapping entry beyond inode->i_size + * and the radix entry in the truncated region is not being + * used + * + * see the comments in dmemfs_fault() + */ + synchronize_rcu(); + + /* + * should unmap all mapping first as dmem pages are freed in + * inode_drop_dpages() + * + * after that, dmem page in the truncated region is not used + * by any process + */ + unmap_mapping_range(inode->i_mapping, newsize, 0, 1); + + inode_drop_dpages(inode, newsize, LLONG_MAX); + return 0; +} + +/* + * same logic as simple_setattr but we need to handle ftruncate + * carefully as we inserted self-defined entry into radix tree + */ +static int dmemfs_setattr(struct dentry *dentry, struct iattr *iattr) +{ + struct inode *inode = dentry->d_inode; + int error; + + error = setattr_prepare(dentry, iattr); + if 
(error) + return error; + + if (iattr->ia_valid & ATTR_SIZE) { + error = dmemfs_truncate(inode, iattr->ia_size); + if (error) + return error; + } + setattr_copy(inode, iattr); + mark_inode_dirty(inode); + return 0; +} + static const struct inode_operations dmemfs_file_inode_operations = { - .setattr = simple_setattr, + .setattr = dmemfs_setattr, .getattr = simple_getattr, }; From patchwork Mon Dec 7 11:31:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955417 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 85781C433FE for ; Mon, 7 Dec 2020 11:33:52 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 08CB123340 for ; Mon, 7 Dec 2020 11:33:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 08CB123340 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9D3318D000A; Mon, 7 Dec 2020 06:33:51 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 986518D0001; Mon, 7 Dec 2020 06:33:51 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 84A1F8D000A; Mon, 7 Dec 2020 06:33:51 -0500 (EST) X-Delivered-To: linux-mm@kvack.org 
Received: from forelay.hostedemail.com (smtprelay0164.hostedemail.com [216.40.44.164]) by kanga.kvack.org (Postfix) with ESMTP id 6E1B68D0001 for ; Mon, 7 Dec 2020 06:33:51 -0500 (EST) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 3511D33CD for ; Mon, 7 Dec 2020 11:33:51 +0000 (UTC) X-FDA: 77566276662.16.house95_2d0bdec273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin16.hostedemail.com (Postfix) with ESMTP id 0B085100E6903 for ; Mon, 7 Dec 2020 11:33:51 +0000 (UTC) X-HE-Tag: house95_2d0bdec273de X-Filterd-Recvd-Size: 9550 Received: from mail-pf1-f196.google.com (mail-pf1-f196.google.com [209.85.210.196]) by imf03.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:33:50 +0000 (UTC) Received: by mail-pf1-f196.google.com with SMTP id i3so6025980pfd.6 for ; Mon, 07 Dec 2020 03:33:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=QMkfGfUr8fj2hO5B5jLMQa5h9UySfQVQPVhyn3NCxd4=; b=lqKRhFOEdKV0wC0Ujvy8bkoj7sCr3aw0Z4/HDK77hQNPjJPNnVzFT7Y48IfRoK5fje 8Z3ax4VrbD+Dn0BhRTbgr8ogTL0zkSdpIuruZQ3+yy6Zr2XBsPy5eyrUcW0mVCr0/3bn qx+pBlAKJUjm/UklANHSReU2TCJRzqIA73fT6ogzAbRptOgSwK1+51QW8zzSqVkyvnwA lDTHfoho00Iof1lZ3QDDZkSwQBzQVJSl1+9+lTUfJcA7vzOvWze/xtSffZ8kB0THXJWq CodT0//aOcvnyDJGfu7Gte5EhdswKxOh89d3sLKX4/56IVsr8+P70YCf92dYnF/ATCXH EKEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=QMkfGfUr8fj2hO5B5jLMQa5h9UySfQVQPVhyn3NCxd4=; b=owN7rcLUwMYn3SEVhHj6HOCmPsd7yU3x5laDIWWXI8YYXK4mcJVsZKHCZwMUiI+YRP 3TspBZ6iX6Xm/KOX0ZgbJ2kSD5HkthYCsJmemH0V1hUNyTxydzdk4H4f6i5o1uRVTaLH mLlX0y/uw37elzr7/o7m6NSZQjpEb6LVSuZx6J7g+drLJzHjdbka5abj/FFnCuJBqaKj 
Ciazy/rBiCBJzQcOBjblXT57G7JFIcCg0A1flZISERjcmLgT+V3n5TdEz/Bt8n/cF7S5 VfkGGH1o69Y27XNoed6yuyoB6diLm9xDXiBdDHdhR+FadQ4FJDN089LW8UOTAcD/LHBy BMfg== X-Gm-Message-State: AOAM53213fViET3XPj4/brkuM+2DmPN/eqp3N5Q0e6lsx0TrVo2cQrkZ 68GGTx4FsueexxGAUFGO1cnOHQG6C98= X-Google-Smtp-Source: ABdhPJwqvcI3wGyVmABSaYBJRnwrF2eTs+saCLZYg+lluLMzqvEOWJNH8GFu5PCjSHRlesroiuhjqA== X-Received: by 2002:a65:558a:: with SMTP id j10mr17960023pgs.370.1607340829629; Mon, 07 Dec 2020 03:33:49 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.33.46 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:33:49 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Xiao Guangrong Subject: [RFC V2 07/37] dmem: trace core functions Date: Mon, 7 Dec 2020 19:31:00 +0800 Message-Id: <4ee2b130c35367a6a3e7b631c872b824a1f59d23.1607332046.git.yuleixzhang@tencent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang Add tracepoints for the dmem alloc_init, alloc and free functions, which help us figure out what is happening inside the dmem allocator. Signed-off-by: Xiao Guangrong Signed-off-by: Yulei Zhang --- fs/dmemfs/Makefile | 1 + fs/dmemfs/inode.c | 5 ++++ fs/dmemfs/trace.h | 54 +++++++++++++++++++++++++++++++++++ include/trace/events/dmem.h | 68 
+++++++++++++++++++++++++++++++++++++++++++++ mm/dmem.c | 6 ++++ 5 files changed, 134 insertions(+) create mode 100644 fs/dmemfs/trace.h create mode 100644 include/trace/events/dmem.h diff --git a/fs/dmemfs/Makefile b/fs/dmemfs/Makefile index 73bdc9c..0b36d03 100644 --- a/fs/dmemfs/Makefile +++ b/fs/dmemfs/Makefile @@ -2,6 +2,7 @@ # # Makefile for the linux dmem-filesystem routines. # +ccflags-y += -I $(srctree)/$(src) # needed for trace events obj-$(CONFIG_DMEM_FS) += dmemfs.o dmemfs-y += inode.o diff --git a/fs/dmemfs/inode.c b/fs/dmemfs/inode.c index 9ec62dc..7723b58 100644 --- a/fs/dmemfs/inode.c +++ b/fs/dmemfs/inode.c @@ -31,6 +31,9 @@ MODULE_AUTHOR("Tencent Corporation"); MODULE_LICENSE("GPL v2"); +#define CREATE_TRACE_POINTS +#include "trace.h" + struct dmemfs_mount_opts { unsigned long dpage_size; }; @@ -336,6 +339,7 @@ static void *find_radix_entry_or_next(struct address_space *mapping, offset += inode->i_sb->s_blocksize; dpages--; mapping->nrexceptional++; + trace_dmemfs_radix_tree_insert(index, entry); index++; } @@ -532,6 +536,7 @@ static void inode_drop_dpages(struct inode *inode, loff_t start, loff_t end) break; xa_erase(&mapping->i_pages, istart); + trace_dmemfs_radix_tree_delete(istart, pvec.pages[i]); mapping->nrexceptional--; addr = dmem_entry_to_addr(inode, pvec.pages[i]); diff --git a/fs/dmemfs/trace.h b/fs/dmemfs/trace.h new file mode 100644 index 00000000..cc11653 --- /dev/null +++ b/fs/dmemfs/trace.h @@ -0,0 +1,54 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/** + * trace.h - dmemfs trace events + * + * Copyright (C) + * + * Author: Xiao Guangrong + */ + +#undef TRACE_SYSTEM +#define TRACE_SYSTEM dmemfs + +#if !defined(_TRACE_DMEMFS_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_DMEMFS_H + +#include + +DECLARE_EVENT_CLASS(dmemfs_radix_tree_class, + TP_PROTO(unsigned long index, void *rentry), + TP_ARGS(index, rentry), + + TP_STRUCT__entry( + __field(unsigned long, index) + __field(void *, rentry) + ), + + TP_fast_assign( + 
__entry->index = index; + __entry->rentry = rentry; + ), + + TP_printk("index %lu entry %#lx", __entry->index, + (unsigned long)__entry->rentry) +); + +DEFINE_EVENT(dmemfs_radix_tree_class, dmemfs_radix_tree_insert, + TP_PROTO(unsigned long index, void *rentry), + TP_ARGS(index, rentry) +); + +DEFINE_EVENT(dmemfs_radix_tree_class, dmemfs_radix_tree_delete, + TP_PROTO(unsigned long index, void *rentry), + TP_ARGS(index, rentry) +); +#endif + +#undef TRACE_INCLUDE_PATH +#define TRACE_INCLUDE_PATH . + +#undef TRACE_INCLUDE_FILE +#define TRACE_INCLUDE_FILE trace + +/* This part must be outside protection */ +#include diff --git a/include/trace/events/dmem.h b/include/trace/events/dmem.h new file mode 100644 index 00000000..10d1b90 --- /dev/null +++ b/include/trace/events/dmem.h @@ -0,0 +1,68 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM dmem + +#if !defined(_TRACE_DMEM_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_DMEM_H + +#include + +TRACE_EVENT(dmem_alloc_init, + TP_PROTO(unsigned long dpage_shift), + TP_ARGS(dpage_shift), + + TP_STRUCT__entry( + __field(unsigned long, dpage_shift) + ), + + TP_fast_assign( + __entry->dpage_shift = dpage_shift; + ), + + TP_printk("dpage_shift %lu", __entry->dpage_shift) +); + +TRACE_EVENT(dmem_alloc_pages_node, + TP_PROTO(phys_addr_t addr, int node, int try_max, int result_nr), + TP_ARGS(addr, node, try_max, result_nr), + + TP_STRUCT__entry( + __field(phys_addr_t, addr) + __field(int, node) + __field(int, try_max) + __field(int, result_nr) + ), + + TP_fast_assign( + __entry->addr = addr; + __entry->node = node; + __entry->try_max = try_max; + __entry->result_nr = result_nr; + ), + + TP_printk("addr %#lx node %d try_max %d result_nr %d", + (unsigned long)__entry->addr, __entry->node, + __entry->try_max, __entry->result_nr) +); + +TRACE_EVENT(dmem_free_pages, + TP_PROTO(phys_addr_t addr, int dpages_nr), + TP_ARGS(addr, dpages_nr), + + TP_STRUCT__entry( + __field(phys_addr_t, 
addr) + __field(int, dpages_nr) + ), + + TP_fast_assign( + __entry->addr = addr; + __entry->dpages_nr = dpages_nr; + ), + + TP_printk("addr %#lx dpages_nr %d", (unsigned long)__entry->addr, + __entry->dpages_nr) +); +#endif + +/* This part must be outside protection */ +#include diff --git a/mm/dmem.c b/mm/dmem.c index a77a064..aa34bf2 100644 --- a/mm/dmem.c +++ b/mm/dmem.c @@ -18,6 +18,8 @@ #include #include +#define CREATE_TRACE_POINTS +#include /* * There are two kinds of page in dmem management: * - nature page, it's the CPU's page size, i.e, 4K on x86 @@ -559,6 +561,8 @@ int dmem_alloc_init(unsigned long dpage_shift) mutex_lock(&dmem_pool.lock); + trace_dmem_alloc_init(dpage_shift); + if (dmem_pool.dpage_shift) { /* * double init on the same page size is okay @@ -686,6 +690,7 @@ int dmem_alloc_init(unsigned long dpage_shift) } } + trace_dmem_alloc_pages_node(addr, node, try_max, *result_nr); mutex_unlock(&dmem_pool.lock); } return addr; @@ -791,6 +796,7 @@ void dmem_free_pages(phys_addr_t addr, unsigned int dpages_nr) mutex_lock(&dmem_pool.lock); + trace_dmem_free_pages(addr, dpages_nr); WARN_ON(!dmem_pool.dpage_shift); dregion = find_dmem_region(addr, &pdnode); From patchwork Mon Dec 7 11:31:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955419 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0DDEEC433FE for ; Mon, 7 Dec 2020 11:33:57 
+0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9470F23340 for ; Mon, 7 Dec 2020 11:33:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9470F23340 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 20B988D000B; Mon, 7 Dec 2020 06:33:56 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1BB848D0001; Mon, 7 Dec 2020 06:33:56 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0D0748D000B; Mon, 7 Dec 2020 06:33:56 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0157.hostedemail.com [216.40.44.157]) by kanga.kvack.org (Postfix) with ESMTP id EA8AF8D0001 for ; Mon, 7 Dec 2020 06:33:55 -0500 (EST) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id B249F8249980 for ; Mon, 7 Dec 2020 11:33:55 +0000 (UTC) X-FDA: 77566276830.13.hook53_4f15caf273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin13.hostedemail.com (Postfix) with ESMTP id 9420D18140B67 for ; Mon, 7 Dec 2020 11:33:55 +0000 (UTC) X-HE-Tag: hook53_4f15caf273de X-Filterd-Recvd-Size: 7308 Received: from mail-pf1-f193.google.com (mail-pf1-f193.google.com [209.85.210.193]) by imf31.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:33:55 +0000 (UTC) Received: by mail-pf1-f193.google.com with SMTP id 11so3390199pfu.4 for ; Mon, 07 Dec 2020 03:33:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=tcGaTFLZIvDv/+RZUQXw2n3pLOeGqC9LGmpKUSJxIHk=; 
b=LnlBNsT2Ow4sDfC0QMME7MWuZITq+AEcH1sfAXBFXueeGXTmgyO9JdwNHIFIB4RP28 gn5IWNFLijhXsJzSsKaUnVdGCQx2gyWEBTqF0Kvs61fwySFHpYos92ROZTSt5SHfNHpP OeImMtiG3vU9Uq5eC80XJMC+Whn9dkIcqFxDKCOBa6a4es99sdy7EHJfbDRbZezvytMN w02FAN9w/yfckdzYqHtTMk3cHALFlV1gyfVTQPLKMmQmnHecGcHt0t+MIVhr8i6BF3VP R9cuJswy7JR4RuuusdzKlbCS7tt+IM1+yJIPMKGbccth3u1CnENv8XPxLf4u/SW5Qm4e W00Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=tcGaTFLZIvDv/+RZUQXw2n3pLOeGqC9LGmpKUSJxIHk=; b=qlvrMEn0kmrAH5YVHCviXA5eCQLwx8T8QffuvLLSHKDaJHTxDzXvO12DQBq1foInBF 1mWdHL1EN2Zan+FrKpdj9RhUwsgD76NLtvYIfLqRNPB0ssJVw4dxnurY9OhuKzCzeZtY Zchdx39cvVhxvyVJlxNylR7Wr/+oorOA69e9xWZqyzbPW1tB6Tl+QCLjs0WFUngp+tjb VORVzwcXvQ88Uuz8d2B/JrYYT+6rwvS0VLyF/ln+a4uQLK5rUuf4jwNWqid05ISucnA6 HSbvcFjR2oDxLLRbiRIXq4AyYmPeI1LrSMT0Q7I19uI2DHKgGAmRv8ZhGnFYPqJXesrg MeWQ== X-Gm-Message-State: AOAM531YypiDrl0ED1qtowdzEZdMIf2mFbOm3PCqSZHSQn/PtZk0q/yv tCR/NhLoG2cOJQ7bq9pYkMae+RvSrNk= X-Google-Smtp-Source: ABdhPJwuFOmVar8+NpPkcSNRrOAxFN6p9lyUDwrJbLZ14eN6XyJlArx3zJXtTYoeeUDbEsE3lQb1/A== X-Received: by 2002:a63:5d59:: with SMTP id o25mr17481331pgm.218.1607340834222; Mon, 07 Dec 2020 03:33:54 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.33.51 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:33:53 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Xiao 
Guangrong Subject: [RFC V2 08/37] dmem: show some statistic in debugfs Date: Mon, 7 Dec 2020 19:31:01 +0800 Message-Id: X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang Create 'dmem' directory under debugfs and show some statistic for dmem pool, track total and free dpages on dmem pool and each numa node. Signed-off-by: Xiao Guangrong Signed-off-by: Yulei Zhang --- mm/Kconfig | 8 +++++ mm/dmem.c | 100 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 107 insertions(+), 1 deletion(-) diff --git a/mm/Kconfig b/mm/Kconfig index 3a6d408..4dd8896 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -234,6 +234,14 @@ config DMEM Allow reservation of memory which could be for the dedicated use of dmem. It's the basis of dmemfs. +config DMEM_DEBUG_FS + bool "Enable debug information for direct memory" + depends on DMEM && DEBUG_FS + help + This option enables showing various statistics of direct memory + in debugfs filesystem. 
+
+#
 # support for memory compaction
 config COMPACTION
 	bool "Allow for memory compaction"
diff --git a/mm/dmem.c b/mm/dmem.c
index aa34bf2..6992e57 100644
--- a/mm/dmem.c
+++ b/mm/dmem.c
@@ -164,6 +164,103 @@ int dmem_region_register(int node, phys_addr_t start, phys_addr_t end)
 	return 0;
 }
+#ifdef CONFIG_DMEM_DEBUG_FS
+struct debugfs_entry {
+	const char *name;
+	unsigned long offset;
+};
+
+#define DMEM_POOL_OFFSET(x)	offsetof(struct dmem_pool, x)
+#define DMEM_POOL_ENTRY(x)	{__stringify(x), DMEM_POOL_OFFSET(x)}
+
+#define DMEM_NODE_OFFSET(x)	offsetof(struct dmem_node, x)
+#define DMEM_NODE_ENTRY(x)	{__stringify(x), DMEM_NODE_OFFSET(x)}
+
+static struct debugfs_entry dmem_pool_entries[] = {
+	DMEM_POOL_ENTRY(region_num),
+	DMEM_POOL_ENTRY(registered_pages),
+	DMEM_POOL_ENTRY(unaligned_pages),
+	DMEM_POOL_ENTRY(dpage_shift),
+	DMEM_POOL_ENTRY(total_dpages),
+	DMEM_POOL_ENTRY(free_dpages),
+};
+
+static struct debugfs_entry dmem_node_entries[] = {
+	DMEM_NODE_ENTRY(total_dpages),
+	DMEM_NODE_ENTRY(free_dpages),
+};
+
+static int dmem_entry_get(void *offset, u64 *val)
+{
+	*val = *(u64 *)offset;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(dmem_fops, dmem_entry_get, NULL, "%llu\n");
+
+static int dmemfs_init_debugfs_node(struct dmem_node *dnode,
+				    struct dentry *parent)
+{
+	struct dentry *node_dir;
+	char dir_name[32];
+	int i, ret = -EEXIST;
+
+	snprintf(dir_name, sizeof(dir_name), "node%ld",
+		 dnode - dmem_pool.nodes);
+	node_dir = debugfs_create_dir(dir_name, parent);
+	if (!node_dir)
+		return ret;
+
+	for (i = 0; i < ARRAY_SIZE(dmem_node_entries); i++)
+		if (!debugfs_create_file(dmem_node_entries[i].name, 0444,
+		    node_dir, (void *)dnode + dmem_node_entries[i].offset,
+		    &dmem_fops))
+			return ret;
+	return 0;
+}
+
+static int dmemfs_init_debugfs(void)
+{
+	struct dentry *dmem_debugfs_dir;
+	struct dmem_node *dnode;
+	int i, ret = -EEXIST;
+
+	dmem_debugfs_dir = debugfs_create_dir("dmem", NULL);
+	if (!dmem_debugfs_dir)
+		return ret;
+
+	for (i = 0; i < ARRAY_SIZE(dmem_pool_entries); i++)
+		if (!debugfs_create_file(dmem_pool_entries[i].name, 0444,
+		    dmem_debugfs_dir,
+		    (void *)&dmem_pool + dmem_pool_entries[i].offset,
+		    &dmem_fops))
+			goto exit;
+
+	for_each_dmem_node(dnode) {
+		/*
+		 * do not create debugfs files for the node
+		 * where no memory is available
+		 */
+		if (list_empty(&dnode->regions))
+			continue;
+
+		if (dmemfs_init_debugfs_node(dnode, dmem_debugfs_dir))
+			goto exit;
+	}
+
+	return 0;
+exit:
+	debugfs_remove_recursive(dmem_debugfs_dir);
+	return ret;
+}
+
+#else
+static int dmemfs_init_debugfs(void)
+{
+	return 0;
+}
+#endif
+
 #define PENALTY_FOR_DMEM_SHARED_NODE (1)
 static int dmem_nodeload[MAX_NUMNODES] __initdata;
@@ -364,7 +461,8 @@ static int __init dmem_late_init(void)
 			goto exit;
 		}
 	}
-	return ret;
+
+	return dmemfs_init_debugfs();
 exit:
 	dmem_uinit();
 	return ret;

From patchwork Mon Dec 7 11:31:02 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: yulei zhang
X-Patchwork-Id: 11955421
From: yulei.kernel@gmail.com
X-Google-Original-From: yuleixzhang@tencent.com
To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com
Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Xiao Guangrong
Subject: [RFC V2 09/37] dmemfs: support remote access
Date: Mon, 7 Dec 2020 19:31:02 +0800
Message-Id:
X-Mailer: git-send-email 2.28.0
In-Reply-To:
References:

From: Yulei Zhang

ptrace_writedata() and ptrace_readdata() need to access dmem memory
remotely. The typical user is gdb: with this patch, gdb is able to read
and write memory owned by the attached process.

Signed-off-by: Xiao Guangrong
Signed-off-by: Yulei Zhang
---
 fs/dmemfs/inode.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/fs/dmemfs/inode.c b/fs/dmemfs/inode.c
index 7723b58..3192f31 100644
--- a/fs/dmemfs/inode.c
+++ b/fs/dmemfs/inode.c
@@ -364,6 +364,51 @@ static void radix_put_entry(void)
 	rcu_read_unlock();
 }
+static bool check_vma_access(struct vm_area_struct *vma, int write)
+{
+	vm_flags_t vm_flags = write ? VM_WRITE : VM_READ;
+
+	return !!(vm_flags & vma->vm_flags);
+}
+
+static int
+dmemfs_access_dmem(struct vm_area_struct *vma, unsigned long addr,
+		   void *buf, int len, int write)
+{
+	struct inode *inode = file_inode(vma->vm_file);
+	struct super_block *sb = inode->i_sb;
+	void *entry, *maddr;
+	int offset, pgoff;
+
+	if (!check_vma_access(vma, write))
+		return -EACCES;
+
+	pgoff = linear_page_index(vma, addr);
+	if (pgoff > (MAX_LFS_FILESIZE >> PAGE_SHIFT))
+		return -EFAULT;
+
+	entry = radix_get_create_entry(vma, addr, inode, pgoff);
+	if (IS_ERR(entry))
+		return PTR_ERR(entry);
+
+	offset = addr & (sb->s_blocksize - 1);
+	addr = dmem_entry_to_addr(inode, entry);
+
+	/*
+	 * it is not beyond vma's region as the vma should be aligned
+	 * to blocksize
+	 */
+	len = min(len, (int)(sb->s_blocksize - offset));
+	maddr = __va(addr);
+	if (write)
+		memcpy(maddr + offset, buf, len);
+	else
+		memcpy(buf, maddr + offset, len);
+	radix_put_entry();
+
+	return len;
+}
+
 static vm_fault_t dmemfs_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
@@ -400,6 +445,7 @@ static unsigned long dmemfs_pagesize(struct vm_area_struct *vma)
 static const struct vm_operations_struct dmemfs_vm_ops = {
 	.fault = dmemfs_fault,
 	.pagesize = dmemfs_pagesize,
+	.access = dmemfs_access_dmem,
 };
 int dmemfs_file_mmap(struct file *file, struct vm_area_struct *vma)

From patchwork Mon Dec 7 11:31:03 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: yulei zhang
X-Patchwork-Id: 11955423
From: yulei.kernel@gmail.com
X-Google-Original-From: yuleixzhang@tencent.com
To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com
Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Xiao Guangrong
Subject: [RFC V2 10/37] dmemfs: introduce max_alloc_try_dpages parameter
Date: Mon, 7 Dec 2020 19:31:03 +0800
Message-Id: <08ff7e40806a2342720835b95f9be24d5778c703.1607332046.git.yuleixzhang@tencent.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To:
References:

From: Yulei Zhang

max_alloc_try_dpages specifies the number of dmem pages allocated at one
time, so that multiple radix entries can be created by a single
allocation. That relieves allocation pressure and makes page faults
faster.

However, it could result in no dmem page being mmapped to userspace even
if there are some free dmem pages.

Set it to 1 to completely disable this behavior.
Signed-off-by: Xiao Guangrong
Signed-off-by: Yulei Zhang
---
 fs/dmemfs/inode.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/fs/dmemfs/inode.c b/fs/dmemfs/inode.c
index 3192f31..443f2e1 100644
--- a/fs/dmemfs/inode.c
+++ b/fs/dmemfs/inode.c
@@ -34,6 +34,8 @@
 #define CREATE_TRACE_POINTS
 #include "trace.h"
+static uint __read_mostly max_alloc_try_dpages = 1;
+
 struct dmemfs_mount_opts {
 	unsigned long dpage_size;
 };
@@ -46,6 +48,44 @@ enum dmemfs_param {
 	Opt_dpagesize,
 };
+static int
+max_alloc_try_dpages_set(const char *val, const struct kernel_param *kp)
+{
+	uint sval;
+	int ret;
+
+	ret = kstrtouint(val, 0, &sval);
+	if (ret)
+		return ret;
+
+	/* should be 1 at least */
+	if (!sval)
+		return -EINVAL;
+
+	max_alloc_try_dpages = sval;
+	return 0;
+}
+
+static struct kernel_param_ops alloc_max_try_dpages_ops = {
+	.set = max_alloc_try_dpages_set,
+	.get = param_get_uint,
+};
+
+/*
+ * it specifies the dmem page number allocated at one time, then
+ * multiple radix entries can be created. That will relieve the
+ * allocation pressure and make page faults faster.
+ *
+ * however that could cause no dmem page mmapped to userspace
+ * even if there are some free dmem pages
+ *
+ * set it to 1 to completely disable this behavior
+ */
+fs_param_cb(max_alloc_try_dpages, &alloc_max_try_dpages_ops,
+	    &max_alloc_try_dpages, 0644);
+__MODULE_PARM_TYPE(max_alloc_try_dpages, "uint");
+MODULE_PARM_DESC(max_alloc_try_dpages, "Set the dmem page number allocated at one time, should be 1 at least");
+
 const struct fs_parameter_spec dmemfs_fs_parameters[] = {
 	fsparam_string("pagesize", Opt_dpagesize),
 	{}
@@ -314,6 +354,7 @@ static void *find_radix_entry_or_next(struct address_space *mapping,
 	}
 	rcu_read_unlock();
+	try_dpages = min(try_dpages, max_alloc_try_dpages);
 	/* entry does not exist, create it */
 	addr = dmem_alloc_pages_vma(vma, fault_addr, try_dpages, &dpages);
 	if (!addr) {

From patchwork Mon Dec 7 11:31:04 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: yulei zhang
X-Patchwork-Id: 11955425
From: yulei.kernel@gmail.com
X-Google-Original-From: yuleixzhang@tencent.com
To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com
Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang
Subject: [RFC V2 11/37] mm: export mempolicy interfaces to serve dmem allocator
Date: Mon, 7 Dec 2020 19:31:04 +0800
Message-Id:
X-Mailer: git-send-email 2.28.0
In-Reply-To:
References:

From: Yulei Zhang

Make get_vma_policy() and interleave_nid() non-static and declare them
in mempolicy.h so that the dmem allocator can use them.

Signed-off-by: Yulei Zhang
---
 include/linux/mempolicy.h | 3 +++
 mm/mempolicy.c | 4 ++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 5f1c74d..4789661 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -139,6 +139,9 @@ struct mempolicy *mpol_shared_policy_lookup(struct shared_policy *sp,
 struct mempolicy *get_task_policy(struct task_struct *p);
 struct mempolicy *__get_vma_policy(struct vm_area_struct *vma,
 		unsigned long addr);
+struct mempolicy *get_vma_policy(struct vm_area_struct *vma, unsigned long addr);
+unsigned interleave_nid(struct mempolicy *pol, struct vm_area_struct *vma,
+			unsigned long addr, int shift);
 bool vma_policy_mof(struct vm_area_struct *vma);
 extern void numa_default_policy(void);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 3ca4898..efd80e5 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1813,7 +1813,7 @@ struct mempolicy *__get_vma_policy(struct vm_area_struct *vma,
  * freeing by another task. It is the caller's responsibility to free the
  * extra reference for shared policies.
*/ -static struct mempolicy *get_vma_policy(struct vm_area_struct *vma, +struct mempolicy *get_vma_policy(struct vm_area_struct *vma, unsigned long addr) { struct mempolicy *pol = __get_vma_policy(vma, addr); @@ -1978,7 +1978,7 @@ static unsigned offset_il_node(struct mempolicy *pol, unsigned long n) } /* Determine a node number for interleave */ -static inline unsigned interleave_nid(struct mempolicy *pol, +unsigned interleave_nid(struct mempolicy *pol, struct vm_area_struct *vma, unsigned long addr, int shift) { if (vma) { From patchwork Mon Dec 7 11:31:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955427 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 765F2C433FE for ; Mon, 7 Dec 2020 11:34:22 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1934F233A0 for ; Mon, 7 Dec 2020 11:34:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1934F233A0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B32EC8D000F; Mon, 7 Dec 2020 06:34:21 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B08A18D0001; Mon, 7 Dec 2020 06:34:21 -0500 (EST) X-Delivered-To: 
int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A20C68D000F; Mon, 7 Dec 2020 06:34:21 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0203.hostedemail.com [216.40.44.203]) by kanga.kvack.org (Postfix) with ESMTP id 8B59F8D0001 for ; Mon, 7 Dec 2020 06:34:21 -0500 (EST) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 4A28033CD for ; Mon, 7 Dec 2020 11:34:21 +0000 (UTC) X-FDA: 77566277922.29.cook68_250c4fc273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin29.hostedemail.com (Postfix) with ESMTP id 28A19180868E3 for ; Mon, 7 Dec 2020 11:34:21 +0000 (UTC) X-HE-Tag: cook68_250c4fc273de X-Filterd-Recvd-Size: 10760 Received: from mail-pg1-f174.google.com (mail-pg1-f174.google.com [209.85.215.174]) by imf39.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:34:20 +0000 (UTC) Received: by mail-pg1-f174.google.com with SMTP id q3so8672133pgr.3 for ; Mon, 07 Dec 2020 03:34:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=ydAF98InOXvLOUHWcQbefHcTxGQjuTY7EYp8f1Ij7rU=; b=CbIO/lKMhlmMybymchHGdjYBFjooSqYR87ZMJNCpzamSy0rqGGupNjGohNmWYyuuVF +xqHe0wJyPOa8tX4wGwI9xsV0mLlIIhWZv7yi7zuh1i+ZjbHzI7SroHpaRk8tPLHv8j9 TY3DoXxPGxxbJvXcTFLjLpG9ptLpcORhfr4m/KQm4y0U00zVK1J9SxRLVrdKucZO6fgt Iq53lTb5pWIqLsYf0RPyKOtYavEZdCQ2SpdfqSEQDGGfienBvS0lHPdVmAi0he3F9oSM 2qO8yJB8UNJ6tM0GvVovraZIBAc/ZtLNNhUmoqFgjDcE70JCp5Sfe/0KBwOp0Vx9h2oD SRmw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ydAF98InOXvLOUHWcQbefHcTxGQjuTY7EYp8f1Ij7rU=; 
b=q/4MzsIEpW4tabj8Px7LhoAL2XF7o/ZmzrzX7aBD0Y52Ny0oACYqAok7s7kteFyzng Y5XszABggwaxCqGwAm/lnAZ9s2dN07aYgbQluYVAFcsiwlx3eWl0POXQMxLvQIUrK7/V ReX8IlVB47BlfbXPI50jhwV+ob2CSo4+cfbHgWXaLT19TENP//UyryCoc6jkWOr3OZNB fv9eu6JM2C8EC3B2niaoZnvHueOyP7qZts8/wKomkwPdx4GYeKGMcBkrKSmfI8V1jhwp eEIS/Nzfqu2wLWrCk3kAEyFOSlgFMoOtraoA5IfQN8on7y4vb3bj+vix0QgXZwxx2UZA DLsw== X-Gm-Message-State: AOAM532Ze/FGWeLTFl3RVN+UWueSJ3qL+MAEyfjduSKPtaGxhMfpKzrn Gv2069Gqsvimb9lm4EpEbbzyVekssVY= X-Google-Smtp-Source: ABdhPJzAC/Gy2AUUTW9sU9fx/7AU0/diRQdzJN4dFsjfTKyWiDt0U8rw4OyPNmxFkhpistj53GkGWw== X-Received: by 2002:aa7:9244:0:b029:19a:b335:754b with SMTP id 4-20020aa792440000b029019ab335754bmr15898559pfp.29.1607340859523; Mon, 07 Dec 2020 03:34:19 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.34.16 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:34:19 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Haiwei Li Subject: [RFC V2 12/37] dmem: introduce mempolicy support Date: Mon, 7 Dec 2020 19:31:05 +0800 Message-Id: <28718e3b8886b9ec3e4700c2d55a9629ca9fc27c.1607332046.git.yuleixzhang@tencent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang It adds mempolicy support for dmem to allocates memory from mempolicy specified nodes. 
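
The node choice that policy_nodelist() in this patch makes for
MPOL_PREFERRED and MPOL_BIND can be modeled in userspace roughly as
follows. This is only a sketch under stated assumptions: struct
toy_policy, toy_first_node() and toy_pick_node() are hypothetical
stand-ins, the nodemask is reduced to a plain bitmask, and the cur
argument plays the role of numa_node_id(). MPOL_INTERLEAVE, which goes
through interleave_nid() instead, is not modeled.

```c
/* Hypothetical userspace stand-ins for the kernel's policy modes. */
enum toy_mode { TOY_PREFERRED, TOY_BIND };

struct toy_policy {
	enum toy_mode mode;
	int local;            /* models MPOL_F_LOCAL being set */
	int preferred_node;   /* models policy->v.preferred_node */
	unsigned long nodes;  /* models policy->v.nodes, bit n = node n */
};

/* Lowest set bit of a non-empty mask, standing in for first_node(). */
static inline int toy_first_node(unsigned long mask)
{
	int n = 0;

	while (!(mask & 1UL)) {
		mask >>= 1;
		n++;
	}
	return n;
}

/* Mirrors the switch in policy_nodelist(); cur models numa_node_id(). */
static inline int toy_pick_node(const struct toy_policy *pol, int cur)
{
	int nd = cur;

	switch (pol->mode) {
	case TOY_PREFERRED:
		/* A non-local preferred policy overrides the current node. */
		if (!pol->local)
			nd = pol->preferred_node;
		break;
	case TOY_BIND:
		/* Fall back to the first bound node if cur is not allowed. */
		if (!(pol->nodes & (1UL << nd)))
			nd = toy_first_node(pol->nodes);
		break;
	}
	return nd;
}
```

With preferred_node = 3 and the local flag clear, toy_pick_node()
returns 3 whatever cur is. With a bind mask covering nodes 1 and 2, it
returns 1 when cur is 0 and 2 when cur is 2.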

Signed-off-by: Haiwei Li
Signed-off-by: Yulei Zhang
---
 arch/x86/Kconfig                     |  1 +
 arch/x86/include/asm/pgtable.h       |  7 ++++
 arch/x86/include/asm/pgtable_types.h | 13 +++++++-
 fs/dmemfs/Kconfig                    |  3 ++
 include/linux/pgtable.h              |  7 ++++
 mm/Kconfig                           |  3 ++
 mm/dmem.c                            | 63 ++++++++++++++++++++++++++++++++++--
 7 files changed, 94 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f6946b8..9ccee76 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -73,6 +73,7 @@ config X86
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_PMEM_API		if X86_64
 	select ARCH_HAS_PTE_DEVMAP		if X86_64
+	select ARCH_HAS_PTE_DMEM		if X86_64
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_HAS_UACCESS_FLUSHCACHE	if X86_64
 	select ARCH_HAS_COPY_MC			if X86_64
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a02c672..dd4aff6 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -452,6 +452,13 @@ static inline pmd_t pmd_mkdevmap(pmd_t pmd)
 	return pmd_set_flags(pmd, _PAGE_DEVMAP);
 }
 
+#ifdef CONFIG_ARCH_HAS_PTE_DMEM
+static inline pmd_t pmd_mkdmem(pmd_t pmd)
+{
+	return pmd_set_flags(pmd, _PAGE_SPECIAL | _PAGE_DMEM);
+}
+#endif
+
 static inline pmd_t pmd_mkhuge(pmd_t pmd)
 {
 	return pmd_set_flags(pmd, _PAGE_PSE);
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 816b31c..ee4cae1 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -23,6 +23,15 @@
 #define _PAGE_BIT_SOFTW2	10	/* " */
 #define _PAGE_BIT_SOFTW3	11	/* " */
 #define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */
+#define _PAGE_BIT_DMEM		57	/* Flag used to indicate dmem pmd.
+					 * Since _PAGE_BIT_SPECIAL is defined
+					 * same as _PAGE_BIT_CPA_TEST, we can
+					 * not only use _PAGE_BIT_SPECIAL, so
+					 * add _PAGE_BIT_DMEM to help
+					 * indicate it. Since dmem pte will
+					 * never be splitting, setting
+					 * _PAGE_BIT_SPECIAL for pte is enough.
+					 */
 #define _PAGE_BIT_SOFTW4	58	/* available for programmer */
 #define _PAGE_BIT_PKEY_BIT0	59	/* Protection Keys, bit 1/4 */
 #define _PAGE_BIT_PKEY_BIT1	60	/* Protection Keys, bit 2/4 */
@@ -112,9 +121,11 @@
 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
 #define _PAGE_NX	(_AT(pteval_t, 1) << _PAGE_BIT_NX)
 #define _PAGE_DEVMAP	(_AT(u64, 1) << _PAGE_BIT_DEVMAP)
+#define _PAGE_DMEM	(_AT(u64, 1) << _PAGE_BIT_DMEM)
 #else
 #define _PAGE_NX	(_AT(pteval_t, 0))
 #define _PAGE_DEVMAP	(_AT(pteval_t, 0))
+#define _PAGE_DMEM	(_AT(pteval_t, 0))
 #endif
 
 #define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
@@ -128,7 +139,7 @@
 #define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
 			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY |	\
 			 _PAGE_SOFT_DIRTY | _PAGE_DEVMAP | _PAGE_ENC |	\
-			 _PAGE_UFFD_WP)
+			 _PAGE_UFFD_WP | _PAGE_DMEM)
 #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
 
 /*
diff --git a/fs/dmemfs/Kconfig b/fs/dmemfs/Kconfig
index d2894a5..19ca391 100644
--- a/fs/dmemfs/Kconfig
+++ b/fs/dmemfs/Kconfig
@@ -1,5 +1,8 @@
 config DMEM_FS
 	tristate "Direct Memory filesystem support"
+	depends on DMEM
+	depends on TRANSPARENT_HUGEPAGE
+	depends on ARCH_HAS_PTE_DMEM
 	help
 	  dmemfs (Direct Memory filesystem) is device memory or reserved
 	  memory based filesystem. This kind of memory is special as it
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 71125a4..9e65694 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1157,6 +1157,13 @@ static inline int pud_trans_unstable(pud_t *pud)
 #endif
 }
 
+#ifndef CONFIG_ARCH_HAS_PTE_DMEM
+static inline pmd_t pmd_mkdmem(pmd_t pmd)
+{
+	return pmd;
+}
+#endif
+
 #ifndef pmd_read_atomic
 static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
 {
diff --git a/mm/Kconfig b/mm/Kconfig
index 4dd8896..10fd7ff 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -794,6 +794,9 @@ config IDLE_PAGE_TRACKING
 config ARCH_HAS_PTE_DEVMAP
 	bool
 
+config ARCH_HAS_PTE_DMEM
+	bool
+
 config ZONE_DEVICE
 	bool "Device memory (pmem, HMM, etc...) hotplug support"
 	depends on MEMORY_HOTPLUG
diff --git a/mm/dmem.c b/mm/dmem.c
index 6992e57..2e61dbd 100644
--- a/mm/dmem.c
+++ b/mm/dmem.c
@@ -822,6 +822,56 @@ int dmem_alloc_init(unsigned long dpage_shift)
 }
 EXPORT_SYMBOL(dmem_alloc_pages_nodemask);
 
+/* Return a nodelist indicated for current node representing a mempolicy */
+static int *policy_nodelist(struct mempolicy *policy)
+{
+	int nd = numa_node_id();
+
+	switch (policy->mode) {
+	case MPOL_PREFERRED:
+		if (!(policy->flags & MPOL_F_LOCAL))
+			nd = policy->v.preferred_node;
+		break;
+	case MPOL_BIND:
+		if (unlikely(!node_isset(nd, policy->v.nodes)))
+			nd = first_node(policy->v.nodes);
+		break;
+	default:
+		WARN_ON(1);
+	}
+	return dmem_nodelist(nd);
+}
+
+static nodemask_t *dmem_policy_nodemask(struct mempolicy *policy)
+{
+	if (unlikely(policy->mode == MPOL_BIND) &&
+	    cpuset_nodemask_valid_mems_allowed(&policy->v.nodes))
+		return &policy->v.nodes;
+
+	return NULL;
+}
+
+static void
+get_mempolicy_nlist_and_nmask(struct mempolicy *pol,
+			      struct vm_area_struct *vma, unsigned long addr,
+			      int **nl, nodemask_t **nmask)
+{
+	if (pol->mode == MPOL_INTERLEAVE) {
+		unsigned int nid;
+
+		/*
+		 * we use dpage_shift to interleave numa nodes although
+		 * multiple dpages may be allocated
+		 */
+		nid = interleave_nid(pol, vma, addr, dmem_pool.dpage_shift);
+		*nl = dmem_nodelist(nid);
+		*nmask = NULL;
+	} else {
+		*nl = policy_nodelist(pol);
+		*nmask = dmem_policy_nodemask(pol);
+	}
+}
+
 /*
  * dmem_alloc_pages_vma - Allocate pages for a VMA.
  *
@@ -830,6 +880,9 @@ int dmem_alloc_init(unsigned long dpage_shift)
  * @try_max: try to allocate @try_max dpages if possible
  * @result_nr: allocated dpage number returned to the caller
  *
+ * This function allocates pages from dmem pool and applies a NUMA policy
+ * associated with the VMA.
+ *
  * Return the physical address of the first dpage allocated from dmem
  * pool, or 0 on failure. The allocated dpage number is filled into
  * @result_nr
@@ -839,13 +892,19 @@ int dmem_alloc_init(unsigned long dpage_shift)
 		unsigned int try_max, unsigned int *result_nr)
 {
 	phys_addr_t phys_addr;
+	struct mempolicy *pol;
 	int *nl;
+	nodemask_t *nmask;
 	unsigned int cpuset_mems_cookie;
 
 retry_cpuset:
-	nl = dmem_nodelist(numa_node_id());
+	pol = get_vma_policy(vma, addr);
+	cpuset_mems_cookie = read_mems_allowed_begin();
+
+	get_mempolicy_nlist_and_nmask(pol, vma, addr, &nl, &nmask);
+	mpol_cond_put(pol);
 
-	phys_addr = dmem_alloc_pages_from_nodelist(nl, NULL, try_max,
+	phys_addr = dmem_alloc_pages_from_nodelist(nl, nmask, try_max,
 						   result_nr);
 	if (unlikely(!phys_addr && read_mems_allowed_retry(cpuset_mems_cookie)))
 		goto retry_cpuset;

From patchwork Mon Dec 7 11:31:06 2020
X-Patchwork-Submitter: yulei zhang
X-Patchwork-Id: 11955429
From: yulei.kernel@gmail.com
X-Google-Original-From: yuleixzhang@tencent.com
To: linux-mm@kvack.org, akpm@linux-foundation.org,
 linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org,
 linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com,
 viro@zeniv.linux.org.uk, pbonzini@redhat.com
Cc: joao.m.martins@oracle.com, rdunlap@infradead.org,
 sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com,
 kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang, Chen Zhuo
Subject: [RFC V2 13/37] mm, dmem: introduce PFN_DMEM and pfn_t_dmem
Date: Mon, 7 Dec 2020 19:31:06 +0800

From: Yulei Zhang

Introduce PFN_DMEM as a new pfn flag for dmem pfns, defined as bit
(BITS_PER_LONG_LONG - 6). Also introduce the pfn_t_dmem() helper to
recognize a dmem pfn.

Signed-off-by: Chen Zhuo
Signed-off-by: Yulei Zhang
---
 include/linux/pfn_t.h | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/pfn_t.h b/include/linux/pfn_t.h
index 2d91482..c6c0f1f 100644
--- a/include/linux/pfn_t.h
+++ b/include/linux/pfn_t.h
@@ -11,6 +11,7 @@
  * PFN_MAP - pfn has a dynamic page mapping established by a device driver
  * PFN_SPECIAL - for CONFIG_FS_DAX_LIMITED builds to allow XIP, but not
  *		 get_user_pages
+ * PFN_DMEM - pfn references a dmem page
  */
 #define PFN_FLAGS_MASK (((u64) (~PAGE_MASK)) << (BITS_PER_LONG_LONG - PAGE_SHIFT))
 #define PFN_SG_CHAIN (1ULL << (BITS_PER_LONG_LONG - 1))
@@ -18,13 +19,15 @@
 #define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3))
 #define PFN_MAP (1ULL << (BITS_PER_LONG_LONG - 4))
 #define PFN_SPECIAL (1ULL << (BITS_PER_LONG_LONG - 5))
+#define PFN_DMEM (1ULL << (BITS_PER_LONG_LONG - 6))
 
 #define PFN_FLAGS_TRACE \
 	{ PFN_SPECIAL,	"SPECIAL" }, \
 	{ PFN_SG_CHAIN,	"SG_CHAIN" }, \
 	{ PFN_SG_LAST,	"SG_LAST" }, \
 	{ PFN_DEV,	"DEV" }, \
-	{ PFN_MAP,	"MAP" }
+	{ PFN_MAP,	"MAP" }, \
+	{ PFN_DMEM,	"DMEM" }
 
 static inline pfn_t __pfn_to_pfn_t(unsigned long pfn, u64 flags)
 {
@@ -128,4 +131,16 @@ static inline bool pfn_t_special(pfn_t pfn)
 	return false;
 }
 #endif /* CONFIG_ARCH_HAS_PTE_SPECIAL */
+
+#ifdef CONFIG_ARCH_HAS_PTE_DMEM
+static inline bool pfn_t_dmem(pfn_t pfn)
+{
+	return (pfn.val & PFN_DMEM) == PFN_DMEM;
+}
+#else
+static inline bool pfn_t_dmem(pfn_t pfn)
+{
+	return false;
+}
+#endif /* CONFIG_ARCH_HAS_PTE_DMEM */
 #endif /* _LINUX_PFN_T_H_ */

From patchwork Mon Dec 7 11:31:07 2020
X-Patchwork-Submitter: yulei zhang
X-Patchwork-Id: 11955433
From: yulei.kernel@gmail.com
X-Google-Original-From: yuleixzhang@tencent.com
To: linux-mm@kvack.org, akpm@linux-foundation.org,
 linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org,
 linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com,
 viro@zeniv.linux.org.uk, pbonzini@redhat.com
Cc: joao.m.martins@oracle.com, rdunlap@infradead.org,
 sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com,
 kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang, Chen Zhuo
Subject: [RFC V2 14/37] mm, dmem: differentiate dmem-pmd and thp-pmd
Date: Mon, 7 Dec 2020 19:31:07 +0800
Message-Id: <9e1413b30d1cd4777af732e0995a7e7a03baeea6.1607332046.git.yuleixzhang@tencent.com>

From: Yulei Zhang

A dmem huge page is ultimately not a transparent huge page. Since we
decided to use pmd_special() to distinguish a dmem-pmd from a thp-pmd,
pmd_special() and pmd_trans_huge() need slightly different semantics,
just as pmd_devmap() has upstream. This distinction is especially
important in some mm-core paths such as zap_pmd_range().
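
The intended split between the two predicates can be illustrated with a
small userspace model. This is a sketch, not the kernel code: the
toy_* names are hypothetical, a pmd is reduced to a bare 64-bit value,
and while bit 57 for DMEM comes from this series, the positions used
for PSE (7), SPECIAL (9) and DEVMAP (58) are assumptions based on the
x86 pgtable_types.h of the time.

```c
#include <stdint.h>

/* DMEM bit from this series; the other three positions are assumed. */
#define TOY_PAGE_PSE     (UINT64_C(1) << 7)
#define TOY_PAGE_SPECIAL (UINT64_C(1) << 9)
#define TOY_PAGE_DMEM    (UINT64_C(1) << 57)
#define TOY_PAGE_DEVMAP  (UINT64_C(1) << 58)

/* A huge pmd counts as THP only if neither DEVMAP nor DMEM is set. */
static inline int toy_pmd_trans_huge(uint64_t pmd)
{
	return (pmd & (TOY_PAGE_PSE | TOY_PAGE_DEVMAP | TOY_PAGE_DMEM)) ==
		TOY_PAGE_PSE;
}

/* A pmd is "special" only if both SPECIAL and DMEM are set. */
static inline int toy_pmd_special(uint64_t pmd)
{
	return (pmd & (TOY_PAGE_SPECIAL | TOY_PAGE_DMEM)) ==
		(TOY_PAGE_SPECIAL | TOY_PAGE_DMEM);
}
```

A value with only PSE set is reported as a THP pmd and not as special.
A value with PSE, SPECIAL and DMEM set, which is what pmd_mkdmem() plus
pmd_mkhuge() would produce, is reported as special and not as THP.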

Explicitly mark the pmd_trans_huge() helpers that dmem needs by adding
pmd_special() checks. The same approach can be reused in many mm-core
paths.

Signed-off-by: Chen Zhuo
Signed-off-by: Yulei Zhang
---
 arch/x86/include/asm/pgtable.h | 10 +++++++++-
 include/linux/pgtable.h        |  5 +++++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index dd4aff6..6ce85d4 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -259,7 +259,7 @@ static inline int pmd_large(pmd_t pte)
 /* NOTE: when predicate huge page, consider also pmd_devmap, or use pmd_large */
 static inline int pmd_trans_huge(pmd_t pmd)
 {
-	return (pmd_val(pmd) & (_PAGE_PSE|_PAGE_DEVMAP)) == _PAGE_PSE;
+	return (pmd_val(pmd) & (_PAGE_PSE|_PAGE_DEVMAP|_PAGE_DMEM)) == _PAGE_PSE;
 }
 
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
@@ -275,6 +275,14 @@ static inline int has_transparent_hugepage(void)
 	return boot_cpu_has(X86_FEATURE_PSE);
 }
 
+#ifdef CONFIG_ARCH_HAS_PTE_DMEM
+static inline int pmd_special(pmd_t pmd)
+{
+	return (pmd_val(pmd) & (_PAGE_SPECIAL | _PAGE_DMEM)) ==
+		(_PAGE_SPECIAL | _PAGE_DMEM);
+}
+#endif
+
 #ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
 static inline int pmd_devmap(pmd_t pmd)
 {
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 9e65694..30342b8 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1162,6 +1162,11 @@ static inline pmd_t pmd_mkdmem(pmd_t pmd)
 {
 	return pmd;
 }
+
+static inline int pmd_special(pmd_t pmd)
+{
+	return 0;
+}
 #endif
 
 #ifndef pmd_read_atomic

From patchwork Mon Dec 7 11:31:08 2020
X-Patchwork-Submitter: yulei zhang
X-Patchwork-Id: 11955431
From: yulei.kernel@gmail.com
X-Google-Original-From: yuleixzhang@tencent.com
To: linux-mm@kvack.org, akpm@linux-foundation.org,
 linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org,
 linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com,
 viro@zeniv.linux.org.uk, pbonzini@redhat.com
Cc: joao.m.martins@oracle.com, rdunlap@infradead.org,
 sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com,
 kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang, Chen Zhuo
Subject: [RFC V2 15/37] mm: add pmd_special() check for pmd_trans_huge_lock()
Date: Mon, 7 Dec 2020 19:31:08 +0800
Message-Id: <789d8a9a23887c20e4966b6e1c9b52a320ab87af.1607332046.git.yuleixzhang@tencent.com>

From: Yulei Zhang

Now that a dmem-pmd is distinguished from a thp-pmd, add a
pmd_special() check so that pmd_trans_huge_lock() takes the ptl for a
dmem huge pmd and treats it as a stable pmd.
Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- include/linux/huge_mm.h | 3 ++- mm/huge_memory.c | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 0365aa9..2514b90 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -242,7 +242,8 @@ static inline int is_swap_pmd(pmd_t pmd) static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) { - if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) + if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) + || pmd_devmap(*pmd) || pmd_special(*pmd)) return __pmd_trans_huge_lock(pmd, vma); else return NULL; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 9474dbc..31f9e83 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1890,7 +1890,7 @@ spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) spinlock_t *ptl; ptl = pmd_lock(vma->vm_mm, pmd); if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || - pmd_devmap(*pmd))) + pmd_devmap(*pmd) || pmd_special(*pmd))) return ptl; spin_unlock(ptl); return NULL; From patchwork Mon Dec 7 11:31:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955437 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 76FB5C0018C for ; Mon, 7 Dec 2020 11:34:36 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org 
[205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 105CA233A0 for ; Mon, 7 Dec 2020 11:34:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 105CA233A0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 7123F8D0013; Mon, 7 Dec 2020 06:34:35 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6C1FF8D0001; Mon, 7 Dec 2020 06:34:35 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5D8BF8D0013; Mon, 7 Dec 2020 06:34:35 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0040.hostedemail.com [216.40.44.40]) by kanga.kvack.org (Postfix) with ESMTP id 4586F8D0001 for ; Mon, 7 Dec 2020 06:34:35 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 092F1181AEF1F for ; Mon, 7 Dec 2020 11:34:35 +0000 (UTC) X-FDA: 77566278510.14.thing20_2b0595a273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin14.hostedemail.com (Postfix) with ESMTP id E07741822987B for ; Mon, 7 Dec 2020 11:34:34 +0000 (UTC) X-HE-Tag: thing20_2b0595a273de X-Filterd-Recvd-Size: 4510 Received: from mail-pf1-f196.google.com (mail-pf1-f196.google.com [209.85.210.196]) by imf33.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:34:34 +0000 (UTC) Received: by mail-pf1-f196.google.com with SMTP id d2so5721753pfq.5 for ; Mon, 07 Dec 2020 03:34:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=TR3cC+AByvGn3JshaPXKc+3rxof2L0xMPbaiJBaNE0A=; b=PwcAOe4heCxF7XTuomjeUtwKezkdJmM0HmTX1yw4AeADa+3Z5jYH2gnncSNx6tF/dK 
nqc3OY3XklNb6fsWrMNV9SLhYauoxenHh7jF/LUEVNXitOpU59OZoN9JC+QCR46jU/Ns sBCj+XDvRx916ayOQcnxhHqZ/yh9ih/c9sVknDZkZs/LBEO/v5jpO2wZ1oEm3FfjX0hE TbWTBt9N8W/HGCkWK2IvBBhkklXYtf5Ke1t4gjU8zkkB1O5YL8vGHwUoJwXrPE3CU4Q8 1EfVWBeVZsU6g44z81Jf/b0p33RSG/Hwm+qr6c5NqBuzkeVYX26s7OBZzLc31WX/Gbiy jVjg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=TR3cC+AByvGn3JshaPXKc+3rxof2L0xMPbaiJBaNE0A=; b=uTlN5dpBDL0lRpZUCWXib6aebQA04sxBBLkwMJe+Jc5twQUFkZW5JBVfRJQRdNxZRv xtpn0fJsyKMDi2a19AD4ZfHMUOfkfT56Sq0uFLzVoI0kyJBRdXi+CgdX/RStoZgcP0+I 73rzE/f5f0ysAg+CUFz8CRcQiDpy5Gpe7CMxzWIReqCtpuQBg3ygRjp65+JiRWG3Tpc3 dcXmWp7NUQ+iluei/oCMH2zGsFmGyxBiTOOMelIMfAFh0GNofCibd/1FKqh+0Vqmsmh4 VfO2Vl/cNZ5zf4ZLeVq0mQct1gy97wFCV4Bhzf+1ZMHIz7mJ3qU2zSUPMemWAr1GwD5w BDLg== X-Gm-Message-State: AOAM5305UPRc8d0qW/5dgzf/YBETzZlCw6dusbwPTiWpHbNBVtoiLlZN DjRO4+rZdsSww+fzwV6TmyBjva0O6RM= X-Google-Smtp-Source: ABdhPJwrnzftH6wwtqZGjpZb2mr88aLorhMPJcvx0VqVsyAFRqovNa0wPGz47Osa45K8P+P4zlb4ww== X-Received: by 2002:a63:e20:: with SMTP id d32mr18339042pgl.94.1607340873505; Mon, 07 Dec 2020 03:34:33 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.34.30 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:34:32 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Chen Zhuo Subject: [RFC V2 16/37] dmemfs: introduce ->split() to dmemfs_vm_ops 
Date: Mon, 7 Dec 2020 19:31:09 +0800 Message-Id: <6b3c166a8d5827a1f6f2a33d85feae1c1633a45d.1607332046.git.yuleixzhang@tencent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang __split_vma() calls ->split() before it adjusts a vma. Implement it for dmemfs so that a munmap() which would create a hole not aligned to the dmem page size in a dmemfs mapping is rejected with -EINVAL. Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- fs/dmemfs/inode.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/dmemfs/inode.c b/fs/dmemfs/inode.c index 443f2e1..ab6a492 100644 --- a/fs/dmemfs/inode.c +++ b/fs/dmemfs/inode.c @@ -450,6 +450,13 @@ static bool check_vma_access(struct vm_area_struct *vma, int write) return len; } +static int dmemfs_split(struct vm_area_struct *vma, unsigned long addr) +{ + if (addr & (dmem_page_size(file_inode(vma->vm_file)) - 1)) + return -EINVAL; + return 0; +} + static vm_fault_t dmemfs_fault(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; @@ -484,6 +491,7 @@ static unsigned long dmemfs_pagesize(struct vm_area_struct *vma) } static const struct vm_operations_struct dmemfs_vm_ops = { + .split = dmemfs_split, .fault = dmemfs_fault, .pagesize = dmemfs_pagesize, .access = dmemfs_access_dmem, From patchwork Mon Dec 7 11:31:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955435 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT
autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BBD40C433FE for ; Mon, 7 Dec 2020 11:34:39 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 54916233A0 for ; Mon, 7 Dec 2020 11:34:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 54916233A0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E8C448D0006; Mon, 7 Dec 2020 06:34:38 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E639B8D0001; Mon, 7 Dec 2020 06:34:38 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D7BEF8D0006; Mon, 7 Dec 2020 06:34:38 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0236.hostedemail.com [216.40.44.236]) by kanga.kvack.org (Postfix) with ESMTP id C3D378D0001 for ; Mon, 7 Dec 2020 06:34:38 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 876B61EF1 for ; Mon, 7 Dec 2020 11:34:38 +0000 (UTC) X-FDA: 77566278636.14.wing37_5e14420273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin14.hostedemail.com (Postfix) with ESMTP id 5E9FE1822987A for ; Mon, 7 Dec 2020 11:34:38 +0000 (UTC) X-HE-Tag: wing37_5e14420273de X-Filterd-Recvd-Size: 4956 Received: from mail-pf1-f178.google.com (mail-pf1-f178.google.com [209.85.210.178]) by imf36.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:34:37 +0000 (UTC) Received: by mail-pf1-f178.google.com with SMTP id i3so6028704pfd.6 for ; Mon, 07 Dec 2020 03:34:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=L3L3i0S6/nYZYxaW9mSnagt8eN+myZy0Uq4CS77lTAc=; b=M9bgqlb46kjnGVIXaRr/XrGDP6bC/rLRPYv7m3v/0Tiu7JDV84U0SZvFNZ7h9g0mEN bBxITDw2c/WSRAvLCXDehd3vLLN3xOZRGtx8xAdqFPlUtgf0NDSnl/cm4DQU6HYdhqnj XIE3zBNV6Shsi00GD7rhQ9OGpYBEFRA6EJNcBupMSM4kBJ3vqmJbFJ17uokCJLONmFvj j4aheVme/XXOMbJUFyX8F9Y/uo/0tsBnYn38LCXo3jZ2xdS+c3YRz1/eIOr1BvC5G/96 NWHaYdsbBaoLmjFo6L3oHM36BEKSsWUTu6v9jm+USREYts6euWqrB7FGFAV22r22MwIS C6Gg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=L3L3i0S6/nYZYxaW9mSnagt8eN+myZy0Uq4CS77lTAc=; b=gQ1AqdPDWMVkOLwEcufXWmLRsiR2nMQmn9Wxsmo/jgbda65YLzY0zXsjOJj9UfAjrU e5XMStm05zDjcXzoBxQb/LvkoPfN9WX9bAnJqqtkmamGO9QP387uHWYU6/1ZYamBhgpO ZME6gvnkNopnj+WvKQ8ifXbZDTXIvYD0PJprrCYS1HOwY6Wy8IK3ezDOzyoddQDCcdF6 d4+WfZ3pAj+dDGP5saWjxq7v8K7OP9IwjokYkOUN3HEZ+t9QfQl6PniT6oE74DMTov5L eWA5ZXns8R2aMP1kyUhR5Fnz4jL0ir6IJ/p3prkoci7JmwT4GIS8IiFPxrIOJb3RIjhG wcEQ== X-Gm-Message-State: AOAM533JG5v7riI+JoLf0tvHhGMHoeCedrRVJBwgCWLbavykpkZXdBxK RIBBMSWJC9V6FPD11rF8nsDZfNeCH2Y= X-Google-Smtp-Source: ABdhPJxAoj+fbea/saUCeXAmigtbj1zZV+shVlXg+5snAfgXq3xCMvRxVvmTAbjuFi907z6lXAIqyA== X-Received: by 2002:a63:494f:: with SMTP id y15mr17974005pgk.364.1607340877090; Mon, 07 Dec 2020 03:34:37 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.34.33 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:34:36 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, 
pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Chen Zhuo Subject: [RFC V2 17/37] mm, dmemfs: support unmap_page_range() for dmemfs pmd Date: Mon, 7 Dec 2020 19:31:10 +0800 Message-Id: X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang It is required by munmap() for dmemfs mapping. Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- mm/huge_memory.c | 2 ++ mm/memory.c | 8 +++++--- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 31f9e83..2a818ec 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1664,6 +1664,8 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, spin_unlock(ptl); if (is_huge_zero_pmd(orig_pmd)) tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE); + } else if (pmd_special(orig_pmd)) { + spin_unlock(ptl); } else if (is_huge_zero_pmd(orig_pmd)) { zap_deposited_table(tlb->mm, pmd); spin_unlock(ptl); diff --git a/mm/memory.c b/mm/memory.c index c48f8df..6b60981 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1338,10 +1338,12 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb, pmd = pmd_offset(pud, addr); do { next = pmd_addr_end(addr, end); - if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) { - if (next - addr != HPAGE_PMD_SIZE) + if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || + pmd_devmap(*pmd) || pmd_special(*pmd)) { + if (next - addr != HPAGE_PMD_SIZE) { + VM_BUG_ON(pmd_special(*pmd)); __split_huge_pmd(vma, pmd, addr, false, NULL); - else if (zap_huge_pmd(tlb, vma, pmd, addr)) + } else if (zap_huge_pmd(tlb, vma, pmd, addr)) goto next; /* fall through */ } From patchwork Mon 
Dec 7 11:31:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955439 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CD4CEC4361B for ; Mon, 7 Dec 2020 11:34:43 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 5D23C233A0 for ; Mon, 7 Dec 2020 11:34:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5D23C233A0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E92E18D0007; Mon, 7 Dec 2020 06:34:42 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E3CD78D0001; Mon, 7 Dec 2020 06:34:42 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D2C3A8D0007; Mon, 7 Dec 2020 06:34:42 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0047.hostedemail.com [216.40.44.47]) by kanga.kvack.org (Postfix) with ESMTP id BD6F98D0001 for ; Mon, 7 Dec 2020 06:34:42 -0500 (EST) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 869F133CD for ; Mon, 7 Dec 2020 11:34:42 +0000 (UTC) X-FDA: 
77566278804.11.sky23_35086c8273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin11.hostedemail.com (Postfix) with ESMTP id 570E5180F8B81 for ; Mon, 7 Dec 2020 11:34:42 +0000 (UTC) X-HE-Tag: sky23_35086c8273de X-Filterd-Recvd-Size: 5677 Received: from mail-pf1-f194.google.com (mail-pf1-f194.google.com [209.85.210.194]) by imf37.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:34:41 +0000 (UTC) Received: by mail-pf1-f194.google.com with SMTP id c79so9616909pfc.2 for ; Mon, 07 Dec 2020 03:34:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=b/DFMWcoqrBGhrdQgSqqomqzF7zS5T+kFvHPOchokBk=; b=R9NM+LrICITDCSs61NAJazJcQQLLZOkAyWgF91yroQ30cXHAJppkpODFQHCiPkKmQn Di7C72F9xbWBRqscYUPkTutsiEPfArp2uJQhUJMFrFwCQvS74l6dGAFEUlRjW/Ws/WLb i25GvziBXB93UcCddRs/vcJrzZtwz/5Tc+/VbAAIFz9dRUGeAp5DTeSbJkdcv7XB2Gwx 3sN5hKnAOKf+OmoM35ez+biBZKuU9mmCC3Z0vH8q0qPs6i2+ilvJACrd43BNKlUYdfn+ zXXNnE5OgQCQ2GjXJLLtVka3prnPTyUz7X2Y7d5VNYlzQNmiGKs5m0h9G1bEKMENRYYK JILg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=b/DFMWcoqrBGhrdQgSqqomqzF7zS5T+kFvHPOchokBk=; b=VKiLel9kySlZb102nqgn6OsEH8hPg4pNZYyLOaRpGbprMlRFMXEIxN6s+bwPZ529U2 9QxomFuD4BLIWM3eHQPjvwNMVAsdOz6rd2feh5+nART1am2zfjf9aPP0LJ6zhd4jUfMC GFJnOG/9XACZGs4/6L8cwlHyBvEtUimhatxZlf1cZ/ETCxEKFTFW1um9BZ0bGgDM5x9c abfX+BOnn1GpZNVO8b7PwqfXE0OhflgdCAq8xPicbAAzJzvmhQoONSmOYcSxVg7rzWNE N/ouPwluuJ9BrX+kfdUkzWp3RG0xBT6zdHYgHibfu5Y0e4bQ7Z7qIDTX3R8N5MABehYu 8K9Q== X-Gm-Message-State: AOAM5322sra7yCXIc0slrX84rQbLBxqx4dMU7FLaidmalZYLVqsmDbEY 4FZx8WkVq+mnZ4HZo55QT0tPgSyxqCA= X-Google-Smtp-Source: ABdhPJwpLJSMFHHVuzFaNJtee4l+tKxhbMqfSGyZXpTjn8Xdsw0wr9LXWXlJyzhmY0qsdp5ekgOemg== X-Received: by 
2002:aa7:928c:0:b029:19a:de9d:fb11 with SMTP id j12-20020aa7928c0000b029019ade9dfb11mr15718593pfa.21.1607340880960; Mon, 07 Dec 2020 03:34:40 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.34.37 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:34:39 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Chen Zhuo Subject: [RFC V2 18/37] mm: follow_pmd_mask() for dmem huge pmd Date: Mon, 7 Dec 2020 19:31:11 +0800 Message-Id: <1401155e1db8221b892fb935204ad2d358c2808f.1607332046.git.yuleixzhang@tencent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang In follow_pmd_mask(), a dmem huge pmd must be recognized and an error pointer of -EEXIST returned. This indicates that a valid page table entry exists in the special pmd but that there is no corresponding struct page, because dmem pages are not backed by struct page. The pmd is updated if the foll flags include FOLL_TOUCH.
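For readers less familiar with the error-pointer convention this relies on, a minimal userspace sketch follows. It re-creates ERR_PTR()/PTR_ERR()/IS_ERR() outside the kernel purely for illustration; struct page here is only a stand-in, and classify_lookup() and its enum are hypothetical names that do not appear in this series.

```c
/* Userspace re-creation of the kernel's error-pointer idiom; illustrative only. */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

struct page { int dummy; };	/* stand-in for the kernel's struct page */

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical classification of what a follow_*()-style lookup returned. */
enum lookup_result {
	LOOKUP_NO_ENTRY,	/* NULL: no usable page table entry */
	LOOKUP_PFN_ONLY,	/* -EEXIST: entry exists, no struct page */
	LOOKUP_FAILED,		/* any other error pointer */
	LOOKUP_PAGE,		/* a real struct page pointer */
};

static inline enum lookup_result classify_lookup(struct page *page)
{
	if (!page)
		return LOOKUP_NO_ENTRY;
	if (IS_ERR(page))
		return PTR_ERR(page) == -EEXIST ? LOOKUP_PFN_ONLY : LOOKUP_FAILED;
	return LOOKUP_PAGE;
}
```

With these helpers, classify_lookup(ERR_PTR(-EEXIST)) yields LOOKUP_PFN_ONLY, which corresponds to the case this patch reports for a dmem huge pmd, while a NULL return maps to LOOKUP_NO_ENTRY.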
Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- mm/gup.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/mm/gup.c b/mm/gup.c index 98eb8e6..ad1aede 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -387,6 +387,42 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address, return -EEXIST; } +static struct page * +follow_special_pmd(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmd, unsigned int flags) +{ + spinlock_t *ptl; + + if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd)) + /* Avoid special (like zero) pages in core dumps */ + return ERR_PTR(-EFAULT); + + /* No page to get reference */ + if (flags & FOLL_GET) + return ERR_PTR(-EFAULT); + + if (flags & FOLL_TOUCH) { + pmd_t _pmd; + + ptl = pmd_lock(vma->vm_mm, pmd); + if (!pmd_special(*pmd)) { + spin_unlock(ptl); + return NULL; + } + _pmd = pmd_mkyoung(*pmd); + if (flags & FOLL_WRITE) + _pmd = pmd_mkdirty(_pmd); + if (pmdp_set_access_flags(vma, address & HPAGE_PMD_MASK, + pmd, _pmd, + flags & FOLL_WRITE)) + update_mmu_cache_pmd(vma, address, pmd); + spin_unlock(ptl); + } + + /* Proper page table entry exists, but no corresponding struct page */ + return ERR_PTR(-EEXIST); +} + /* * FOLL_FORCE can write to even unwritable pte's, but only * after we've gone through a COW cycle and they are dirty. 
@@ -571,6 +607,12 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, return page; return no_page_table(vma, flags); } + if (pmd_special(*pmd)) { + page = follow_special_pmd(vma, address, pmd, flags); + if (page) + return page; + return no_page_table(vma, flags); + } if (is_hugepd(__hugepd(pmd_val(pmdval)))) { page = follow_huge_pd(vma, address, __hugepd(pmd_val(pmdval)), flags, From patchwork Mon Dec 7 11:31:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955441 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 33EE6C433FE for ; Mon, 7 Dec 2020 11:34:47 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CE5B0233A0 for ; Mon, 7 Dec 2020 11:34:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CE5B0233A0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 69E0B8D0014; Mon, 7 Dec 2020 06:34:46 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 674518D0001; Mon, 7 Dec 2020 06:34:46 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 589FF8D0014; Mon, 7 Dec 2020 06:34:46 -0500 (EST) X-Delivered-To: 
linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0225.hostedemail.com [216.40.44.225]) by kanga.kvack.org (Postfix) with ESMTP id 42DBF8D0001 for ; Mon, 7 Dec 2020 06:34:46 -0500 (EST) Received: from smtpin04.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 08832181AEF1F for ; Mon, 7 Dec 2020 11:34:46 +0000 (UTC) X-FDA: 77566278972.04.waste73_4117019273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin04.hostedemail.com (Postfix) with ESMTP id E0ACB8000422 for ; Mon, 7 Dec 2020 11:34:45 +0000 (UTC) X-HE-Tag: waste73_4117019273de X-Filterd-Recvd-Size: 5152 Received: from mail-pf1-f181.google.com (mail-pf1-f181.google.com [209.85.210.181]) by imf24.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:34:45 +0000 (UTC) Received: by mail-pf1-f181.google.com with SMTP id q22so9591430pfk.12 for ; Mon, 07 Dec 2020 03:34:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=PsVgOpQIgSKKX9sKyFMtyV6DCXy6X7BZeXZh0AyMkHE=; b=QMmaSsp/kPkYAOoJQsM9EoXqVH9lanjFe06YoMyS63YE92kVOJCcMfFXgEA7yjn87P y28+hU0OuDWvBX7SY2rxnQlWNrE6D+KzZK9M4ko1y+IRrfeYC8ZoXYQ5j/FHutEdRnUA CFnWcAyRmrIXQyQXqs6RE+G5CKAhGupvDHeYaQpkizYK/LR7w2A0MDaQ4i13inwSb1f4 eMV/GFBnZ9S+DZABkEr6AOMkeB8HeEP6uKXvczuBshAKSC1DzmYlsaa6DQXz6YZzMVS5 wqQYorLHoqsJF9fPQ4cEzpX39pO4YzMWouHhku6ju81R+o5U+fSNpP/dEc35m36/1jNg h5Jg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=PsVgOpQIgSKKX9sKyFMtyV6DCXy6X7BZeXZh0AyMkHE=; b=VetbjZpvhgF7pllGYICaLELQzWXPv7Zpa/U5wwMrVouLiZxRCn7P69/Gu2aXQESVpH cuLOAqpAsM3Tln0WKlBYnvHzO2Vb3tqOtz0iiy18G1oADSezBoz+mUc6f93WmVlLtq2e 
tv53s4E4WNlG6emROrDTioaTQ03eq9YwvHcMTpGg2N6spgqJ54gva58ojfWSfhDuUVc8 3Uc3OYOboXxvqIAr8II03K1L3arYpM8niilpQ6k4mr6J8hAK3t8wV3KJgBEmQkGd16k7 AA3GiUCVR0fcND6ULWSzqq512EHbmZQf+MTxkGKQ4k3mGQZ84Mb7bf8g9bnktLFFIA0n jgjA== X-Gm-Message-State: AOAM533TE3AL9iChAn2JuKFgvpEyZXrRArrcoQLxPawwYRTvPUqFQ1Dc xoRRTyeQ5nORM8ksWyKgab3S1ohNPS8= X-Google-Smtp-Source: ABdhPJynrUsQxj48xNiMU/huEKvcNjt/ObGmP+cORAqqJJ7WrkW6MqDOQEZH+sv1gpYZpKvKfsdZ0g== X-Received: by 2002:a62:6d06:0:b029:19d:9728:2b71 with SMTP id i6-20020a626d060000b029019d97282b71mr15195431pfc.69.1607340884475; Mon, 07 Dec 2020 03:34:44 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.34.41 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:34:44 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Chen Zhuo Subject: [RFC V2 19/37] mm: gup_huge_pmd() for dmem huge pmd Date: Mon, 7 Dec 2020 19:31:12 +0800 Message-Id: <1a8eaaf72af4bd98c8fa1a90d36a64612f7c14b0.1607332046.git.yuleixzhang@tencent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang Add a pmd_special() check in gup_huge_pmd() to support dmem huge pmd. GUP returns zero if it encounters a dmem page, so that the page can be handled outside the GUP routine.
Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- mm/gup.c | 6 +++++- mm/pagewalk.c | 2 +- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index ad1aede..47c8197 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2470,6 +2470,10 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, if (!pmd_access_permitted(orig, flags & FOLL_WRITE)) return 0; + /* Bypass dmem huge pmd. It will be handled in outside routine. */ + if (pmd_special(orig)) + return 0; + if (pmd_devmap(orig)) { if (unlikely(flags & FOLL_LONGTERM)) return 0; @@ -2572,7 +2576,7 @@ static int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr, unsigned lo return 0; if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd) || - pmd_devmap(pmd))) { + pmd_devmap(pmd) || pmd_special(pmd))) { /* * NUMA hinting faults need to be handled in the GUP * slowpath for accounting purposes and so that they diff --git a/mm/pagewalk.c b/mm/pagewalk.c index e81640d..e7c4575 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -71,7 +71,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, do { again: next = pmd_addr_end(addr, end); - if (pmd_none(*pmd) || (!walk->vma && !walk->no_vma)) { + if (pmd_none(*pmd) || (!walk->vma && !walk->no_vma) || pmd_special(*pmd)) { if (ops->pte_hole) err = ops->pte_hole(addr, next, depth, walk); if (err) From patchwork Mon Dec 7 11:31:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955443 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham 
autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E36F1C4361B for ; Mon, 7 Dec 2020 11:34:50 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 711C0233A0 for ; Mon, 7 Dec 2020 11:34:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 711C0233A0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id F2E718D0008; Mon, 7 Dec 2020 06:34:49 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id EB1648D0001; Mon, 7 Dec 2020 06:34:49 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D99548D0008; Mon, 7 Dec 2020 06:34:49 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0178.hostedemail.com [216.40.44.178]) by kanga.kvack.org (Postfix) with ESMTP id C27068D0001 for ; Mon, 7 Dec 2020 06:34:49 -0500 (EST) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 872E9181AEF1F for ; Mon, 7 Dec 2020 11:34:49 +0000 (UTC) X-FDA: 77566279098.21.beast49_2512160273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin21.hostedemail.com (Postfix) with ESMTP id 64C1A180442C3 for ; Mon, 7 Dec 2020 11:34:49 +0000 (UTC) X-HE-Tag: beast49_2512160273de X-Filterd-Recvd-Size: 4759 Received: from mail-pf1-f195.google.com (mail-pf1-f195.google.com [209.85.210.195]) by imf09.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:34:48 +0000 (UTC) Received: by mail-pf1-f195.google.com with SMTP id t7so9610097pfh.7 for ; Mon, 07 Dec 2020 03:34:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=irx3d3NJeQeYfZPwYnMd1dXXT7HLdVgOj1onbLngswY=; b=eJYAcqAfXe8LOuaHKdKllKlDVNqxancjE8M5qAXrhETqb6j0v9x0H6NmUSzw8StI14 J2rM/AstiKHgOTM5AVfKtlpjtk+Fy+75hlmDPxd9jG4tEq1SyKQGWIYTm7uq/2uxcFlg W65vIQCLcSGlLabBaPbAuQhsHwJbWDWT8I5hWT/k/rJ7sxp6PR9YsdF4n62gC431AOpJ aIaVvanL5902eANwLRH1qq5hk7Snj3uV5PeZ7O6x1IM11CDxzRxpQlAinRhXhnHtReTC Trs4VME+idkpV6820spwytsexIK78pl4wM0D44rVyvONHPP9rysLFUxoF40Jm3dYeJDg jyOQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=irx3d3NJeQeYfZPwYnMd1dXXT7HLdVgOj1onbLngswY=; b=J0znN4goyemOVzImptQe0xKG0ukSxXY2OF+I0S+U4hvpuHjbDZgWRtjCQu3RzDw3ao qhxf6b3QaQSMIuAHAjJJm6H0yD/gr6UuoZMcRYe4mh1OVMJ6vjc7BcI/WtmNeNOmK8ly KGRZRn47miU1DPu7Z07riuZ1psvrZrm14BDXAkn+McJYFE0/MG5e3syNeHli3gizEmVv V6qKlG5TYGXk3YezHl1C1uo3Eu59qxm45T60yiwXF9zq9w+7I2okTCyMLHmxfmqVYQd+ C+WPMjtfPhRcIHv4YxJ+E7lu/N3ZiSKV3SsKtmWwq8ILR/4v1TvDeKuM3ZC+zvmymx4S eYaA== X-Gm-Message-State: AOAM533+UJjPY4che7Chlvp7x4gaLnVZK97oLKWguvIpxAa/bxY+evtG QFUqB5f2YRXzIXl194RbLoIOW+Lijbc= X-Google-Smtp-Source: ABdhPJy7T+zOWSaaaUAy9bCSHgKA4JkL2mqiQ0XfQp/lTQid14KX+alS2W1IiCRqphNjeXsYneYbLA== X-Received: by 2002:a17:902:7b97:b029:d8:ec6e:5c28 with SMTP id w23-20020a1709027b97b02900d8ec6e5c28mr15836888pll.40.1607340887900; Mon, 07 Dec 2020 03:34:47 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.34.44 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:34:47 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, 
naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Chen Zhuo Subject: [RFC V2 20/37] mm: support dmem huge pmd for vmf_insert_pfn_pmd() Date: Mon, 7 Dec 2020 19:31:13 +0800 Message-Id: X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.005902, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang Since vmf_insert_pfn_pmd() will BUG_ON() a pfn that is not devmap when the vma is neither VM_PFNMAP nor VM_MIXEDMAP, let a dmem pfn pass the check as well. A dmem huge pmd is marked with _PAGE_SPECIAL and _PAGE_DMEM so that follow_pfn() can recognize it. Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- mm/huge_memory.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2a818ec..6e52d57 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -781,6 +781,8 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr, entry = pmd_mkhuge(pfn_t_pmd(pfn, prot)); if (pfn_t_devmap(pfn)) entry = pmd_mkdevmap(entry); + else if (pfn_t_dmem(pfn)) + entry = pmd_mkdmem(entry); if (write) { entry = pmd_mkyoung(pmd_mkdirty(entry)); entry = maybe_pmd_mkwrite(entry, vma); @@ -827,7 +829,7 @@ vm_fault_t vmf_insert_pfn_pmd_prot(struct vm_fault *vmf, pfn_t pfn, * can't support a 'special' bit.
*/ BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) && - !pfn_t_devmap(pfn)); + !pfn_t_devmap(pfn) && !pfn_t_dmem(pfn)); BUG_ON((vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) == (VM_PFNMAP|VM_MIXEDMAP)); BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags)); From patchwork Mon Dec 7 11:31:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955445 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 253A2C4361B for ; Mon, 7 Dec 2020 11:34:54 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A9F6623340 for ; Mon, 7 Dec 2020 11:34:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A9F6623340 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 340ED8D0009; Mon, 7 Dec 2020 06:34:53 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2F1438D0001; Mon, 7 Dec 2020 06:34:53 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 207C28D0009; Mon, 7 Dec 2020 06:34:53 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0027.hostedemail.com [216.40.44.27]) by kanga.kvack.org 
cIcZhDGibG2q0rKk1YiGvYBMygfv1bbSzz1iyC2bQ9fijjUfJEzHgJex5vtn2fkSMGkR E2PA== X-Gm-Message-State: AOAM530zB0K78B5e0xTsbrTdVNkY4XyI90ehz3w1fDmQPrSfB9rbQtKP ER2d1SWURY+1BSN/SXCT7dkGc2AUAA8= X-Google-Smtp-Source: ABdhPJyqqW0fQyetyIn3XyHRcLkK2zOeQJNTeAqGI4VkfBmrNDEhNpiPMRzTChGwx86K9TucHV5VBg== X-Received: by 2002:a17:902:7144:b029:da:7268:d730 with SMTP id u4-20020a1709027144b02900da7268d730mr15412723plm.20.1607340891184; Mon, 07 Dec 2020 03:34:51 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.34.48 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:34:50 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Chen Zhuo Subject: [RFC V2 21/37] mm: support dmem huge pmd for follow_pfn() Date: Mon, 7 Dec 2020 19:31:14 +0800 Message-Id: X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang follow_pfn() will get pfn of pmd if huge pmd is encountered. 
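The pfn arithmetic for the huge pmd case can be illustrated outside the kernel. The following is only a user-space sketch, not kernel code: the SKETCH_* constants assume 4 KiB base pages and 2 MiB PMD mappings (the usual x86-64 values), and sketch_huge_pmd_pfn() is a hypothetical helper that mirrors the expression pmd_pfn(*pmdp) + ((address & ~PMD_MASK) >> PAGE_SHIFT) from the diff below.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration: 4 KiB base pages, 2 MiB PMD mappings. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_SHIFT  21
#define SKETCH_PMD_SIZE   (UINT64_C(1) << SKETCH_PMD_SHIFT)
#define SKETCH_PMD_MASK   (~(SKETCH_PMD_SIZE - 1))

/*
 * Hypothetical helper: given the first pfn covered by a huge pmd and a
 * virtual address inside that mapping, return the pfn backing the address.
 */
static uint64_t sketch_huge_pmd_pfn(uint64_t pmd_base_pfn, uint64_t address)
{
	uint64_t pfns_per_pmd = SKETCH_PMD_SIZE >> SKETCH_PAGE_SHIFT;
	uint64_t offset_in_pmd = address & ~SKETCH_PMD_MASK;

	/* A huge pmd maps a naturally aligned block of base pages. */
	assert((pmd_base_pfn & (pfns_per_pmd - 1)) == 0);

	return pmd_base_pfn + (offset_in_pmd >> SKETCH_PAGE_SHIFT);
}
```

With these assumed sizes, an address three base pages into the 2 MiB region resolves to the base pfn plus 3, and the last base page of the region resolves to the base pfn plus 511.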
Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- mm/memory.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 6b60981..abb9148 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4807,15 +4807,23 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address, int ret = -EINVAL; spinlock_t *ptl; pte_t *ptep; + pmd_t *pmdp = NULL; if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) return ret; - ret = follow_pte(vma->vm_mm, address, &ptep, &ptl); + ret = follow_pte_pmd(vma->vm_mm, address, NULL, &ptep, &pmdp, &ptl); if (ret) return ret; - *pfn = pte_pfn(*ptep); - pte_unmap_unlock(ptep, ptl); + + if (pmdp) { + *pfn = pmd_pfn(*pmdp) + ((address & ~PMD_MASK) >> PAGE_SHIFT); + spin_unlock(ptl); + } else { + *pfn = pte_pfn(*ptep); + pte_unmap_unlock(ptep, ptl); + } + return 0; } EXPORT_SYMBOL(follow_pfn); From patchwork Mon Dec 7 11:31:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955447 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94A45C433FE for ; Mon, 7 Dec 2020 11:34:57 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2611723340 for ; Mon, 7 Dec 2020 11:34:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2611723340 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) 
yxBbTTVnn5ROFZOGNlweU2k6Hu7UN+pZY6i4XQztkumh8Gv00oB9pvKIHKFEHgAYZYi/ 4JrQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=cfW0rkOgMIuxNIUMjXUU1M20WeHJvF5wHx5SDbaa7WE=; b=Eka/OnV0MgWE6COGv5rHlyVe6DwKE+7fI3X9ulq4f2GAVG2UthMoX4FZDxGrWQusr5 r9eMmckaynEM+DAA4MccvE4zL9J2ffI3Fzrt7tZuVjhitvB7lyZKnt5320Vj3ldeGf1F fwLdALbnmrF2FpBO99E3uOQHTEFszNYHnOUDHLgVt1UP4j2AeAnV1bdhLyIFv/2GsBiY BtZdaPJweHawwW9eQnzYrbPnmIzlvEXubcFZq5fyNkLEQojlYw/UowC30cM1OqQXuYpq 95yY14Bw+JOuxswdeu8KXRgvxk0o3HSTy8bB1RV3m103ZWp0TCjwqAgjiBfYyx7zLkhB ZK6Q== X-Gm-Message-State: AOAM532tLLlAAH4MrbLaPhzNzFm0Q1sVX/WyTHe/u6cShnxpvydu0ZXx +VrY3pWpYz7nAtWe11pu9PbUOMS7nPc= X-Google-Smtp-Source: ABdhPJwFz1981DhHI3jR9VGpOCOSaq+Ig5CXILTKJ1ERZg74A7VBmwCCkjGQ9zkn+U6XHw4ztmn59g== X-Received: by 2002:a17:902:5581:b029:da:a817:1753 with SMTP id g1-20020a1709025581b02900daa8171753mr15580441pli.76.1607340894707; Mon, 07 Dec 2020 03:34:54 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.34.51 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:34:54 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Chen Zhuo Subject: [RFC V2 22/37] kvm, x86: Distinguish dmemfs page from mmio page Date: Mon, 7 Dec 2020 19:31:15 +0800 Message-Id: X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, 
tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang A dmem page has an invalid pfn but is not MMIO, so introduce is_dmem_pfn() to tell the two apart. Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- arch/x86/kvm/mmu/mmu.c | 1 + include/linux/dmem.h | 7 +++++++ mm/dmem.c | 7 +++++++ 3 files changed, 15 insertions(+) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 5bb1939..394508f 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -43,6 +43,7 @@ #include #include #include +#include #include #include diff --git a/include/linux/dmem.h b/include/linux/dmem.h index 8682d63..59d3ef14 100644 --- a/include/linux/dmem.h +++ b/include/linux/dmem.h @@ -19,11 +19,18 @@ unsigned int try_max, unsigned int *result_nr); void dmem_free_pages(phys_addr_t addr, unsigned int dpages_nr); +bool is_dmem_pfn(unsigned long pfn); #define dmem_free_page(addr) dmem_free_pages(addr, 1) #else static inline int dmem_reserve_init(void) { return 0; } + +static inline bool is_dmem_pfn(unsigned long pfn) +{ + return 0; +} + #endif #endif /* _LINUX_DMEM_H */ diff --git a/mm/dmem.c b/mm/dmem.c index 2e61dbd..eb6df70 100644 --- a/mm/dmem.c +++ b/mm/dmem.c @@ -972,3 +972,10 @@ void dmem_free_pages(phys_addr_t addr, unsigned int dpages_nr) } EXPORT_SYMBOL(dmem_free_pages); +bool is_dmem_pfn(unsigned long pfn) +{ + struct dmem_node *dnode; + + return !!find_dmem_region(__pfn_to_phys(pfn), &dnode); +} +EXPORT_SYMBOL(is_dmem_pfn); From patchwork Mon Dec 7 11:31:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955449 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00,
(PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Chen Zhuo Subject: [RFC V2 23/37] kvm, x86: introduce VM_DMEM for syscall support usage Date: Mon, 7 Dec 2020 19:31:16 +0800 Message-Id: X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang Currently dmemfs does not support read-only memory, so change_protection() is disabled for dmemfs VMAs. Because mprotect_fixup() may change vma->vm_flags, we introduce a new VMA flag, VM_DMEM, and check it in mprotect_fixup() to avoid changing vma->vm_flags. We also check it in vma_to_resize() to disable mremap() for dmemfs VMAs.
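The guard described above can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not the kernel implementation: struct sketch_vma and sketch_mprotect_fixup() are hypothetical stand-ins for struct vm_area_struct and mprotect_fixup(), and only the bit position of SKETCH_VM_DMEM (38) and the -EINVAL return are taken from the diff below.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values; bit 38 matches VM_DMEM in the patch. */
#define SKETCH_VM_WRITE (UINT64_C(1) << 1)
#define SKETCH_VM_DMEM  (UINT64_C(1) << 38)

/* Hypothetical stand-in for struct vm_area_struct. */
struct sketch_vma {
	uint64_t vm_flags;
};

static bool sketch_vma_is_dmem(const struct sketch_vma *vma)
{
	return (vma->vm_flags & SKETCH_VM_DMEM) != 0;
}

/*
 * Model of the guard: refuse the protection change before any state is
 * touched, so vm_flags of a dmem mapping is never rewritten.
 */
static int sketch_mprotect_fixup(struct sketch_vma *vma, uint64_t newflags)
{
	assert(vma != NULL);

	if (sketch_vma_is_dmem(vma))
		return -EINVAL;

	vma->vm_flags = newflags;
	return 0;
}
```

Placing the check before the assignment is the point of the design: a rejected request leaves the mapping's flags exactly as they were, which is the behaviour the commit message asks for.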
Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- fs/dmemfs/inode.c | 2 +- include/linux/mm.h | 7 +++++++ mm/gup.c | 7 +++++-- mm/mincore.c | 8 ++++++-- mm/mprotect.c | 5 ++++- mm/mremap.c | 3 +++ 6 files changed, 26 insertions(+), 6 deletions(-) diff --git a/fs/dmemfs/inode.c b/fs/dmemfs/inode.c index ab6a492..b165bd3 100644 --- a/fs/dmemfs/inode.c +++ b/fs/dmemfs/inode.c @@ -507,7 +507,7 @@ int dmemfs_file_mmap(struct file *file, struct vm_area_struct *vma) if (!(vma->vm_flags & VM_SHARED)) return -EINVAL; - vma->vm_flags |= VM_PFNMAP; + vma->vm_flags |= VM_PFNMAP | VM_DMEM | VM_IO; file_accessed(file); vma->vm_ops = &dmemfs_vm_ops; diff --git a/include/linux/mm.h b/include/linux/mm.h index db6ae4d..2f3135fe 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -311,6 +311,8 @@ int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *, #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4) #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */ +#define VM_DMEM BIT(38) /* Dmem page VM */ + #ifdef CONFIG_ARCH_HAS_PKEYS # define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0 # define VM_PKEY_BIT0 VM_HIGH_ARCH_0 /* A protection key is a 4-bit value */ @@ -666,6 +668,11 @@ static inline bool vma_is_accessible(struct vm_area_struct *vma) return vma->vm_flags & VM_ACCESS_FLAGS; } +static inline bool vma_is_dmem(struct vm_area_struct *vma) +{ + return !!(vma->vm_flags & VM_DMEM); +} + #ifdef CONFIG_SHMEM /* * The vma_is_shmem is not inline because it is used only by slow diff --git a/mm/gup.c b/mm/gup.c index 47c8197..0ea9071 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -492,8 +492,11 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, goto no_page; } else if (unlikely(!page)) { if (flags & FOLL_DUMP) { - /* Avoid special (like zero) pages in core dumps */ - page = ERR_PTR(-EFAULT); + if (vma_is_dmem(vma)) + page = ERR_PTR(-EEXIST); + else + /* Avoid special (like zero) pages in core dumps */ + page = ERR_PTR(-EFAULT); goto out; } diff --git a/mm/mincore.c 
b/mm/mincore.c index 02db1a8..f8d10e4 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -78,8 +78,12 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end, pgoff_t pgoff; pgoff = linear_page_index(vma, addr); - for (i = 0; i < nr; i++, pgoff++) - vec[i] = mincore_page(vma->vm_file->f_mapping, pgoff); + for (i = 0; i < nr; i++, pgoff++) { + if (vma_is_dmem(vma)) + vec[i] = 1; + else + vec[i] = mincore_page(vma->vm_file->f_mapping, pgoff); + } } else { for (i = 0; i < nr; i++) vec[i] = 0; diff --git a/mm/mprotect.c b/mm/mprotect.c index 56c02be..b1650b5 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -236,7 +236,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, * for all the checks. */ if (!is_swap_pmd(*pmd) && !pmd_devmap(*pmd) && - pmd_none_or_clear_bad_unless_trans_huge(pmd)) + pmd_none_or_clear_bad_unless_trans_huge(pmd) && !pmd_special(*pmd)) goto next; /* invoke the mmu notifier if the pmd is populated */ @@ -412,6 +412,9 @@ static int prot_none_test(unsigned long addr, unsigned long next, return 0; } + if (vma_is_dmem(vma)) + return -EINVAL; + /* * Do PROT_NONE PFN permission checks here when we can still * bail out without undoing a lot of state. This is a rather diff --git a/mm/mremap.c b/mm/mremap.c index 138abba..598e681 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -482,6 +482,9 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr, if (!vma || vma->vm_start > addr) return ERR_PTR(-EFAULT); + if (vma_is_dmem(vma)) + return ERR_PTR(-EINVAL); + /* * !old_len is a special case where an attempt is made to 'duplicate' * a mapping. 
This makes no sense for private mappings as it will From patchwork Mon Dec 7 11:31:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955451 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 032E2C4361B for ; Mon, 7 Dec 2020 11:35:05 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8CA9F23340 for ; Mon, 7 Dec 2020 11:35:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8CA9F23340 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 22F668D0017; Mon, 7 Dec 2020 06:35:04 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1E02C8D0001; Mon, 7 Dec 2020 06:35:04 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0FA6B8D0017; Mon, 7 Dec 2020 06:35:04 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0046.hostedemail.com [216.40.44.46]) by kanga.kvack.org (Postfix) with ESMTP id E811F8D0001 for ; Mon, 7 Dec 2020 06:35:03 -0500 (EST) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id A879933CD 
ABdhPJzsTfFyeDeSqCA2nVBB3th4mGCgQXxAf/7SC6+PJE0lBNEtPZT6GhKeWck8kJzv61kWa4w/EA== X-Received: by 2002:aa7:8b15:0:b029:196:59ad:ab93 with SMTP id f21-20020aa78b150000b029019659adab93mr15268290pfd.16.1607340901970; Mon, 07 Dec 2020 03:35:01 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.34.58 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:35:01 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Chen Zhuo Subject: [RFC V2 24/37] dmemfs: support hugepage for dmemfs Date: Mon, 7 Dec 2020 19:31:17 +0800 Message-Id: X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang Add hugepage support for dmemfs. We use PFN_DMEM to notify vmf_insert_pfn_pmd(), and the dmem huge pmd is marked with _PAGE_SPECIAL and _PAGE_DMEM so that GUP-fast can distinguish dmemfs pages from other page types and handle them correctly.
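The alignment trick used by dmemfs_get_unmapped_area() in the diff below (ask for 'len' plus one dmem page, then round the returned address up) can be checked with a small user-space sketch. The helper names and SKETCH_PAGE_SIZE are assumptions made for illustration, and returning 0 on overflow is a simplification of the -EINVAL path in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base page size for illustration. */
#define SKETCH_PAGE_SIZE UINT64_C(4096)

/* Round x up to the next multiple of align (align must be a power of two). */
static uint64_t sketch_round_up(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Length to request from the generic allocator: pad by one 'align' unit
 * unless the dmem page size equals the base page size. Returns 0 if the
 * padded length overflows (the patch returns -EINVAL in that case).
 */
static uint64_t sketch_padded_len(uint64_t len, uint64_t align)
{
	uint64_t len_pad = (align == SKETCH_PAGE_SIZE) ? len : len + align;

	return (len_pad < len) ? 0 : len_pad;
}
```

Because the padded area is one full 'align' unit longer than 'len', rounding its start up to the next boundary always leaves at least 'len' bytes inside the area, which is why the rounded address is safe to return.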
Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- fs/dmemfs/inode.c | 113 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 111 insertions(+), 2 deletions(-) diff --git a/fs/dmemfs/inode.c b/fs/dmemfs/inode.c index b165bd3..17a518c 100644 --- a/fs/dmemfs/inode.c +++ b/fs/dmemfs/inode.c @@ -457,7 +457,7 @@ static int dmemfs_split(struct vm_area_struct *vma, unsigned long addr) return 0; } -static vm_fault_t dmemfs_fault(struct vm_fault *vmf) +static vm_fault_t __dmemfs_fault(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct inode *inode = file_inode(vma->vm_file); @@ -485,6 +485,63 @@ static vm_fault_t dmemfs_fault(struct vm_fault *vmf) return ret; } +static vm_fault_t __dmemfs_pmd_fault(struct vm_fault *vmf) +{ + struct vm_area_struct *vma = vmf->vma; + unsigned long pmd_addr = vmf->address & PMD_MASK; + unsigned long page_addr; + struct inode *inode = file_inode(vma->vm_file); + void *entry; + phys_addr_t phys; + pfn_t pfn; + int ret; + + if (dmem_page_size(inode) < PMD_SIZE) + return VM_FAULT_FALLBACK; + + WARN_ON(pmd_addr < vma->vm_start || + vma->vm_end < pmd_addr + PMD_SIZE); + + page_addr = vmf->address & ~(dmem_page_size(inode) - 1); + entry = radix_get_create_entry(vma, page_addr, inode, + linear_page_index(vma, page_addr)); + if (IS_ERR(entry)) + return (PTR_ERR(entry) == -ENOMEM) ? 
+ VM_FAULT_OOM : VM_FAULT_SIGBUS; + + phys = dmem_addr_to_pfn(inode, dmem_entry_to_addr(inode, entry), + linear_page_index(vma, pmd_addr), PMD_SHIFT); + phys <<= PAGE_SHIFT; + pfn = phys_to_pfn_t(phys, PFN_DMEM); + ret = vmf_insert_pfn_pmd(vmf, pfn, !!(vma->vm_flags & VM_WRITE)); + + radix_put_entry(); + return ret; +} + +static vm_fault_t dmemfs_huge_fault(struct vm_fault *vmf, enum page_entry_size pe_size) +{ + int ret; + + switch (pe_size) { + case PE_SIZE_PTE: + ret = __dmemfs_fault(vmf); + break; + case PE_SIZE_PMD: + ret = __dmemfs_pmd_fault(vmf); + break; + default: + ret = VM_FAULT_SIGBUS; + } + + return ret; +} + +static vm_fault_t dmemfs_fault(struct vm_fault *vmf) +{ + return dmemfs_huge_fault(vmf, PE_SIZE_PTE); +} + static unsigned long dmemfs_pagesize(struct vm_area_struct *vma) { return dmem_page_size(file_inode(vma->vm_file)); @@ -495,6 +552,7 @@ static unsigned long dmemfs_pagesize(struct vm_area_struct *vma) .fault = dmemfs_fault, .pagesize = dmemfs_pagesize, .access = dmemfs_access_dmem, + .huge_fault = dmemfs_huge_fault, }; int dmemfs_file_mmap(struct file *file, struct vm_area_struct *vma) @@ -507,15 +565,66 @@ int dmemfs_file_mmap(struct file *file, struct vm_area_struct *vma) if (!(vma->vm_flags & VM_SHARED)) return -EINVAL; - vma->vm_flags |= VM_PFNMAP | VM_DMEM | VM_IO; + vma->vm_flags |= VM_PFNMAP | VM_DONTCOPY | VM_DMEM | VM_IO; + + if (dmem_page_size(inode) != PAGE_SIZE) + vma->vm_flags |= VM_HUGEPAGE; file_accessed(file); vma->vm_ops = &dmemfs_vm_ops; return 0; } +/* + * If the size of area returned by mm->get_unmapped_area() is one + * dmem pagesize larger than 'len', the returned addr by + * mm->get_unmapped_area() could be aligned to dmem pagesize to + * meet alignment demand. 
+ */ +static unsigned long +dmemfs_get_unmapped_area(struct file *file, unsigned long addr, + unsigned long len, unsigned long pgoff, + unsigned long flags) +{ + unsigned long len_pad; + unsigned long off = pgoff << PAGE_SHIFT; + unsigned long align; + + align = dmem_page_size(file_inode(file)); + + /* For pud or pmd pagesize, could not support fault fallback. */ + if (len & (align - 1)) + return -EINVAL; + if (len > TASK_SIZE) + return -ENOMEM; + + if (flags & MAP_FIXED) { + if (addr & (align - 1)) + return -EINVAL; + return addr; + } + + /* + * Pad a extra align space for 'len', as we want to find a unmapped + * area which is larger enough to align with dmemfs pagesize, if + * pagesize of dmem is larger than 4K. + */ + len_pad = (align == PAGE_SIZE) ? len : len + align; + + /* 'len' or 'off' is too large for pad. */ + if (len_pad < len || (off + len_pad) < off) + return -EINVAL; + + addr = current->mm->get_unmapped_area(file, addr, len_pad, + pgoff, flags); + + /* Now 'addr' could be aligned to upper boundary. */ + return IS_ERR_VALUE(addr) ? 
addr : round_up(addr, align); +} + static const struct file_operations dmemfs_file_operations = { .mmap = dmemfs_file_mmap, + .get_unmapped_area = dmemfs_get_unmapped_area, }; static int dmemfs_parse_param(struct fs_context *fc, struct fs_parameter *param) From patchwork Mon Dec 7 11:31:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955453 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 24274C4361B for ; Mon, 7 Dec 2020 11:35:10 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9EBEE23340 for ; Mon, 7 Dec 2020 11:35:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9EBEE23340 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 3695C8D0018; Mon, 7 Dec 2020 06:35:07 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 31AC68D0001; Mon, 7 Dec 2020 06:35:07 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1E1BB8D0018; Mon, 7 Dec 2020 06:35:07 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0003.hostedemail.com [216.40.44.3]) by kanga.kvack.org (Postfix) with 
f9GbsaAChCb1yDqfvsCuFXV6OL2oK9GeriE/8VcqcgNFoisHAxuksLcp8eV4ipoZhXLY l27g== X-Gm-Message-State: AOAM532iUCIUl+7jpWD6ViowS0SVufwX7NoULrBmTwqMe2eDW++ic2d5 ewmHfXtU5GXXbccEuzZ4eRogEVasKiQ= X-Google-Smtp-Source: ABdhPJzalODO+5vSfJ8urE4UuOEPrIa+ijmCZKRr1s/BpX28tmoQbHCWWrOgmsnDQaiZ1nRUpwNqDg== X-Received: by 2002:a17:902:bc4b:b029:db:2d61:5f37 with SMTP id t11-20020a170902bc4bb02900db2d615f37mr99717plz.79.1607340905372; Mon, 07 Dec 2020 03:35:05 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.35.02 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:35:04 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Chen Zhuo Subject: [RFC V2 25/37] mm, x86, dmem: fix estimation of reserved page for vaddr_get_pfn() Date: Mon, 7 Dec 2020 19:31:18 +0800 Message-Id: <3bd3e2d485c46fae9eaeb501e1a6c51d19570b49.1607332046.git.yuleixzhang@tencent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang Fix estimation of reserved page for vaddr_get_pfn() and check 'ret' before checking writable permission Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- drivers/vfio/vfio_iommu_type1.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 67e8276..c465d1a 100644 --- 
a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -471,6 +471,10 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, if (ret == -EAGAIN) goto retry; + if (!ret && (prot & IOMMU_WRITE) && + !(vma->vm_flags & VM_WRITE)) + ret = -EFAULT; + if (!ret && !is_invalid_reserved_pfn(*pfn)) ret = -EFAULT; } From patchwork Mon Dec 7 11:31:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955455 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7E760C1B0E3 for ; Mon, 7 Dec 2020 11:35:11 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3732623340 for ; Mon, 7 Dec 2020 11:35:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3732623340 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id BCBC48D0019; Mon, 7 Dec 2020 06:35:10 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id BA4DD8D0001; Mon, 7 Dec 2020 06:35:10 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AB99D8D0019; Mon, 7 Dec 2020 06:35:10 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from 
forelay.hostedemail.com (smtprelay0065.hostedemail.com [216.40.44.65]) by kanga.kvack.org (Postfix) with ESMTP id 9700B8D0001 for ; Mon, 7 Dec 2020 06:35:10 -0500 (EST) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 5F8C6180AD80F for ; Mon, 7 Dec 2020 11:35:10 +0000 (UTC) X-FDA: 77566279980.18.team04_4e0a488273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin18.hostedemail.com (Postfix) with ESMTP id 3D01F100EDBD1 for ; Mon, 7 Dec 2020 11:35:10 +0000 (UTC) X-HE-Tag: team04_4e0a488273de X-Filterd-Recvd-Size: 5145 Received: from mail-pf1-f172.google.com (mail-pf1-f172.google.com [209.85.210.172]) by imf02.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:35:09 +0000 (UTC) Received: by mail-pf1-f172.google.com with SMTP id w6so9620387pfu.1 for ; Mon, 07 Dec 2020 03:35:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=wOxoYAByFgjYHcAfYN08FzgKY6QNyvq8eeqHRPrva4Q=; b=IJifq/wHL5++Uh9KafwXFdFhAIVbfzAETWaT0Wz412DL1zioa8ks0IHqQMobHE0lb7 il49a16YoybvNwcEnH1LbheLRq9esiNl+KUZTTKeEYT3GykktEU9CyjsiyRRlcrL8V9i xjdjnm7I93RF446dlPxAOTolle3sJKf7sCBQPeMYaghlxfkhBNp+bJGbw3s+o46xRf18 SFvcQ0hHzZ5p8+RSv8+DKLLtLxoiI57o2MK8qgH6IXd7lmwhhobC8+LpPY6VhvKxG+CC hFuzLfRe+F25M9UbsIsS/AdyWNtgCGjD+2gfmnfUtyTDHFpMLDYdfzABVdQIvMR60xX3 6Y1Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=wOxoYAByFgjYHcAfYN08FzgKY6QNyvq8eeqHRPrva4Q=; b=g/Oym4vE52NEUpHy/XY5qm54mLxHxChbgYNjk2yFxypW1umMKrEiO/Hy3J7hE8nvwi HuwLimXFXuorgZ2SQ8qgvU/YhwSRtyxm61AnCLl9ZItpdCIMm+GXHO1mGwIO9BHlzt6k S0Pi+cYu9h0fH871/yk7fOdKa130FsR9DK4NpNSAROGNIDp8YH6UGOEZg2dlLMpR+x0O 
AAdsS/1lop755qm0KVa3v6m+GoDhnXJnv7F00cxp1RJTd4OeiyOWDT9ogi9ViusGTavl aRV05GjcPS/GgsRz0dEIl4hifIN6flMbXZzHK4ATMk0pdCM4P3CR6v225rBWDy2jGstn fr5w== X-Gm-Message-State: AOAM530v7fdlmoYxJD/Ph5+OAjiv/4ZejF2ZARwioKvYDNFaFzbQ8775 Nyzl8xJy6qzf+6Y0wxbMxBP43Pv+PsU= X-Google-Smtp-Source: ABdhPJwhpFxBYkxRsLS+BV+lvSO+B7245m5N1dEnwhx2kuzLaOR+A4L9InETPPJipSBryp7sZwn4Zw== X-Received: by 2002:a62:80ce:0:b029:19d:b280:5019 with SMTP id j197-20020a6280ce0000b029019db2805019mr15544917pfd.43.1607340908718; Mon, 07 Dec 2020 03:35:08 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.35.05 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:35:08 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Chen Zhuo Subject: [RFC V2 26/37] mm, dmem: introduce pud_special() for dmem huge pud support Date: Mon, 7 Dec 2020 19:31:19 +0800 Message-Id: <24c19b7db2fa3b405358489fc74a02cf648bfaf1.1607332046.git.yuleixzhang@tencent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang pud_special() checks both the _PAGE_SPECIAL and _PAGE_DMEM bits, as pmd_special() does.
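For illustration only, the "both bits set" test described here can be tried out in userspace C. This is a minimal sketch, not the kernel code: the EX_* names and the two bit positions are placeholders invented for the example, and plain uint64_t stands in for pud_t.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Placeholder bit positions, chosen for this example only. The real
 * _PAGE_SPECIAL and _PAGE_DMEM values are defined by the kernel and by
 * this series, not here.
 */
#define EX_PAGE_SPECIAL (UINT64_C(1) << 9)
#define EX_PAGE_DMEM    (UINT64_C(1) << 57)

/*
 * Returns 1 only when BOTH flags are set, using the same
 * (val & mask) == mask form as pmd_special()/pud_special().
 */
static inline int ex_entry_special(uint64_t val)
{
	const uint64_t mask = EX_PAGE_SPECIAL | EX_PAGE_DMEM;

	return (val & mask) == mask;
}

/* Counterpart of pud_mkdmem(): set both flags in one step. */
static inline uint64_t ex_entry_mkdmem(uint64_t val)
{
	uint64_t out = val | EX_PAGE_SPECIAL | EX_PAGE_DMEM;

	/* An entry marked this way must pass the special test. */
	assert(ex_entry_special(out));
	return out;
}
```

Comparing against the full mask rather than testing for a nonzero result is what keeps an entry carrying only one of the two bits from being treated as a dmem entry.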
Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- arch/x86/include/asm/pgtable.h | 13 +++++++++++++ include/linux/pgtable.h | 10 ++++++++++ 2 files changed, 23 insertions(+) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 6ce85d4..9e36d42 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -281,6 +281,12 @@ static inline int pmd_special(pmd_t pmd) return (pmd_val(pmd) & (_PAGE_SPECIAL | _PAGE_DMEM)) == (_PAGE_SPECIAL | _PAGE_DMEM); } + +static inline int pud_special(pud_t pud) +{ + return (pud_val(pud) & (_PAGE_SPECIAL | _PAGE_DMEM)) == + (_PAGE_SPECIAL | _PAGE_DMEM); +} #endif #ifdef CONFIG_ARCH_HAS_PTE_DEVMAP @@ -516,6 +522,13 @@ static inline pud_t pud_mkdirty(pud_t pud) return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); } +#ifdef CONFIG_ARCH_HAS_PTE_DMEM +static inline pud_t pud_mkdmem(pud_t pud) +{ + return pud_set_flags(pud, _PAGE_SPECIAL | _PAGE_DMEM); +} +#endif + static inline pud_t pud_mkdevmap(pud_t pud) { return pud_set_flags(pud, _PAGE_DEVMAP); diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 30342b8..0ef03ff 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1167,6 +1167,16 @@ static inline int pmd_special(pmd_t pmd) { return 0; } + +static inline pud_t pud_mkdmem(pud_t pud) +{ + return pud; +} + +static inline int pud_special(pud_t pud) +{ + return 0; +} #endif #ifndef pmd_read_atomic From patchwork Mon Dec 7 11:31:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955457 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, 
INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1B245C4167B for ; Mon, 7 Dec 2020 11:35:15 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id AE32C23407 for ; Mon, 7 Dec 2020 11:35:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AE32C23407 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 368E28D001A; Mon, 7 Dec 2020 06:35:14 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 3409F8D0001; Mon, 7 Dec 2020 06:35:14 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 255688D001A; Mon, 7 Dec 2020 06:35:14 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0207.hostedemail.com [216.40.44.207]) by kanga.kvack.org (Postfix) with ESMTP id 101008D0001 for ; Mon, 7 Dec 2020 06:35:14 -0500 (EST) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id D2914180AD815 for ; Mon, 7 Dec 2020 11:35:13 +0000 (UTC) X-FDA: 77566280106.30.bait15_2413136273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin30.hostedemail.com (Postfix) with ESMTP id AF028180B3C83 for ; Mon, 7 Dec 2020 11:35:13 +0000 (UTC) X-HE-Tag: bait15_2413136273de X-Filterd-Recvd-Size: 11005 Received: from mail-pj1-f43.google.com (mail-pj1-f43.google.com [209.85.216.43]) by imf08.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:35:13 +0000 (UTC) Received: by mail-pj1-f43.google.com with SMTP id p21so6863197pjv.0 
for ; Mon, 07 Dec 2020 03:35:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=R1pOMiSZIT4zzWwCC3RI8cqklW1Nb035cekwxKtaaVc=; b=tdedZG4TA6YIeSkLXKLwKJSIDRJsgaWD49qSThY/5qRbEOGMhswjKF+y+3tE//Q9GX Z46nH7F5RztgV7IpQF+nvQh5fP0PRvVuDbCtmL+DqLUnzwsCtwEWrcKYUUE4IOslWky6 zQLJvUhDs403MDnqw27lWTBP71HJqyW9dk+grHTF+6OPgJ452jBXTUZi8b4Iy1LCRY0a B7KsP2/DYqTQU9tErE4rG6jV75Gvt9+RD+EOGMwwsh3JBsAoxFJxie7XD44CJODdyBMH ZZjkx5xM1o3RMpaknmRKIUEsPRJuMTFE4JxUIeBNSdaI67DN+EkZESbAsjtXaH7s5v8J dDgQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=R1pOMiSZIT4zzWwCC3RI8cqklW1Nb035cekwxKtaaVc=; b=W+DuLzeZigDJN0nq8lpmipDFPLSZ3r7RkjlepvklzgJky/NUo2fCC0yG0RsBVBwcKa lGOMr6e/NoWQHM92NM/a9B9JkY/D5um1OXmotyk12KNPbcjieusGtkbbORWu21TlJ0Sl Nz17cX6K4V4GyG3MuC7XPJFsh16nW4ClNQfPn+MMGyMTzSGGRnS3QaBydBNRRpHMP3wO 9fBmjRsX4bLrxqLXRrB9NA9xuzy4kXGteIAmCldWW9F7X0ye5hFbr3sEIjLBBai7Q3ZL SHh4NW8Zaog9L+/Nru4Y6ReoiPg3GXFz5W3xuNXEsEsry2dP68GuflkDw/dVmhi/F1ZD cZgw== X-Gm-Message-State: AOAM530I3O8qYbFwRlckb1ilWHwcIk4pUACCd0xv64hYZ9thqkNrViy0 N/aPLB2tKX3YUfM0AkVyw/RMM2bmo94= X-Google-Smtp-Source: ABdhPJwOADOp4BHe+nOTWmOnCV7Krmz1UonOWASME9633gE5AHOao5YhBEMB9G6gr5genfai3+Mg9g== X-Received: by 2002:a17:90a:9e5:: with SMTP id 92mr16288519pjo.176.1607340911988; Mon, 07 Dec 2020 03:35:11 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.35.08 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:35:11 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, 
linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Chen Zhuo Subject: [RFC V2 27/37] mm: add pud_special() check to support dmem huge pud Date: Mon, 7 Dec 2020 19:31:20 +0800 Message-Id: X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang Add pud_special() and follow_special_pud() to support dmem huge pud as we do for dmem huge pmd. Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- arch/x86/include/asm/pgtable.h | 2 +- include/linux/huge_mm.h | 2 +- mm/gup.c | 46 ++++++++++++++++++++++++++++++++++++++++++ mm/huge_memory.c | 11 ++++++---- mm/memory.c | 4 ++-- mm/mprotect.c | 2 ++ mm/pagewalk.c | 2 +- 7 files changed, 60 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 9e36d42..2284387 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -265,7 +265,7 @@ static inline int pmd_trans_huge(pmd_t pmd) #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD static inline int pud_trans_huge(pud_t pud) { - return (pud_val(pud) & (_PAGE_PSE|_PAGE_DEVMAP)) == _PAGE_PSE; + return (pud_val(pud) & (_PAGE_PSE|_PAGE_DEVMAP|_PAGE_DMEM)) == _PAGE_PSE; } #endif diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 2514b90..b69c940 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -251,7 +251,7 @@ static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd, static inline spinlock_t *pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) { - if (pud_trans_huge(*pud) || pud_devmap(*pud)) + if (pud_trans_huge(*pud) || pud_devmap(*pud) 
|| pud_special(*pud)) return __pud_trans_huge_lock(pud, vma); else return NULL; diff --git a/mm/gup.c b/mm/gup.c index 0ea9071..8eb85ba 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -423,6 +423,42 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address, return ERR_PTR(-EEXIST); } +static struct page * +follow_special_pud(struct vm_area_struct *vma, unsigned long address, + pud_t *pud, unsigned int flags) +{ + spinlock_t *ptl; + + if ((flags & FOLL_DUMP) && is_huge_zero_pud(*pud)) + /* Avoid special (like zero) pages in core dumps */ + return ERR_PTR(-EFAULT); + + /* No page to get reference */ + if (flags & FOLL_GET) + return ERR_PTR(-EFAULT); + + if (flags & FOLL_TOUCH) { + pud_t _pud; + + ptl = pud_lock(vma->vm_mm, pud); + if (!pud_special(*pud)) { + spin_unlock(ptl); + return NULL; + } + _pud = pud_mkyoung(*pud); + if (flags & FOLL_WRITE) + _pud = pud_mkdirty(_pud); + if (pudp_set_access_flags(vma, address & HPAGE_PUD_MASK, + pud, _pud, + flags & FOLL_WRITE)) + update_mmu_cache_pud(vma, address, pud); + spin_unlock(ptl); + } + + /* Proper page table entry exists, but no corresponding struct page */ + return ERR_PTR(-EEXIST); +} + /* * FOLL_FORCE can write to even unwritable pte's, but only * after we've gone through a COW cycle and they are dirty. @@ -726,6 +762,12 @@ static struct page *follow_pud_mask(struct vm_area_struct *vma, return page; return no_page_table(vma, flags); } + if (pud_special(*pud)) { + page = follow_special_pud(vma, address, pud, flags); + if (page) + return page; + return no_page_table(vma, flags); + } if (is_hugepd(__hugepd(pud_val(*pud)))) { page = follow_huge_pd(vma, address, __hugepd(pud_val(*pud)), flags, @@ -2511,6 +2553,10 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, if (!pud_access_permitted(orig, flags & FOLL_WRITE)) return 0; + /* Bypass dmem pud. It will be handled in outside routine.
*/ + if (pud_special(orig)) + return 0; + if (pud_devmap(orig)) { if (unlikely(flags & FOLL_LONGTERM)) return 0; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 6e52d57..7c5385a 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -883,6 +883,8 @@ static void insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr, entry = pud_mkhuge(pfn_t_pud(pfn, prot)); if (pfn_t_devmap(pfn)) entry = pud_mkdevmap(entry); + if (pfn_t_dmem(pfn)) + entry = pud_mkdmem(entry); if (write) { entry = pud_mkyoung(pud_mkdirty(entry)); entry = maybe_pud_mkwrite(entry, vma); @@ -919,7 +921,7 @@ vm_fault_t vmf_insert_pfn_pud_prot(struct vm_fault *vmf, pfn_t pfn, * can't support a 'special' bit. */ BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) && - !pfn_t_devmap(pfn)); + !pfn_t_devmap(pfn) && !pfn_t_dmem(pfn)); BUG_ON((vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) == (VM_PFNMAP|VM_MIXEDMAP)); BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags)); @@ -1911,7 +1913,7 @@ spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) spinlock_t *ptl; ptl = pud_lock(vma->vm_mm, pud); - if (likely(pud_trans_huge(*pud) || pud_devmap(*pud))) + if (likely(pud_trans_huge(*pud) || pud_devmap(*pud) || pud_special(*pud))) return ptl; spin_unlock(ptl); return NULL; @@ -1922,6 +1924,7 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, pud_t *pud, unsigned long addr) { spinlock_t *ptl; + pud_t orig_pud; ptl = __pud_trans_huge_lock(pud, vma); if (!ptl) @@ -1932,9 +1935,9 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, * pgtable_trans_huge_withdraw after finishing pudp related * operations. 
*/ - pudp_huge_get_and_clear_full(tlb->mm, addr, pud, tlb->fullmm); + orig_pud = pudp_huge_get_and_clear_full(tlb->mm, addr, pud, tlb->fullmm); tlb_remove_pud_tlb_entry(tlb, pud, addr); - if (vma_is_special_huge(vma)) { + if (vma_is_special_huge(vma) || pud_special(orig_pud)) { spin_unlock(ptl); /* No zero page support yet */ } else { diff --git a/mm/memory.c b/mm/memory.c index abb9148..01f3b05 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1078,7 +1078,7 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr, src_pud = pud_offset(src_p4d, addr); do { next = pud_addr_end(addr, end); - if (pud_trans_huge(*src_pud) || pud_devmap(*src_pud)) { + if (pud_trans_huge(*src_pud) || pud_devmap(*src_pud) || pud_special(*src_pud)) { int err; VM_BUG_ON_VMA(next-addr != HPAGE_PUD_SIZE, src_vma); @@ -1375,7 +1375,7 @@ static inline unsigned long zap_pud_range(struct mmu_gather *tlb, pud = pud_offset(p4d, addr); do { next = pud_addr_end(addr, end); - if (pud_trans_huge(*pud) || pud_devmap(*pud)) { + if (pud_trans_huge(*pud) || pud_devmap(*pud) || pud_special(*pud)) { if (next - addr != HPAGE_PUD_SIZE) { mmap_assert_locked(tlb->mm); split_huge_pud(vma, pud, addr); diff --git a/mm/mprotect.c b/mm/mprotect.c index b1650b5..05fa453 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -292,6 +292,8 @@ static inline unsigned long change_pud_range(struct vm_area_struct *vma, pud = pud_offset(p4d, addr); do { next = pud_addr_end(addr, end); + if (pud_special(*pud)) + continue; if (pud_none_or_clear_bad(pud)) continue; pages += change_pmd_range(vma, pud, addr, next, newprot, diff --git a/mm/pagewalk.c b/mm/pagewalk.c index e7c4575..afd8bca 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -129,7 +129,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end, do { again: next = pud_addr_end(addr, end); - if (pud_none(*pud) || (!walk->vma && !walk->no_vma)) { + if (pud_none(*pud) || (!walk->vma && !walk->no_vma) || pud_special(*pud)) { if 
(ops->pte_hole) err = ops->pte_hole(addr, next, depth, walk); if (err) From patchwork Mon Dec 7 11:31:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955469 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.7 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, UNWANTED_LANGUAGE_BODY,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A59CC4361B for ; Mon, 7 Dec 2020 11:35:18 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CDF3823340 for ; Mon, 7 Dec 2020 11:35:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CDF3823340 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 64F528D001B; Mon, 7 Dec 2020 06:35:17 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6246F8D0001; Mon, 7 Dec 2020 06:35:17 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4EDF68D001B; Mon, 7 Dec 2020 06:35:17 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0033.hostedemail.com [216.40.44.33]) by kanga.kvack.org (Postfix) with ESMTP id 397308D0001 for ; Mon, 7 Dec 2020 06:35:17 -0500 (EST) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by 
forelay04.hostedemail.com (Postfix) with ESMTP id E542C1EE6 for ; Mon, 7 Dec 2020 11:35:16 +0000 (UTC) X-FDA: 77566280232.25.plant52_1f0a26d273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin25.hostedemail.com (Postfix) with ESMTP id B47051804E3A0 for ; Mon, 7 Dec 2020 11:35:16 +0000 (UTC) X-HE-Tag: plant52_1f0a26d273de X-Filterd-Recvd-Size: 5409 Received: from mail-pg1-f175.google.com (mail-pg1-f175.google.com [209.85.215.175]) by imf33.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:35:16 +0000 (UTC) Received: by mail-pg1-f175.google.com with SMTP id e2so316825pgi.5 for ; Mon, 07 Dec 2020 03:35:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=WOqN3wAFWNmwPLLbitVjeNVP3/1EeNDqwEgp1U7by0Y=; b=t2mhCRx4ddW/Vy1XTuXt7xLNO/CzgHXZeydB8wf2nGDpcuoOfUMGpcemrPer6RwsAu uNrNzd2nFAftQMHWukm68XBY2ZReEiQDmMuknuogHStEJ92RpzjB90S9K9a1cbyDhi2W 9Vm6qF/G07WKj7L5wgUVDnQOr5/Zh79lLesUmR9Lh/aRQyBdGzd14FIUZji0fOJd8pl+ /jy3wxMeIMmrTvfNtbWdW+RdAKQyWUr9HmQ31ldStU+rb3XP8uT16WXN0+pGuRlr4ZTK AOGRKulren3uPBcL5hwFYJaFeUE2vqsl9ziKH6yRfEJ//oEfAk2A5bJm4l2xsQwEEOOy Hptg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=WOqN3wAFWNmwPLLbitVjeNVP3/1EeNDqwEgp1U7by0Y=; b=WPGp/QZipQv5FFXZxt2uUfrC06JJcLtCZq+4nlf7joLmtNobZ/7RbVCnDphfZ/3TmF RVNsCDDvk+QJ1SHHCHQRIPT1NO/FLW0qUZaqggPWB4Q9QR4N6YeZe03Uoo9pQCmF6+41 jhp/7QB1888THlCevFY7piwkiMOZLn5qgnPSZCiIMnLSX+vIuh21ub22CV46aNZNyj5R vA96rbMlJPSqSUUYG2crkJzBiS04gEk8kDqDqNB+DEQ6fqi4VpdrCKmlYVsdwHf1/d1F moLtvjY37KoTdIzjHNAqwyxBSUMuCu9XHEU+19g1fgC2syrC2ZPpaBTdgai0gjSjP4Zi MCxg== X-Gm-Message-State: AOAM531AmSDe4iQFpBnnRtwBA7I/TTXclS5t9KAvYignibt5KHGGQKga DVGK/bsqdbalgguKCphnIgdNRlC4Vx4= X-Google-Smtp-Source: 
ABdhPJwbvfVevLMEXxhQw8TM4SqqQZ/7ZuW0O/nKHs8QddAXnvoNRyEBcv9DCTj5vDpAlxEcd07BrA== X-Received: by 2002:a17:902:860a:b029:da:e83a:7f1f with SMTP id f10-20020a170902860ab02900dae83a7f1fmr7784842plo.60.1607340915254; Mon, 07 Dec 2020 03:35:15 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.35.12 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:35:14 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Chen Zhuo Subject: [RFC V2 28/37] mm, dmemfs: support huge_fault() for dmemfs Date: Mon, 7 Dec 2020 19:31:21 +0800 Message-Id: X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang Introduce __dmemfs_huge_fault() to handle 1G huge pud for dmemfs. 
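As a rough userspace sketch of the address arithmetic involved, the fault address is rounded down to a 1G boundary and the whole 1G range has to lie inside the VMA. The EX_* names are hypothetical, the 30-bit shift assumes x86-64 where a huge pud covers 1 GiB, and plain integers stand in for vma->vm_start and vma->vm_end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a huge pud maps 1 GiB, as on x86-64. */
#define EX_PUD_SHIFT 30
#define EX_PUD_SIZE  (UINT64_C(1) << EX_PUD_SHIFT)
#define EX_PUD_MASK  (~(EX_PUD_SIZE - 1))

/* Round a faulting address down to the start of its 1 GiB region. */
static inline uint64_t ex_pud_base(uint64_t addr)
{
	return addr & EX_PUD_MASK;
}

/*
 * True when the whole 1 GiB region containing addr lies inside
 * [vm_start, vm_end). The comparison is written to avoid unsigned
 * wrap-around when the region starts past vm_end.
 */
static inline bool ex_pud_fits(uint64_t vm_start, uint64_t vm_end,
			       uint64_t addr)
{
	uint64_t base = ex_pud_base(addr);

	assert(vm_start <= vm_end);
	return base >= vm_start && base <= vm_end &&
	       vm_end - base >= EX_PUD_SIZE;
}
```

Note that the patch only issues a WARN_ON when this condition is violated; whether a fallback to smaller mappings would be preferable there is left open.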
Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- fs/dmemfs/inode.c | 40 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/fs/dmemfs/inode.c b/fs/dmemfs/inode.c index 17a518c..f698b9d 100644 --- a/fs/dmemfs/inode.c +++ b/fs/dmemfs/inode.c @@ -519,6 +519,43 @@ static vm_fault_t __dmemfs_pmd_fault(struct vm_fault *vmf) return ret; } +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD +static vm_fault_t __dmemfs_huge_fault(struct vm_fault *vmf) +{ + struct vm_area_struct *vma = vmf->vma; + unsigned long pud_addr = vmf->address & PUD_MASK; + struct inode *inode = file_inode(vma->vm_file); + void *entry; + phys_addr_t phys; + pfn_t pfn; + int ret; + + if (dmem_page_size(inode) < PUD_SIZE) + return VM_FAULT_FALLBACK; + + WARN_ON(pud_addr < vma->vm_start || + vma->vm_end < pud_addr + PUD_SIZE); + + entry = radix_get_create_entry(vma, pud_addr, inode, + linear_page_index(vma, pud_addr)); + if (IS_ERR(entry)) + return (PTR_ERR(entry) == -ENOMEM) ? + VM_FAULT_OOM : VM_FAULT_SIGBUS; + + phys = dmem_entry_to_addr(inode, entry); + pfn = phys_to_pfn_t(phys, PFN_DMEM); + ret = vmf_insert_pfn_pud(vmf, pfn, !!(vma->vm_flags & VM_WRITE)); + + radix_put_entry(); + return ret; +} +#else +static vm_fault_t __dmemfs_huge_fault(struct vm_fault *vmf) +{ + return VM_FAULT_FALLBACK; +} +#endif /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ + static vm_fault_t dmemfs_huge_fault(struct vm_fault *vmf, enum page_entry_size pe_size) { int ret; @@ -530,6 +567,9 @@ static vm_fault_t dmemfs_huge_fault(struct vm_fault *vmf, enum page_entry_size p case PE_SIZE_PMD: ret = __dmemfs_pmd_fault(vmf); break; + case PE_SIZE_PUD: + ret = __dmemfs_huge_fault(vmf); + break; default: ret = VM_FAULT_SIGBUS; } From patchwork Mon Dec 7 11:31:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955461 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 587FDC0018C for ; Mon, 7 Dec 2020 11:35:21 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E7E6D233A0 for ; Mon, 7 Dec 2020 11:35:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E7E6D233A0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8236D8D001C; Mon, 7 Dec 2020 06:35:20 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7D2F18D0001; Mon, 7 Dec 2020 06:35:20 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6E87B8D001C; Mon, 7 Dec 2020 06:35:20 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0154.hostedemail.com [216.40.44.154]) by kanga.kvack.org (Postfix) with ESMTP id 5908B8D0001 for ; Mon, 7 Dec 2020 06:35:20 -0500 (EST) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 2028A181AEF1F for ; Mon, 7 Dec 2020 11:35:20 +0000 (UTC) X-FDA: 77566280400.21.meat83_250c2aa273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin21.hostedemail.com (Postfix) with ESMTP id 023AB180442C4 for ; Mon, 7 Dec 2020 11:35:19 +0000 (UTC) X-HE-Tag: meat83_250c2aa273de 
X-Filterd-Recvd-Size: 7084 Received: from mail-pg1-f174.google.com (mail-pg1-f174.google.com [209.85.215.174]) by imf43.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:35:19 +0000 (UTC) Received: by mail-pg1-f174.google.com with SMTP id 69so806469pgg.8 for ; Mon, 07 Dec 2020 03:35:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=TpyAX8bjmgKWxsxprFNaku2nEJ5v4XWrim2WqmZHDpg=; b=eS4D42NrIJAv240gGh41rvA64V7u41KBS4tRtOuPLsuYqHzqarSS8/n7/tX9diZSle XcvHurUEWFhPNZhnW01+0VblwSa4+nqPfvuelH67hPrhQTsGl0xYiozhsfaNQdt9UcuU xTIW+q4n5aGxXZ9kufVbFlTvHWOCocxT0VIsxD+mL1ugo2Z2vjK6AUIMi/F+wUKrXguJ KblNy3cm9NJAk6ucQE5OjL/xyx5c3YMwL7Ulxvlteu0bTmPrUEiEIsIfNOYUO1n/oHDR 4l0DL0eiG4Uvtj1Wdkd9w/kb6JwDma9v7EE8NNl/eVDjMBv/foOGibW+Nd/9kNb3U8bT M6PQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=TpyAX8bjmgKWxsxprFNaku2nEJ5v4XWrim2WqmZHDpg=; b=eSfOTH7rLEjQlKkiX+5eVCJ8t0RQtqikAMZy88l2OEbvJbW5pQXLq2DeMPd5BsaBTe 996UkuPqWyBeyfWnvkZIc4ylp02xluoULNqzXKC2XA/S6RRldk5KAYy0zHVQaQz67KEP 9XJmUUv7NHlBDbw7LCet6D4Qk7CPYNjTFz+xLI5dMqvhR/Gl/6fnOqpdlf+5A8587Acd cJQDHK/ksmmhdDKYo/G3y7eSuZzefIEdoUAP8Z7I4lGG3xMqXG9WDneiEEbHPwld30l0 2kdh8feKi12FGd4beHLf7VhuCInWkRwLTmm+2JlS0QynDXlLZNEJAJRkH87aWimDR8S/ IZBQ== X-Gm-Message-State: AOAM53087lZpohc21ZRkVqS2kO3GIJb0RI7W1BuunC9wl/qOT+pRI6RX iU7eM/gijcaiSuhh/feRLoq7g9VM7Do= X-Google-Smtp-Source: ABdhPJwyNYWW30gHq9pOZotI8O5TvgnyfvtFboa27PYJJsWjcCMUClmt5pZogoHFcS4MUOgwfk1x+Q== X-Received: by 2002:a63:7f03:: with SMTP id a3mr17930488pgd.313.1607340918580; Mon, 07 Dec 2020 03:35:18 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.35.15 (version=TLS1_2 
cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:35:18 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Chen Zhuo Subject: [RFC V2 29/37] mm: add follow_pte_pud() to support huge pud look up Date: Mon, 7 Dec 2020 19:31:22 +0800 Message-Id: <43e14d8a452789e5321b93810a50acfe95672e99.1607332046.git.yuleixzhang@tencent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang Now that dmem supports huge PUD mappings, add huge PUD support to the hva_to_pfn() path. Like follow_pte_pmd(), follow_pte_pud() allows a PTE leaf, a huge page PMD, or a huge page PUD to be found and returned.
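For illustration only, the PFN arithmetic that follow_pfn() applies to a huge PUD can be sketched in userspace. The constants below assume x86-64 4-level paging (4 KiB pages, 1 GiB PUD mappings), and huge_pud_pfn() is a hypothetical helper, not a kernel function; base_pfn stands in for pud_pfn(*pudp).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 4-level paging geometry; other architectures differ. */
#define PAGE_SHIFT 12
#define PUD_SHIFT  30
#define PUD_SIZE   (UINT64_C(1) << PUD_SHIFT)
#define PUD_MASK   (~(PUD_SIZE - 1))

/*
 * Hypothetical helper mirroring the expression
 *   pud_pfn(*pudp) + ((address & ~PUD_MASK) >> PAGE_SHIFT)
 * base_pfn is the first PFN covered by the huge PUD mapping.
 */
static uint64_t huge_pud_pfn(uint64_t base_pfn, uint64_t address)
{
	/* Byte offset of the address inside the 1 GiB mapping. */
	uint64_t offset = address & ~PUD_MASK;

	assert(offset < PUD_SIZE);
	/* Convert the byte offset to a count of 4 KiB pages. */
	return base_pfn + (offset >> PAGE_SHIFT);
}
```

The PMD case has the same shape with PMD_MASK in place of PUD_MASK; only the size of the masked offset changes.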
Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- mm/memory.c | 52 ++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 44 insertions(+), 8 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 01f3b05..dfc95be 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4698,9 +4698,9 @@ int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address) } #endif /* __PAGETABLE_PMD_FOLDED */ -static int __follow_pte_pmd(struct mm_struct *mm, unsigned long address, +static int __follow_pte_pud(struct mm_struct *mm, unsigned long address, struct mmu_notifier_range *range, - pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp) + pte_t **ptepp, pmd_t **pmdpp, pud_t **pudpp, spinlock_t **ptlp) { pgd_t *pgd; p4d_t *p4d; @@ -4717,6 +4717,26 @@ static int __follow_pte_pmd(struct mm_struct *mm, unsigned long address, goto out; pud = pud_offset(p4d, address); + VM_BUG_ON(pud_trans_huge(*pud)); + if (pud_huge(*pud)) { + if (!pudpp) + goto out; + + if (range) { + mmu_notifier_range_init(range, MMU_NOTIFY_CLEAR, 0, + NULL, mm, address & PUD_MASK, + (address & PUD_MASK) + PUD_SIZE); + mmu_notifier_invalidate_range_start(range); + } + *ptlp = pud_lock(mm, pud); + if (pud_huge(*pud)) { + *pudpp = pud; + return 0; + } + spin_unlock(*ptlp); + if (range) + mmu_notifier_invalidate_range_end(range); + } if (pud_none(*pud) || unlikely(pud_bad(*pud))) goto out; @@ -4772,8 +4792,8 @@ static inline int follow_pte(struct mm_struct *mm, unsigned long address, /* (void) is needed to make gcc happy */ (void) __cond_lock(*ptlp, - !(res = __follow_pte_pmd(mm, address, NULL, - ptepp, NULL, ptlp))); + !(res = __follow_pte_pud(mm, address, NULL, + ptepp, NULL, NULL, ptlp))); return res; } @@ -4785,12 +4805,24 @@ int follow_pte_pmd(struct mm_struct *mm, unsigned long address, /* (void) is needed to make gcc happy */ (void) __cond_lock(*ptlp, - !(res = __follow_pte_pmd(mm, address, range, - ptepp, pmdpp, ptlp))); + !(res = __follow_pte_pud(mm, address, range, + ptepp, pmdpp, NULL, 
ptlp))); return res; } EXPORT_SYMBOL(follow_pte_pmd); +int follow_pte_pud(struct mm_struct *mm, unsigned long address, + struct mmu_notifier_range *range, + pte_t **ptepp, pmd_t **pmdpp, pud_t **pudpp, spinlock_t **ptlp) +{ + int res; + + /* (void) is needed to make gcc happy */ + (void) __cond_lock(*ptlp, + !(res = __follow_pte_pud(mm, address, range, + ptepp, pmdpp, pudpp, ptlp))); + return res; +} /** * follow_pfn - look up PFN at a user virtual address * @vma: memory mapping @@ -4808,15 +4840,19 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address, spinlock_t *ptl; pte_t *ptep; pmd_t *pmdp = NULL; + pud_t *pudp = NULL; if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) return ret; - ret = follow_pte_pmd(vma->vm_mm, address, NULL, &ptep, &pmdp, &ptl); + ret = follow_pte_pud(vma->vm_mm, address, NULL, &ptep, &pmdp, &pudp, &ptl); if (ret) return ret; - if (pmdp) { + if (pudp) { + *pfn = pud_pfn(*pudp) + ((address & ~PUD_MASK) >> PAGE_SHIFT); + spin_unlock(ptl); + } else if (pmdp) { *pfn = pmd_pfn(*pmdp) + ((address & ~PMD_MASK) >> PAGE_SHIFT); spin_unlock(ptl); } else { From patchwork Mon Dec 7 11:31:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955463 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E363FC1B0D8 for ; Mon, 7 Dec 2020 11:35:24 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by 
mail.kernel.org (Postfix) with ESMTP id 586EF23340 for ; Mon, 7 Dec 2020 11:35:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 586EF23340 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E3E6C8D001D; Mon, 7 Dec 2020 06:35:23 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E14828D0001; Mon, 7 Dec 2020 06:35:23 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D312B8D001D; Mon, 7 Dec 2020 06:35:23 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0187.hostedemail.com [216.40.44.187]) by kanga.kvack.org (Postfix) with ESMTP id BB1EE8D0001 for ; Mon, 7 Dec 2020 06:35:23 -0500 (EST) Received: from smtpin01.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 7F71433CD for ; Mon, 7 Dec 2020 11:35:23 +0000 (UTC) X-FDA: 77566280526.01.hen07_320aa71273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin01.hostedemail.com (Postfix) with ESMTP id 5CA2110046469 for ; Mon, 7 Dec 2020 11:35:23 +0000 (UTC) X-HE-Tag: hen07_320aa71273de X-Filterd-Recvd-Size: 9054 Received: from mail-pg1-f194.google.com (mail-pg1-f194.google.com [209.85.215.194]) by imf06.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:35:22 +0000 (UTC) Received: by mail-pg1-f194.google.com with SMTP id e2so317037pgi.5 for ; Mon, 07 Dec 2020 03:35:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=amXRNC1mfdywzpWrbOff714/vb9YHjvjsXiNQKXcqcA=; b=k6DezM51fJayfcLxJQektp8YJvDSsBbz0aVlfSnHNBt+2Wxn4RS9e6sNgSGv12LZdn 
fScIMhUj9dLNh7Arm9VHs4vEpVXoSPwObS0+Csb8MMe9MYzmH6dg0lqwqyG6nAde1wOi 34bTopBZUATvrIHVmW1MmgCkKXfJgTCxl5kBjmRFwuXuFss9pzdJv061rMlRT0sR6cvm QdbctetT25eotvN/DrxAcuWPDaSg3ZFzoyaDSMKXEBdg8nUVJsjJWPK3XNCksnNUGHkB SEu+RmRnibNi7ublI6HydqMUjgwWBf6MWZu/qay+0SGsOUxJkYzQEZGovb73STeCAsJh 9Bug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=amXRNC1mfdywzpWrbOff714/vb9YHjvjsXiNQKXcqcA=; b=rb7Oxqjzq9vD5i3Hkxr3/ucZncUeLA1Dgs0Zmqd6Lzy4OAWdtasKQuNimC9cQMelVS tXWbhpuGJAqZV9ZKpJTW27Dx2qC8vb2okQh7ewhMu83zI2DL8V7eXWZH+gjtSH3h8tvn 7HNAU6FLzU5yVooBU+ap1YweHsiUC26b6WmsyeT3BEWMrBh2t5qLe1ZdoceG0gyfm3wG rbEKaIMMuOIWOlQ0IfkpYq/MUYzgm3ZsEVyXyBg5PqoIyZuDKWSZASrtU+TG0G+X3avX VYSKIunfgdsR/vqrfbmYspM4DIYTRXMLLJ2srCyCl+23mLJ1mgSKMFcBqP4jiZXMDm7G IMXQ== X-Gm-Message-State: AOAM532BTZXkUn7DIGAFIvLa2eWJGSRgywO7NiDnIG6rg9hTi2bbiwde d7d556wNfktFFb3Gg7rqHwi49u+saf0= X-Google-Smtp-Source: ABdhPJzktfbAut0a4U/9/77Kk3IPuy2wYdqsh6ukZ0LN08xk9L3dyRMq6PVyY4Nz0UXg4U/HCNjWVQ== X-Received: by 2002:a17:902:b101:b029:da:c50e:cd56 with SMTP id q1-20020a170902b101b02900dac50ecd56mr15820085plr.59.1607340921829; Mon, 07 Dec 2020 03:35:21 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.35.18 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:35:21 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Chen Zhuo Subject: [RFC V2 
30/37] dmem: introduce dmem_bitmap_alloc() and dmem_bitmap_free() Date: Mon, 7 Dec 2020 19:31:23 +0800 Message-Id: <6eca6b9b58b3cf9a52c8227ee92d9b926c249f0b.1607332046.git.yuleixzhang@tencent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang If a dmem region is very large and dmemfs is mounted with a 4K page size, the region's bitmap may exceed the maximum size that kzalloc() can allocate, causing kzalloc() to fail. Introduce dmem_bitmap_alloc(), which uses vzalloc() when the bitmap is larger than PAGE_SIZE, since vzalloc() can be backed by physically non-contiguous pages. dmem_bitmap_free() releases the bitmap with the matching free routine. Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- fs/inode.c | 6 +++++ include/linux/fs.h | 1 + mm/dmem.c | 69 ++++++++++++++++++++++++++++++++++-------------------- 3 files changed, 50 insertions(+), 26 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index 9d78c37..9b6363d3 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -210,6 +210,12 @@ int inode_init_always(struct super_block *sb, struct inode *inode) } EXPORT_SYMBOL(inode_init_always); +struct inode *alloc_inode_nonrcu(void) +{ + return kmem_cache_alloc(inode_cachep, GFP_KERNEL); +} +EXPORT_SYMBOL(alloc_inode_nonrcu); + void free_inode_nonrcu(struct inode *inode) { kmem_cache_free(inode_cachep, inode); diff --git a/include/linux/fs.h b/include/linux/fs.h index 8667d0c..bc7a89c 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2937,6 +2937,7 @@ static inline bool is_zero_ino(ino_t ino) extern void __destroy_inode(struct inode *); extern struct inode *new_inode_pseudo(struct super_block *sb); extern struct inode *new_inode(struct super_block *sb); +extern struct inode *alloc_inode_nonrcu(void); extern void free_inode_nonrcu(struct inode *inode); extern int should_remove_suid(struct dentry *); extern int file_remove_privs(struct file *); 
diff --git a/mm/dmem.c b/mm/dmem.c index eb6df70..50cdff9 100644 --- a/mm/dmem.c +++ b/mm/dmem.c @@ -17,6 +17,7 @@ #include #include #include +#include #define CREATE_TRACE_POINTS #include @@ -362,9 +363,38 @@ static int __init dmem_node_init(struct dmem_node *dnode) return 0; } +static unsigned long *dmem_bitmap_alloc(unsigned long pages, + unsigned long *static_bitmap) +{ + unsigned long *bitmap, size; + + size = BITS_TO_LONGS(pages) * sizeof(long); + if (size <= sizeof(*static_bitmap)) + bitmap = static_bitmap; + else if (size <= PAGE_SIZE) + bitmap = kzalloc(size, GFP_KERNEL); + else + bitmap = vzalloc(size); + + return bitmap; +} + +static void dmem_bitmap_free(unsigned long pages, + unsigned long *bitmap, + unsigned long *static_bitmap) +{ + unsigned long size; + + size = BITS_TO_LONGS(pages) * sizeof(long); + if (size > PAGE_SIZE) + vfree(bitmap); + else if (bitmap != static_bitmap) + kfree(bitmap); +} + static void __init dmem_region_uinit(struct dmem_region *dregion) { - unsigned long nr_pages, size, *bitmap = dregion->error_bitmap; + unsigned long nr_pages, *bitmap = dregion->error_bitmap; if (!bitmap) return; @@ -374,9 +404,7 @@ static void __init dmem_region_uinit(struct dmem_region *dregion) WARN_ON(!nr_pages); - size = BITS_TO_LONGS(nr_pages) * sizeof(long); - if (size > sizeof(dregion->static_bitmap)) - kfree(bitmap); + dmem_bitmap_free(nr_pages, bitmap, &dregion->static_error_bitmap); dregion->error_bitmap = NULL; } @@ -405,19 +433,15 @@ static void __init dmem_uinit(void) static int __init dmem_region_init(struct dmem_region *dregion) { - unsigned long *bitmap, size, nr_pages; + unsigned long *bitmap, nr_pages; nr_pages = __phys_to_pfn(dregion->reserved_end_addr) - __phys_to_pfn(dregion->reserved_start_addr); - size = BITS_TO_LONGS(nr_pages) * sizeof(long); - if (size <= sizeof(dregion->static_error_bitmap)) { - bitmap = &dregion->static_error_bitmap; - } else { - bitmap = kzalloc(size, GFP_KERNEL); - if (!bitmap) - return -ENOMEM; - } + bitmap = 
dmem_bitmap_alloc(nr_pages, &dregion->static_error_bitmap); + if (!bitmap) + return -ENOMEM; + dregion->error_bitmap = bitmap; return 0; } @@ -472,7 +496,7 @@ static int __init dmem_late_init(void) static int dmem_alloc_region_init(struct dmem_region *dregion, unsigned long *dpages) { - unsigned long start, end, *bitmap, size; + unsigned long start, end, *bitmap; start = DMEM_PAGE_UP(dregion->reserved_start_addr); end = DMEM_PAGE_DOWN(dregion->reserved_end_addr); @@ -481,14 +505,9 @@ static int dmem_alloc_region_init(struct dmem_region *dregion, if (!*dpages) return 0; - size = BITS_TO_LONGS(*dpages) * sizeof(long); - if (size <= sizeof(dregion->static_bitmap)) - bitmap = &dregion->static_bitmap; - else { - bitmap = kzalloc(size, GFP_KERNEL); - if (!bitmap) - return -ENOMEM; - } + bitmap = dmem_bitmap_alloc(*dpages, &dregion->static_bitmap); + if (!bitmap) + return -ENOMEM; dregion->bitmap = bitmap; dregion->next_free_pos = 0; @@ -582,7 +601,7 @@ static void dmem_uinit_check_alloc_bitmap(struct dmem_region *dregion) static void dmem_alloc_region_uinit(struct dmem_region *dregion) { - unsigned long dpages, size, *bitmap = dregion->bitmap; + unsigned long dpages, *bitmap = dregion->bitmap; if (!bitmap) return; @@ -592,9 +611,7 @@ static void dmem_alloc_region_uinit(struct dmem_region *dregion) dmem_uinit_check_alloc_bitmap(dregion); - size = BITS_TO_LONGS(dpages) * sizeof(long); - if (size > sizeof(dregion->static_bitmap)) - kfree(bitmap); + dmem_bitmap_free(dpages, bitmap, &dregion->static_bitmap); dregion->bitmap = NULL; } From patchwork Mon Dec 7 11:31:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955459 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, 
DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49791C1B0D9 for ; Mon, 7 Dec 2020 11:35:28 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CD01723340 for ; Mon, 7 Dec 2020 11:35:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CD01723340 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 64D608D001E; Mon, 7 Dec 2020 06:35:27 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5FF8B8D0001; Mon, 7 Dec 2020 06:35:27 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 514E38D001E; Mon, 7 Dec 2020 06:35:27 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0151.hostedemail.com [216.40.44.151]) by kanga.kvack.org (Postfix) with ESMTP id 3BF068D0001 for ; Mon, 7 Dec 2020 06:35:27 -0500 (EST) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id ED72433CD for ; Mon, 7 Dec 2020 11:35:26 +0000 (UTC) X-FDA: 77566280652.17.van73_2001ba2273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin17.hostedemail.com (Postfix) with ESMTP id CE439180D0181 for ; Mon, 7 Dec 2020 11:35:26 +0000 (UTC) X-HE-Tag: van73_2001ba2273de X-Filterd-Recvd-Size: 10468 Received: from mail-pj1-f65.google.com (mail-pj1-f65.google.com [209.85.216.65]) by 
imf15.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:35:26 +0000 (UTC) Received: by mail-pj1-f65.google.com with SMTP id lb18so4650815pjb.5 for ; Mon, 07 Dec 2020 03:35:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=RqR44fauEeBZpjBAC1kdf97KbTscq65KeJ9aEDRCpGI=; b=Rw3amSd3rBlJpXc7dX1ne8f178KphM/6QUoRTeExFUzM83n8/SHLhQ1jf5VnNMZZoq WuAjjkZMPHqpQpdQvXvFrk1Lrq3tYST8SBgJNrKuHm5OX6smgd6BEFs4OIMr7YERjyHE +l8qRaSkfTcRGuTg4cWZJ7rhUL/vn+7/MfHju29iuvNwlvUzvEEXuWed5lRO12cXlsAx XrCYPpBkWM0g4QcxIQmTZCq99AtRHUV0xUPFWj3Jr2NdsV9RrOanCNVSLbsuvivzchZf +SeAeVSMNE20dUY9LBSn+wbwDwf7toSjY/933D0g8CyRml2C5K4r/casmfAA+xjS1HuQ ICHg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=RqR44fauEeBZpjBAC1kdf97KbTscq65KeJ9aEDRCpGI=; b=X9wETdUhlxqNHmrmf1oPUuPQACQej+ic5rZHHLHiQ/j+ZPv7XaTAotj/0nSBwRp2Vv a3dYN7BWtOCjUXSXXMZ05xhyhvj2D1XHEV1lRyFx6iE8Z9Xp8KfPMU5KPa6spnC5g1WH QeXJGmYXP4zDo1XM75NMPNcbG8znUqCy3ArjAhtgZ3B/W2fVnunHkYtsb7zpznRWSW0o 723KRTfnBWTCnQjCZQWSgBqnZOfx93+4ICaDUaFikMjQk/hMTDSFvRBJ+UD+u0Fz48B2 MQ2zdKRchmY+7vILzGxo0ygpp5tZMqnislG8fEb3az0fqZCQOXWNzhln86NxNNSJEFDF c8Hw== X-Gm-Message-State: AOAM532ucD5doAFR/xXjB/VUeWDZ8LOEC7VqoyPxpLVWE/Qkyh166ymy 4nowLR6cIehW2VovyE8ueL0pTEiZCzk= X-Google-Smtp-Source: ABdhPJyk3zQTpnyvz7bT7T3Uf0mnoGZKCzyksipWfZnA8cr+UkGx6yntGVpICOae5nxRYw96Y20SUg== X-Received: by 2002:a17:90b:11d5:: with SMTP id gv21mr930902pjb.12.1607340925289; Mon, 07 Dec 2020 03:35:25 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.35.22 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:35:24 -0800 (PST) From: yulei.kernel@gmail.com 
X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Haiwei Li Subject: [RFC V2 31/37] dmem: introduce mce handler Date: Mon, 7 Dec 2020 19:31:24 +0800 Message-Id: <6a5471107b81ee999f776547f2fccb045967701e.1607332046.git.yuleixzhang@tencent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang When an MCE occurs on a pfn that belongs to dmem, let dmem handle it. 1. Check whether the pfn is managed by dmem; if so, handle it in dmem and return. 2. Mark the pfn in a new per-page error bitmap. 3. Add the mechanisms needed to ensure that the MCE pfn is never allocated afterwards.
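The three steps above can be sketched in userspace as follows. This is a simplified illustration, not the patch's code: struct region, region_memory_failure() and test_and_set() are hypothetical names, and the sketch assumes one bit per page in both bitmaps, whereas the patch tracks errors per pfn and allocations per dpage.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Set bit nr and return its previous value (non-atomic). */
static bool test_and_set(unsigned long *bitmap, size_t nr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_LONG);
	unsigned long *word = &bitmap[nr / BITS_PER_LONG];
	bool old = (*word & mask) != 0;

	*word |= mask;
	return old;
}

/* Hypothetical stand-in for a dmem region. */
struct region {
	unsigned long start_pfn, end_pfn;   /* [start_pfn, end_pfn) */
	unsigned long *error_bitmap;        /* pages hit by an MCE */
	unsigned long *alloc_bitmap;        /* pages that are allocated */
	long free_pages;
};

/* Returns true if the pfn belongs to the region and was handled here. */
static bool region_memory_failure(struct region *r, unsigned long pfn,
				  bool *was_used)
{
	size_t pos;

	*was_used = false;
	/* Step 1: ignore pfns that this region does not manage. */
	if (pfn < r->start_pfn || pfn >= r->end_pfn)
		return false;

	/* Step 2: record the error; nothing more to do if already recorded. */
	pos = pfn - r->start_pfn;
	if (test_and_set(r->error_bitmap, pos))
		return true;

	/* Step 3: mark the page allocated so the allocator never hands it out. */
	if (test_and_set(r->alloc_bitmap, pos))
		*was_used = true;       /* page was already in use */
	else
		r->free_pages--;        /* a free page was taken out of service */
	return true;
}
```

Reusing the allocation bitmap to retire a bad page means the allocator needs no extra check on its fast path; the cost is that the free-page accounting must be adjusted when a free page is retired.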
Signed-off-by: Haiwei Li Signed-off-by: Yulei Zhang --- include/linux/dmem.h | 6 +++ include/trace/events/dmem.h | 17 ++++++++ mm/dmem.c | 103 +++++++++++++++++++++++++++++++------------- mm/memory-failure.c | 6 +++ 4 files changed, 102 insertions(+), 30 deletions(-) diff --git a/include/linux/dmem.h b/include/linux/dmem.h index 59d3ef14..cd17a91 100644 --- a/include/linux/dmem.h +++ b/include/linux/dmem.h @@ -21,6 +21,8 @@ void dmem_free_pages(phys_addr_t addr, unsigned int dpages_nr); bool is_dmem_pfn(unsigned long pfn); #define dmem_free_page(addr) dmem_free_pages(addr, 1) + +bool dmem_memory_failure(unsigned long pfn, int flags); #else static inline int dmem_reserve_init(void) { @@ -32,5 +34,9 @@ static inline bool is_dmem_pfn(unsigned long pfn) return 0; } +static inline bool dmem_memory_failure(unsigned long pfn, int flags) +{ + return false; +} #endif #endif /* _LINUX_DMEM_H */ diff --git a/include/trace/events/dmem.h b/include/trace/events/dmem.h index 10d1b90..f8eeb3c 100644 --- a/include/trace/events/dmem.h +++ b/include/trace/events/dmem.h @@ -62,6 +62,23 @@ TP_printk("addr %#lx dpages_nr %d", (unsigned long)__entry->addr, __entry->dpages_nr) ); + +TRACE_EVENT(dmem_memory_failure, + TP_PROTO(unsigned long pfn, bool used), + TP_ARGS(pfn, used), + + TP_STRUCT__entry( + __field(unsigned long, pfn) + __field(bool, used) + ), + + TP_fast_assign( + __entry->pfn = pfn; + __entry->used = used; + ), + + TP_printk("pfn=%#lx used=%d", __entry->pfn, __entry->used) +); #endif /* This part must be outside protection */ diff --git a/mm/dmem.c b/mm/dmem.c index 50cdff9..16438db 100644 --- a/mm/dmem.c +++ b/mm/dmem.c @@ -431,6 +431,41 @@ static void __init dmem_uinit(void) dmem_pool.registered_pages = 0; } +/* set or clear corresponding bit on allocation bitmap based on error bitmap */ +static unsigned long dregion_alloc_bitmap_set_clear(struct dmem_region *dregion, + bool set) +{ + unsigned long pos_pfn, pos_offset; + unsigned long valid_pages, mce_dpages = 0; + 
phys_addr_t dpage, reserved_start_pfn; + + reserved_start_pfn = __phys_to_pfn(dregion->reserved_start_addr); + + valid_pages = dpage_to_pfn(dregion->dpage_end_pfn) - reserved_start_pfn; + pos_offset = dpage_to_pfn(dregion->dpage_start_pfn) + - reserved_start_pfn; +try_set: + pos_pfn = find_next_bit(dregion->error_bitmap, valid_pages, pos_offset); + + if (pos_pfn >= valid_pages) + return mce_dpages; + mce_dpages++; + dpage = pfn_to_dpage(pos_pfn + reserved_start_pfn); + if (set) + WARN_ON(__test_and_set_bit(dpage - dregion->dpage_start_pfn, + dregion->bitmap)); + else + WARN_ON(!__test_and_clear_bit(dpage - dregion->dpage_start_pfn, + dregion->bitmap)); + pos_offset = dpage_to_pfn(dpage + 1) - reserved_start_pfn; + goto try_set; +} + +static unsigned long dmem_region_mark_mce_dpages(struct dmem_region *dregion) +{ + return dregion_alloc_bitmap_set_clear(dregion, true); +} + static int __init dmem_region_init(struct dmem_region *dregion) { unsigned long *bitmap, nr_pages; @@ -514,6 +549,8 @@ static int dmem_alloc_region_init(struct dmem_region *dregion, dregion->dpage_start_pfn = start; dregion->dpage_end_pfn = end; + *dpages -= dmem_region_mark_mce_dpages(dregion); + dmem_pool.unaligned_pages += __phys_to_pfn((dpage_to_phys(start) - dregion->reserved_start_addr)); dmem_pool.unaligned_pages += __phys_to_pfn(dregion->reserved_end_addr @@ -558,36 +595,6 @@ static bool dmem_dpage_is_error(struct dmem_region *dregion, phys_addr_t dpage) return err_num; } -/* set or clear corresponding bit on allocation bitmap based on error bitmap */ -static unsigned long dregion_alloc_bitmap_set_clear(struct dmem_region *dregion, - bool set) -{ - unsigned long pos_pfn, pos_offset; - unsigned long valid_pages, mce_dpages = 0; - phys_addr_t dpage, reserved_start_pfn; - - reserved_start_pfn = __phys_to_pfn(dregion->reserved_start_addr); - - valid_pages = dpage_to_pfn(dregion->dpage_end_pfn) - reserved_start_pfn; - pos_offset = dpage_to_pfn(dregion->dpage_start_pfn) - - reserved_start_pfn; 
-try_set: - pos_pfn = find_next_bit(dregion->error_bitmap, valid_pages, pos_offset); - - if (pos_pfn >= valid_pages) - return mce_dpages; - mce_dpages++; - dpage = pfn_to_dpage(pos_pfn + reserved_start_pfn); - if (set) - WARN_ON(__test_and_set_bit(dpage - dregion->dpage_start_pfn, - dregion->bitmap)); - else - WARN_ON(!__test_and_clear_bit(dpage - dregion->dpage_start_pfn, - dregion->bitmap)); - pos_offset = dpage_to_pfn(dpage + 1) - reserved_start_pfn; - goto try_set; -} - static void dmem_uinit_check_alloc_bitmap(struct dmem_region *dregion) { unsigned long dpages, size; @@ -989,6 +996,42 @@ void dmem_free_pages(phys_addr_t addr, unsigned int dpages_nr) } EXPORT_SYMBOL(dmem_free_pages); +bool dmem_memory_failure(unsigned long pfn, int flags) +{ + struct dmem_region *dregion; + struct dmem_node *pdnode = NULL; + u64 pos; + phys_addr_t addr = __pfn_to_phys(pfn); + bool used = false; + + dregion = find_dmem_region(addr, &pdnode); + if (!dregion) + return false; + + WARN_ON(!pdnode || !dregion->error_bitmap); + + mutex_lock(&dmem_pool.lock); + pos = pfn - __phys_to_pfn(dregion->reserved_start_addr); + if (__test_and_set_bit(pos, dregion->error_bitmap)) + goto out; + + if (!dregion->bitmap || pfn < dpage_to_pfn(dregion->dpage_start_pfn) || + pfn >= dpage_to_pfn(dregion->dpage_end_pfn)) + goto out; + + pos = phys_to_dpage(addr) - dregion->dpage_start_pfn; + if (__test_and_set_bit(pos, dregion->bitmap)) { + used = true; + } else { + pr_info("MCE: free dpage, mark %#lx disabled in dmem\n", pfn); + dnode_count_free_dpages(pdnode, -1); + } +out: + trace_dmem_memory_failure(pfn, used); + mutex_unlock(&dmem_pool.lock); + return true; +} + bool is_dmem_pfn(unsigned long pfn) { struct dmem_node *dnode; diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 5d880d4..dda45d2 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -35,6 +35,7 @@ */ #include #include +#include #include #include #include @@ -1323,6 +1324,11 @@ int memory_failure(unsigned long pfn, int 
flags) if (!sysctl_memory_failure_recovery) panic("Memory failure on page %lx", pfn); + if (dmem_memory_failure(pfn, flags)) { + pr_info("MCE %#lx: handled by dmem\n", pfn); + return 0; + } + p = pfn_to_online_page(pfn); if (!p) { if (pfn_valid(pfn)) { From patchwork Mon Dec 7 11:31:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955471 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C51EFC1B0E3 for ; Mon, 7 Dec 2020 11:35:31 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 4E85923340 for ; Mon, 7 Dec 2020 11:35:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4E85923340 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D7A6F8D001F; Mon, 7 Dec 2020 06:35:30 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D2C3B8D0001; Mon, 7 Dec 2020 06:35:30 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C19528D001F; Mon, 7 Dec 2020 06:35:30 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0107.hostedemail.com [216.40.44.107]) by kanga.kvack.org (Postfix) with 
ESMTP id AB5D28D0001 for ; Mon, 7 Dec 2020 06:35:30 -0500 (EST) Received: from smtpin28.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 7114F8249980 for ; Mon, 7 Dec 2020 11:35:30 +0000 (UTC) X-FDA: 77566280820.28.ducks97_63022aa273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin28.hostedemail.com (Postfix) with ESMTP id 507116D64 for ; Mon, 7 Dec 2020 11:35:30 +0000 (UTC) X-HE-Tag: ducks97_63022aa273de X-Filterd-Recvd-Size: 16361 Received: from mail-pj1-f66.google.com (mail-pj1-f66.google.com [209.85.216.66]) by imf35.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:35:29 +0000 (UTC) Received: by mail-pj1-f66.google.com with SMTP id m5so7275255pjv.5 for ; Mon, 07 Dec 2020 03:35:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=nVWSYdYM+m3VgpFdUaNYp2nPK15v+oLFac3l4i+Xkck=; b=s62bw1Ds5PE8L4XzidHHxWO7dowi92tTp2jvyxNTtrK0syogpxYuEVLqaYD0RVv631 owhJ9XHEiEAUC9LLWjMTm9Pe+rm9eGDdeJHWBiBJ6IybF16oZrZVOSyni6mDTkHbJNtj lV1WDgxMDJ4vPzpOm/VOel/dhx+0n4F5S7mRxVlZDCS15ym9YM5QeoViAugifkeeSQ6P OfHGghB4s0mBvhs1ZJML1PDJYbX13uJtxvR79+agbDqhEkF5cdyciPfPgh9PVVRxkp45 i3OyRnwn9i+vu3ayrr5iEDIPr3IawOVrT3n6PBt+AqhUVLQRTIcKvdEj11xd5/VzMArO ygQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=nVWSYdYM+m3VgpFdUaNYp2nPK15v+oLFac3l4i+Xkck=; b=TbPAIUOvX0ONT6/Ds2P0mdoYCJ2f/5qA1ReO/IsY42stQ1Ep7nkXX8e9ePCaa+oCZN Y46FfCOAQuzdv1LT0w37vtTBxrtaYXSn0vpFrIMyQ8QK64jdXqMQlptoYKF164O0ep+5 4WGf9LFvjVERlvoB85EjK7vuohLSh2qyvLvNtygZD0MKvBQ/csVFzMRDiv5CR6bnxtkL /cf4pExkJaKAvxXdV4nEZ/7/LG06IZYLjDUbhz7nZ4+FaDqllQsmF53GIlAi6Hnvsc1A 
doHCtV7KxmH53n4g+flPno8gwJSsAuqqcpHTRCXFrPLrjTMiBtQMXazNRyUMMIeVIxOl aMaQ== X-Gm-Message-State: AOAM532Edd7I6T7RQYSQwumQRwUjPVp5mFIXmt8+fXVY73ZpH/oMLd4e Cyoa3AQqpvEpeyNtjAoS5AB5ciE76Jw= X-Google-Smtp-Source: ABdhPJxbhoVS4O5do45wTEXma6Md3j6oREnROj3xA+nTQrYjV+3OjcYJRxFen4GD2PNOYOwIlCI0hQ== X-Received: by 2002:a17:90a:e005:: with SMTP id u5mr15885574pjy.64.1607340928636; Mon, 07 Dec 2020 03:35:28 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.35.25 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:35:28 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Haiwei Li Subject: [RFC V2 32/37] mm, dmemfs: register and handle the dmem mce Date: Mon, 7 Dec 2020 19:31:25 +0800 Message-Id: <2c95c5ed91e84229a234d243b8660e1b9cab8bbd.1607332046.git.yuleixzhang@tencent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang dmemfs registers an MCE handler that sends a signal to the processes whose VMAs map the MCE pfn.
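One detail of the handler is easy to miss: once the inode backing the failing address is found, its mapping index (in units of the dmemfs page size) is converted back to a base-page offset before processes are signalled. A minimal userspace sketch of that conversion follows; it assumes 4 KiB base pages, and blocksize_bits is a stand-in for sb->s_blocksize_bits. The function names are illustrative, not the kernel's.

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumed 4 KiB base pages */

/*
 * blocksize_bits is log2 of the dmemfs page size (e.g. 21 for 2 MiB),
 * and must be >= PAGE_SHIFT.
 */
static uint64_t index_to_pgoff(uint64_t index, unsigned int blocksize_bits)
{
	/* Each dmemfs page spans 2^(blocksize_bits - PAGE_SHIFT) base pages. */
	return index << (blocksize_bits - PAGE_SHIFT);
}

static uint64_t pgoff_to_index(uint64_t pgoff, unsigned int blocksize_bits)
{
	/* Inverse direction: drops the offset inside the dmemfs page. */
	return pgoff >> (blocksize_bits - PAGE_SHIFT);
}
```

Note that the round trip is lossy in one direction: converting a pgoff to an index and back yields the first base page of the enclosing dmemfs page, not the original offset.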
Signed-off-by: Haiwei Li Signed-off-by: Yulei Zhang --- fs/dmemfs/inode.c | 141 +++++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/dmem.h | 7 +++ include/linux/mm.h | 2 + mm/dmem.c | 34 +++++++++++++ mm/memory-failure.c | 64 ++++++++++++++++------- 5 files changed, 231 insertions(+), 17 deletions(-) diff --git a/fs/dmemfs/inode.c b/fs/dmemfs/inode.c index f698b9d..4303bcdc 100644 --- a/fs/dmemfs/inode.c +++ b/fs/dmemfs/inode.c @@ -36,6 +36,47 @@ static uint __read_mostly max_alloc_try_dpages = 1; +struct dmemfs_inode { + struct inode *inode; + struct list_head link; +}; + +static LIST_HEAD(dmemfs_inode_list); +static DEFINE_SPINLOCK(dmemfs_inode_lock); + +static struct dmemfs_inode * +dmemfs_create_dmemfs_inode(struct inode *inode) +{ + struct dmemfs_inode *dmemfs_inode; + + spin_lock(&dmemfs_inode_lock); + dmemfs_inode = kmalloc(sizeof(struct dmemfs_inode), GFP_NOIO); + if (!dmemfs_inode) { + pr_err("DMEMFS: Out of memory while getting dmemfs inode\n"); + goto out; + } + dmemfs_inode->inode = inode; + list_add_tail(&dmemfs_inode->link, &dmemfs_inode_list); +out: + spin_unlock(&dmemfs_inode_lock); + return dmemfs_inode; +} + +static void dmemfs_delete_dmemfs_inode(struct inode *inode) +{ + struct dmemfs_inode *i, *next; + + spin_lock(&dmemfs_inode_lock); + list_for_each_entry_safe(i, next, &dmemfs_inode_list, link) { + if (i->inode == inode) { + list_del(&i->link); + kfree(i); + break; + } + } + spin_unlock(&dmemfs_inode_lock); +} + struct dmemfs_mount_opts { unsigned long dpage_size; }; @@ -218,6 +259,13 @@ static unsigned long dmem_pgoff_to_index(struct inode *inode, pgoff_t pgoff) return pgoff >> (sb->s_blocksize_bits - PAGE_SHIFT); } +static pgoff_t dmem_index_to_pgoff(struct inode *inode, unsigned long index) +{ + struct super_block *sb = inode->i_sb; + + return index << (sb->s_blocksize_bits - PAGE_SHIFT); +} + static void *dmem_addr_to_entry(struct inode *inode, phys_addr_t addr) { struct super_block *sb = inode->i_sb; @@ -806,6 +854,23 @@ 
static void dmemfs_evict_inode(struct inode *inode) clear_inode(inode); } +static struct inode *dmemfs_alloc_inode(struct super_block *sb) +{ + struct inode *inode; + + inode = alloc_inode_nonrcu(); + if (inode) + dmemfs_create_dmemfs_inode(inode); + return inode; +} + +static void dmemfs_destroy_inode(struct inode *inode) +{ + if (inode) + dmemfs_delete_dmemfs_inode(inode); + free_inode_nonrcu(inode); +} + /* * Display the mount options in /proc/mounts. */ @@ -819,9 +884,11 @@ static int dmemfs_show_options(struct seq_file *m, struct dentry *root) } static const struct super_operations dmemfs_ops = { + .alloc_inode = dmemfs_alloc_inode, .statfs = dmemfs_statfs, .evict_inode = dmemfs_evict_inode, .drop_inode = generic_delete_inode, + .destroy_inode = dmemfs_destroy_inode, .show_options = dmemfs_show_options, }; @@ -901,17 +968,91 @@ static void dmemfs_kill_sb(struct super_block *sb) .kill_sb = dmemfs_kill_sb, }; +static struct inode * +dmemfs_find_inode_by_addr(phys_addr_t addr, pgoff_t *pgoff) +{ + struct dmemfs_inode *di; + struct inode *inode; + struct address_space *mapping; + void *entry, **slot; + void *mce_entry; + + list_for_each_entry(di, &dmemfs_inode_list, link) { + inode = di->inode; + mapping = inode->i_mapping; + mce_entry = dmem_addr_to_entry(inode, addr); + XA_STATE(xas, &mapping->i_pages, 0); + rcu_read_lock(); + + xas_for_each(&xas, entry, ULONG_MAX) { + if (xas_retry(&xas, entry)) + continue; + + if (unlikely(entry != xas_reload(&xas))) + goto retry; + + if (mce_entry != entry) + continue; + *pgoff = dmem_index_to_pgoff(inode, xas.xa_index); + rcu_read_unlock(); + return inode; +retry: + xas_reset(&xas); + } + rcu_read_unlock(); + } + return NULL; +} + +static int dmemfs_mce_handler(struct notifier_block *this, unsigned long pfn, + void *v) +{ + struct dmem_mce_notifier_info *info = + (struct dmem_mce_notifier_info *)v; + int flags = info->flags; + struct inode *inode; + phys_addr_t mce_addr = __pfn_to_phys(pfn); + pgoff_t pgoff; + + 
spin_lock(&dmemfs_inode_lock); + inode = dmemfs_find_inode_by_addr(mce_addr, &pgoff); + if (!inode || !atomic_read(&inode->i_count)) + goto out; + + collect_procs_and_signal_inode(inode, pgoff, pfn, flags); +out: + spin_unlock(&dmemfs_inode_lock); + return 0; +} + +static struct notifier_block dmemfs_mce_notifier = { + .notifier_call = dmemfs_mce_handler, +}; + static int __init dmemfs_init(void) { int ret; + pr_info("dmemfs initialized\n"); ret = register_filesystem(&dmemfs_fs_type); + if (ret) + goto reg_fs_fail; + + ret = dmem_register_mce_notifier(&dmemfs_mce_notifier); + if (ret) + goto reg_notifier_fail; + return 0; + +reg_notifier_fail: + unregister_filesystem(&dmemfs_fs_type); +reg_fs_fail: return ret; } static void __exit dmemfs_uninit(void) { + dmem_unregister_mce_notifier(&dmemfs_mce_notifier); unregister_filesystem(&dmemfs_fs_type); } diff --git a/include/linux/dmem.h b/include/linux/dmem.h index cd17a91..fe0b270 100644 --- a/include/linux/dmem.h +++ b/include/linux/dmem.h @@ -23,6 +23,13 @@ #define dmem_free_page(addr) dmem_free_pages(addr, 1) bool dmem_memory_failure(unsigned long pfn, int flags); + +struct dmem_mce_notifier_info { + int flags; +}; + +int dmem_register_mce_notifier(struct notifier_block *nb); +int dmem_unregister_mce_notifier(struct notifier_block *nb); #else static inline int dmem_reserve_init(void) { diff --git a/include/linux/mm.h b/include/linux/mm.h index 2f3135fe..fa20f9c 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3041,6 +3041,8 @@ enum mf_flags { extern void memory_failure_queue(unsigned long pfn, int flags); extern void memory_failure_queue_kick(int cpu); extern int unpoison_memory(unsigned long pfn); +extern void collect_procs_and_signal_inode(struct inode *inode, pgoff_t pgoff, + unsigned long pfn, int flags); extern int sysctl_memory_failure_early_kill; extern int sysctl_memory_failure_recovery; extern void shake_page(struct page *p, int access); diff --git a/mm/dmem.c b/mm/dmem.c index 16438db..dd81b24 
100644 --- a/mm/dmem.c +++ b/mm/dmem.c @@ -70,6 +70,7 @@ struct dmem_node { struct dmem_pool { struct mutex lock; + struct raw_notifier_head mce_notifier_chain; unsigned long region_num; unsigned long registered_pages; @@ -92,6 +93,7 @@ struct dmem_pool { static struct dmem_pool dmem_pool = { .lock = __MUTEX_INITIALIZER(dmem_pool.lock), + .mce_notifier_chain = RAW_NOTIFIER_INIT(dmem_pool.mce_notifier_chain), }; #define DMEM_PAGE_SIZE (1UL << dmem_pool.dpage_shift) @@ -121,6 +123,35 @@ struct dmem_pool { #define for_each_dmem_region(_dnode, _dregion) \ list_for_each_entry(_dregion, &(_dnode)->regions, node) +int dmem_register_mce_notifier(struct notifier_block *nb) +{ + int ret; + + mutex_lock(&dmem_pool.lock); + ret = raw_notifier_chain_register(&dmem_pool.mce_notifier_chain, nb); + mutex_unlock(&dmem_pool.lock); + return ret; +} +EXPORT_SYMBOL(dmem_register_mce_notifier); + +int dmem_unregister_mce_notifier(struct notifier_block *nb) +{ + int ret; + + mutex_lock(&dmem_pool.lock); + ret = raw_notifier_chain_unregister(&dmem_pool.mce_notifier_chain, nb); + mutex_unlock(&dmem_pool.lock); + return ret; +} +EXPORT_SYMBOL(dmem_unregister_mce_notifier); + +static int dmem_mce_notify(unsigned long pfn, + struct dmem_mce_notifier_info *info) +{ + return raw_notifier_call_chain(&dmem_pool.mce_notifier_chain, + pfn, info); +} + static inline int *dmem_nodelist(int nid) { return nid_to_dnode(nid)->nodelist; @@ -1003,6 +1034,7 @@ bool dmem_memory_failure(unsigned long pfn, int flags) u64 pos; phys_addr_t addr = __pfn_to_phys(pfn); bool used = false; + struct dmem_mce_notifier_info info; dregion = find_dmem_region(addr, &pdnode); if (!dregion) @@ -1022,6 +1054,8 @@ bool dmem_memory_failure(unsigned long pfn, int flags) pos = phys_to_dpage(addr) - dregion->dpage_start_pfn; if (__test_and_set_bit(pos, dregion->bitmap)) { used = true; + info.flags = flags; + dmem_mce_notify(pfn, &info); } else { pr_info("MCE: free dpage, mark %#lx disabled in dmem\n", pfn); 
dnode_count_free_dpages(pdnode, -1); diff --git a/mm/memory-failure.c b/mm/memory-failure.c index dda45d2..3aa7fe7 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -334,8 +334,8 @@ static unsigned long dev_pagemap_mapping_shift(struct page *page, * Uses GFP_ATOMIC allocations to avoid potential recursions in the VM. */ static void add_to_kill(struct task_struct *tsk, struct page *p, - struct vm_area_struct *vma, - struct list_head *to_kill) + struct vm_area_struct *vma, unsigned long pfn, + pgoff_t pgoff, struct list_head *to_kill) { struct to_kill *tk; @@ -345,12 +345,17 @@ static void add_to_kill(struct task_struct *tsk, struct page *p, return; } - tk->addr = page_address_in_vma(p, vma); - if (is_zone_device_page(p)) - tk->size_shift = dev_pagemap_mapping_shift(p, vma); - else - tk->size_shift = page_shift(compound_head(p)); - + if (p) { + tk->addr = page_address_in_vma(p, vma); + if (is_zone_device_page(p)) + tk->size_shift = dev_pagemap_mapping_shift(p, vma); + else + tk->size_shift = page_shift(compound_head(p)); + } else { + tk->size_shift = PAGE_SHIFT; + tk->addr = vma->vm_start + + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT); + } /* * Send SIGKILL if "tk->addr == -EFAULT". 
Also, as * "tk->size_shift" is always non-zero for !is_zone_device_page(), @@ -363,7 +368,7 @@ static void add_to_kill(struct task_struct *tsk, struct page *p, */ if (tk->addr == -EFAULT) { pr_info("Memory failure: Unable to find user space address %lx in %s\n", - page_to_pfn(p), tsk->comm); + pfn, tsk->comm); } else if (tk->size_shift == 0) { kfree(tk); return; @@ -496,7 +501,8 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill, if (!page_mapped_in_vma(page, vma)) continue; if (vma->vm_mm == t->mm) - add_to_kill(t, page, vma, to_kill); + add_to_kill(t, page, vma, page_to_pfn(page), + page_to_pgoff(page), to_kill); } } read_unlock(&tasklist_lock); @@ -504,19 +510,17 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill, } /* - * Collect processes when the error hit a file mapped page. + * Collect processes when the error hit a file mapped memory. */ -static void collect_procs_file(struct page *page, struct list_head *to_kill, - int force_early) +static void __collect_procs_file(struct address_space *mapping, pgoff_t pgoff, + struct page *page, unsigned long pfn, + struct list_head *to_kill, int force_early) { struct vm_area_struct *vma; struct task_struct *tsk; - struct address_space *mapping = page->mapping; - pgoff_t pgoff; i_mmap_lock_read(mapping); read_lock(&tasklist_lock); - pgoff = page_to_pgoff(page); for_each_process(tsk) { struct task_struct *t = task_early_kill(tsk, force_early); @@ -532,7 +536,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill, * to be informed of all such data corruptions. */ if (vma->vm_mm == t->mm) - add_to_kill(t, page, vma, to_kill); + add_to_kill(t, page, vma, pfn, pgoff, to_kill); } } read_unlock(&tasklist_lock); @@ -540,6 +544,32 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill, } /* + * Collect processes when the error hit a file mapped page. 
+ */ +static void collect_procs_file(struct page *page, struct list_head *to_kill, + int force_early) +{ + struct address_space *mapping = page->mapping; + + __collect_procs_file(mapping, page_to_pgoff(page), page, + page_to_pfn(page), to_kill, force_early); +} + +void collect_procs_and_signal_inode(struct inode *inode, pgoff_t pgoff, + unsigned long pfn, int flags) +{ + int forcekill; + struct address_space *mapping = &inode->i_data; + LIST_HEAD(tokill); + + __collect_procs_file(mapping, pgoff, NULL, pfn, &tokill, + flags & MF_ACTION_REQUIRED); + forcekill = flags & MF_MUST_KILL; + kill_procs(&tokill, forcekill, false, pfn, flags); +} +EXPORT_SYMBOL(collect_procs_and_signal_inode); + +/* * Collect the processes who have the corrupted page mapped to kill. */ static void collect_procs(struct page *page, struct list_head *tokill, From patchwork Mon Dec 7 11:31:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955465 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A812C4167B for ; Mon, 7 Dec 2020 11:35:34 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 155E6233A0 for ; Mon, 7 Dec 2020 11:35:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 155E6233A0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ve50LBxe7KPsa/afhSPdadx4mGbx0d5NNuSu5BfW+ws=; b=ONMeTCuQ++mJ/5VnHPCYWNAl7KGJt7WNtBsPVcJ/dBKgzKDHsE2s/GmmJuuphKQL4i dI87c0Xm6yRP6IZ4BNNtnYM4mOY1WcyNsriDECTGGtg1HhW+WCsO/YB+UpAYjv8W18BD +59u5nmziUrlRuXSTlTS47EnEopASupH3Fxmx6SK+CflUfWsy7DbP3HtWUjxc1N7iJR3 hwn57EAGmQudbnP9pL1qaGXg37dijXsH4s4cx7/poiGG+MZhrEK7il+lKO29kXYHK7HQ mN2XWSJXoRsOrA5mtvzu86I4LySQZo9xqNV7Z9IVBi4mWrQJPpQlmpKzKAQNHsGAcweB uzKw== X-Gm-Message-State: AOAM532NT6eUA2cltTqHZlfK9Vxxx0tUOn57A64pzNg/4tNB0WcxZk/v n/rotSkv1/jsj8nwmRyHU152SxDiP0A= X-Google-Smtp-Source: ABdhPJw+yEq36n4uncdVGAdYzZ2qqW7UOAZi3abFVTKDELRoe1bqIX3q7doUgLIBaZmdlgELZCPS2A== X-Received: by 2002:a17:90b:1945:: with SMTP id nk5mr15957725pjb.30.1607340931753; Mon, 07 Dec 2020 03:35:31 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.35.28 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:35:31 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang Subject: [RFC V2 33/37] kvm, x86: enable record_steal_time for dmem Date: Mon, 7 Dec 2020 19:31:26 +0800 Message-Id: X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: 
Yulei Zhang Adjust __kvm_map_gfn() and __kvm_unmap_gfn() to handle dmem pfns, mapping them with __va() instead of memremap(), to enable record_steal_time when entering a guest whose memory is backed by dmemfs. Signed-off-by: Yulei Zhang --- virt/kvm/kvm_main.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 2541a17..500b170 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -51,6 +51,7 @@ #include #include #include +#include #include #include @@ -2164,7 +2165,10 @@ static int __kvm_map_gfn(struct kvm_memslots *slots, gfn_t gfn, hva = kmap(page); #ifdef CONFIG_HAS_IOMEM } else if (!atomic) { - hva = memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB); + if (is_dmem_pfn(pfn)) + hva = __va(PFN_PHYS(pfn)); + else + hva = memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB); } else { return -EINVAL; #endif @@ -2214,9 +2218,10 @@ static void __kvm_unmap_gfn(struct kvm_memory_slot *memslot, kunmap(map->page); } #ifdef CONFIG_HAS_IOMEM - else if (!atomic) - memunmap(map->hva); - else + else if (!atomic) { + if (!is_dmem_pfn(map->pfn)) + memunmap(map->hva); + } else WARN_ONCE(1, "Unexpected unmapping in atomic context"); #endif From patchwork Mon Dec 7 11:31:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955467 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B684C0018C for ; Mon, 7 Dec 2020 11:35:39 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org
[205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B878923340 for ; Mon, 7 Dec 2020 11:35:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B878923340 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 516FC8D0008; Mon, 7 Dec 2020 06:35:38 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 4C4BA8D0001; Mon, 7 Dec 2020 06:35:38 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3B3988D0008; Mon, 7 Dec 2020 06:35:38 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0174.hostedemail.com [216.40.44.174]) by kanga.kvack.org (Postfix) with ESMTP id 250178D0001 for ; Mon, 7 Dec 2020 06:35:38 -0500 (EST) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id D5C89180AD820 for ; Mon, 7 Dec 2020 11:35:37 +0000 (UTC) X-FDA: 77566281114.29.prose19_0a041b5273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin29.hostedemail.com (Postfix) with ESMTP id DE440180868E3 for ; Mon, 7 Dec 2020 11:35:36 +0000 (UTC) X-HE-Tag: prose19_0a041b5273de X-Filterd-Recvd-Size: 8843 Received: from mail-pg1-f194.google.com (mail-pg1-f194.google.com [209.85.215.194]) by imf36.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:35:36 +0000 (UTC) Received: by mail-pg1-f194.google.com with SMTP id w4so8628747pgg.13 for ; Mon, 07 Dec 2020 03:35:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=2oI/oh3gTLcVKVrjU4mtMVnwaLDwsHWGDZd/m897JZk=; b=lhjKgDf+8F/uFxn5RBaOSzYX0LvT73tAN0Gg+Z8ChDCSn1g21h9K7z03ULsA5x3FO/ 
2a+Eg0te0MGsaduIR3o9k0QLaF/mI8tGtephTk2VDC0HtpswcYPbfNlwIcireWiJ3lEi YBLTiakDlLO8KbJjbiP2fFHdnJI2rSybJZ1rp2A8mhn/yuJGj2RdiE8Y7GUN+76U1u3F grW+T/JmrPwOi4a+Ia/MdYqiJ3odiRzcVjQEiwxt6frfkjpO1sZ3bLJV1WGZr7av8hnV KBYDpv1nu/d9yh0K3QIlXoA278BPZLPtVPY9jfVMUeGQEixWS3nOF9qEgg52Qo3REX5+ jePQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=2oI/oh3gTLcVKVrjU4mtMVnwaLDwsHWGDZd/m897JZk=; b=Ps3xRzhmHFenvjkQvns4V3B5+KsHa+39Qx1iVk8RWZOIdfU/rkK5W026GB5SrB/maF AlYbY7q9VuyVLq3Fv/PN/135NLW6dxE8tC4GKf6kCbYs6GqOGx95MSGtkyHhY/Vz0Tw6 WbaZDZcmrsVOfIEj292XOHCSqiNvmi0dY01OBJHtxhrEB5OYeX+I+iw4vJa9LJp5VYfe KdUMZ1mZpRTZDzxGPq3ZsgUFDW1hyp8ymV/R07oCv699TX52ECS6dQfVV/AL2FmZRyGV GfrEsWubN56Ivo/nO4XXkySazEqdgzIHt8kMW7PyuhtE9p1MT14PkWbu40aE+qbV1OON +32g== X-Gm-Message-State: AOAM531APqQlE8pjaFRyLFtMrGHdYQ8TuBo8yYAQhvpgRm5z4y4wkBvl ocw2USO9xeloQzs8BDGjBOB2ULLeBbs= X-Google-Smtp-Source: ABdhPJxk408hICw1VHxdPSSZFmVYiF7oejg12uAdXoM+nWbeUsNih67nBm2DaYDp/HkVlqXXpfd2hQ== X-Received: by 2002:a17:902:fe17:b029:da:799a:8bfd with SMTP id g23-20020a170902fe17b02900da799a8bfdmr15667660plj.10.1607340935397; Mon, 07 Dec 2020 03:35:35 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.35.32 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:35:34 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Xiao Guangrong Subject: [RFC 
V2 34/37] dmem: add dmem unit tests Date: Mon, 7 Dec 2020 19:31:27 +0800 Message-Id: X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang Add a kernel test module for the dmem management system. It covers page allocation at orders 0 to 6 and allocation behaviour once dmem on node 0 is exhausted. Signed-off-by: Xiao Guangrong Signed-off-by: Yulei Zhang --- tools/testing/dmem/Kbuild | 1 + tools/testing/dmem/Makefile | 10 +++ tools/testing/dmem/dmem-test.c | 184 +++++++++++++++++++++++++++++++++++++++++ 3 files changed, 195 insertions(+) create mode 100644 tools/testing/dmem/Kbuild create mode 100644 tools/testing/dmem/Makefile create mode 100644 tools/testing/dmem/dmem-test.c diff --git a/tools/testing/dmem/Kbuild b/tools/testing/dmem/Kbuild new file mode 100644 index 00000000..04988f7 --- /dev/null +++ b/tools/testing/dmem/Kbuild @@ -0,0 +1 @@ +obj-m += dmem-test.o diff --git a/tools/testing/dmem/Makefile b/tools/testing/dmem/Makefile new file mode 100644 index 00000000..21f141f --- /dev/null +++ b/tools/testing/dmem/Makefile @@ -0,0 +1,10 @@ +KDIR ?= ../../../ + +default: + $(MAKE) -C $(KDIR) M=$$PWD + +install: default + $(MAKE) -C $(KDIR) M=$$PWD modules_install + +clean: + rm -f *.o *.ko Module.* modules.* *.mod.c diff --git a/tools/testing/dmem/dmem-test.c b/tools/testing/dmem/dmem-test.c new file mode 100644 index 00000000..4baae18 --- /dev/null +++ b/tools/testing/dmem/dmem-test.c @@ -0,0 +1,184 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details.
+ */ +#include +#include +#include +#include +#include +#include +#include +#include + +struct dmem_mem_node { + struct list_head node; +}; + +static LIST_HEAD(dmem_list); + +static int dmem_test_alloc_init(unsigned long dpage_shift) +{ + int ret; + + ret = dmem_alloc_init(dpage_shift); + if (ret) + pr_info("dmem_alloc_init failed, dpage_shift %lu ret=%d\n", + dpage_shift, ret); + return ret; +} + +static int __dmem_test_alloc(int order, int nid, nodemask_t *nodemask, + const char *caller) +{ + struct dmem_mem_node *pos; + phys_addr_t addr; + int i, ret = 0; + + for (i = 0; i < (1 << order); i++) { + addr = dmem_alloc_pages_nodemask(nid, nodemask, 1, NULL); + if (!addr) { + ret = -ENOMEM; + break; + } + + pos = __va(addr); + list_add(&pos->node, &dmem_list); + } + + pr_info("%s: alloc order %d on node %d has fallback node %s... %s.\n", + caller, order, nid, nodemask ? "yes" : "no", + !ret ? "okay" : "failed"); + + return ret; +} + +static void dmem_test_free_all(void) +{ + struct dmem_mem_node *pos, *n; + + list_for_each_entry_safe(pos, n, &dmem_list, node) { + list_del(&pos->node); + dmem_free_page(__pa(pos)); + } +} + +#define dmem_test_alloc(order, nid, nodemask) \ + __dmem_test_alloc(order, nid, nodemask, __func__) + +/* dmem should have at least 2^6 native pages available */ +static int order_test(void) +{ + int order, i, ret; + int page_orders[] = {0, 1, 2, 3, 4, 5, 6}; + + ret = dmem_test_alloc_init(PAGE_SHIFT); + if (ret) + return ret; + + for (i = 0; i < ARRAY_SIZE(page_orders); i++) { + order = page_orders[i]; + + ret = dmem_test_alloc(order, numa_node_id(), NULL); + if (ret) + break; + } + + dmem_test_free_all(); + + dmem_alloc_uinit(); + + return ret; +} + +static int node_test(void) +{ + nodemask_t nodemask; + unsigned long nr = 0; + int order; + int node; + int ret = 0; + + order = 0; + + ret = dmem_test_alloc_init(PUD_SHIFT); + if (ret) + return ret; + + pr_info("%s: test allocation on node 0\n", __func__); + node = 0; + nodes_clear(nodemask); +
node_set(0, nodemask); + + ret = dmem_test_alloc(order, node, &nodemask); + if (ret) + goto exit; + + dmem_test_free_all(); + + pr_info("%s: begin to exhaust dmem on node 0.\n", __func__); + node = 1; + nodes_clear(nodemask); + node_set(0, nodemask); + + INIT_LIST_HEAD(&dmem_list); + while (!(ret = dmem_test_alloc(order, node, &nodemask))) + nr++; + + pr_info("Allocation on node 0 success times: %lu\n", nr); + + pr_info("%s: allocation on node 0 again\n", __func__); + node = 0; + nodes_clear(nodemask); + node_set(0, nodemask); + ret = dmem_test_alloc(order, node, &nodemask); + if (!ret) { + pr_info("\tNot expected fallback\n"); + ret = -1; + } else { + ret = 0; + pr_info("\tOK, Dmem on node 0 exhausted, fallback success\n"); + } + + pr_info("%s: Release dmem\n", __func__); + dmem_test_free_all(); + +exit: + dmem_alloc_uinit(); + return ret; +} + +static __init int dmem_test_init(void) +{ + int ret; + + pr_info("dmem: test init...\n"); + + ret = order_test(); + if (ret) + return ret; + + ret = node_test(); + + + if (ret) + pr_info("dmem test fail, ret=%d\n", ret); + else + pr_info("dmem test success\n"); + return ret; +} + +static __exit void dmem_test_exit(void) +{ + pr_info("dmem: test exit...\n"); +} + +module_init(dmem_test_init); +module_exit(dmem_test_exit); +MODULE_LICENSE("GPL v2"); From patchwork Mon Dec 7 11:31:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955475 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org 
(mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8F624C1B0D9 for ; Mon, 7 Dec 2020 11:35:41 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2650523340 for ; Mon, 7 Dec 2020 11:35:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2650523340 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A95918D000C; Mon, 7 Dec 2020 06:35:40 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A6C208D0001; Mon, 7 Dec 2020 06:35:40 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9327C8D000C; Mon, 7 Dec 2020 06:35:40 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0043.hostedemail.com [216.40.44.43]) by kanga.kvack.org (Postfix) with ESMTP id 7ED408D0001 for ; Mon, 7 Dec 2020 06:35:40 -0500 (EST) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 44F2D1EF1 for ; Mon, 7 Dec 2020 11:35:40 +0000 (UTC) X-FDA: 77566281240.17.level95_4615e50273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin17.hostedemail.com (Postfix) with ESMTP id 25A2C180D0181 for ; Mon, 7 Dec 2020 11:35:40 +0000 (UTC) X-HE-Tag: level95_4615e50273de X-Filterd-Recvd-Size: 12063 Received: from mail-pf1-f196.google.com (mail-pf1-f196.google.com [209.85.210.196]) by imf07.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:35:39 +0000 (UTC) Received: by mail-pf1-f196.google.com with SMTP id c12so2532277pfo.10 for ; Mon, 07 Dec 2020 03:35:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; 
h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=ZbMU89dV/uL3+8OuWgp1O7gQ5pASlyw5FoQtpms6guk=; b=hJ834cKhEjJVNZZ3t967QVJ2E8W9K6mc6jhd3ZkKT1XWsW3Git/iP8YVfRmQyt8sIr CjbTX37cTnlCLLuno8hPGkGDUmoZ1BtiEYJXX1b6z90vTqSR1KA8+8rId4N/7TnizniZ enCmfy4akOOpsgOGcbpB931EuqRfgpj+h5gaDLkFjyYbBZ9e/Dda80jgLj0M5Cqc6r1u hwxdv353KdyhkQxcMeQ8uHs9JbJAVgoln+7QovXo3A8RUP62nOLFJKHy2/oiSVfloI6E 22pFGJvSpZrxXWrD0eCtE1IJmN+S1qu2/O18aqpz1ZdL8sqSQQUdbZk/09SaW4dU8ASj 9dEg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ZbMU89dV/uL3+8OuWgp1O7gQ5pASlyw5FoQtpms6guk=; b=QOrinXEWo+CkCep20XyRXLYMVmCprV5sfTtaCnh9W6G1XPowSPiSpGIiGnYZg+RlmK F9+6xAg0y94+tdno+NxDIARhKKqUWuwoeVwBxeLPBN948RbfYt0uNB/8P/KdwD3NLdVz mkKf//RNoUlMLQ7eluqSp4Cx0g333E7I5DNceGb7aWlTyGsL7E9LUOOQBDl7bGknLfFz xALJdvtGq9wQL6/G+fLadN0neFtbewXR5J8znpuA5JcjQtIuXw59RW0PUJz5kyV7A7xF BOJIHxyEtNMneYHGf0itExIKVKa8lCUHiYDY5NcQj4Yi8Q+e7h8Nj7q6aM8OE13H0soH tmxA== X-Gm-Message-State: AOAM533CrZCsYwzCj9Wr38V9o1c3gfBP5mQ8p29mlyIqAY/I4wUUVlaK lQGODhhCWleKHqgc3nzgWpqnnVMpP28= X-Google-Smtp-Source: ABdhPJytxl0Aj3b5REF0I4awOHOqfBomUWPVwgLlHgKjMQw2BuQ5IeQCZdBfDO1sTV2IIFA1EZ86zQ== X-Received: by 2002:a63:4905:: with SMTP id w5mr17945642pga.124.1607340938617; Mon, 07 Dec 2020 03:35:38 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.35.35 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:35:38 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, 
rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Chen Zhuo Subject: [RFC V2 35/37] mm, dmem: introduce dregion->memmap for dmem Date: Mon, 7 Dec 2020 19:31:28 +0800 Message-Id: X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang Add 'memmap' to struct dmem_region, giving each dmem page a struct dmempage. For now struct dmempage has a single member, '_refcount', which counts the modules currently holding the dmem page. A module that allocates a dmem page from the dmem pool takes the first reference, which sets _refcount to 1. A module that frees a dmem page back to the dmem pool decrements _refcount, and the page is actually freed only if _refcount reaches zero. Whenever module A passes a dmem page to module B, module B should call get_dmem_pfn() to increment _refcount before using the page, so that it never references a dmem page that another module has freed in parallel. Conversely, once module B has finished with the page, it must call put_dmem_pfn() to decrement _refcount.
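The get/put protocol described above can be sketched in userspace with C11 atomics. This is only an illustrative model under stated assumptions: struct dmempage_model and the model_*() helpers are hypothetical names, not the kernel functions added by this patch, and the assert() stands in for the kernel's VM_BUG_ON() check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct dmempage: one counter per dmem page. */
struct dmempage_model {
	atomic_int refcount;
};

/* Allocation hands the page to its first user with a count of 1. */
static inline void model_alloc(struct dmempage_model *p)
{
	atomic_store(&p->refcount, 1);
}

/* A module receiving the page takes its own reference before using it. */
static inline void model_get(struct dmempage_model *p)
{
	atomic_fetch_add(&p->refcount, 1);
}

/*
 * Drop one reference. Returns true only for the caller that dropped the
 * last reference, which is the point where the page may be freed.
 */
static inline bool model_put(struct dmempage_model *p)
{
	int old = atomic_fetch_sub(&p->refcount, 1);

	assert(old > 0);	/* dropping a reference on a free page is a bug */
	return old == 1;
}
```

If module A allocates a page and hands it to module B, B's model_get() raises the count from 1 to 2. The first model_put() then returns false and the second returns true, so the page goes back to the pool only after both modules are done with it.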
Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- include/linux/dmem.h | 5 ++ mm/dmem.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++----- 2 files changed, 139 insertions(+), 13 deletions(-) diff --git a/include/linux/dmem.h b/include/linux/dmem.h index fe0b270..8aaa80b 100644 --- a/include/linux/dmem.h +++ b/include/linux/dmem.h @@ -22,6 +22,9 @@ bool is_dmem_pfn(unsigned long pfn); #define dmem_free_page(addr) dmem_free_pages(addr, 1) +void get_dmem_pfn(unsigned long pfn); +#define put_dmem_pfn(pfn) dmem_free_page(PFN_PHYS(pfn)) + bool dmem_memory_failure(unsigned long pfn, int flags); struct dmem_mce_notifier_info { @@ -45,5 +48,7 @@ static inline bool dmem_memory_failure(unsigned long pfn, int flags) { return false; } +void get_dmem_pfn(unsigned long pfn) {} +void put_dmem_pfn(unsigned long pfn) {} #endif #endif /* _LINUX_DMEM_H */ diff --git a/mm/dmem.c b/mm/dmem.c index dd81b24..776dbf2 100644 --- a/mm/dmem.c +++ b/mm/dmem.c @@ -47,6 +47,7 @@ struct dmem_region { unsigned long static_error_bitmap; unsigned long *error_bitmap; + void *memmap; }; /* @@ -91,6 +92,10 @@ struct dmem_pool { struct dmem_node nodes[MAX_NUMNODES]; }; +struct dmempage { + atomic_t _refcount; +}; + static struct dmem_pool dmem_pool = { .lock = __MUTEX_INITIALIZER(dmem_pool.lock), .mce_notifier_chain = RAW_NOTIFIER_INIT(dmem_pool.mce_notifier_chain), @@ -123,6 +128,40 @@ struct dmem_pool { #define for_each_dmem_region(_dnode, _dregion) \ list_for_each_entry(_dregion, &(_dnode)->regions, node) +#define pfn_to_dmempage(_pfn, _dregion) \ + ((struct dmempage *)(_dregion)->memmap + \ + pfn_to_dpage(_pfn) - (_dregion)->dpage_start_pfn) + +#define dmempage_to_dpage(_dmempage, _dregion) \ + ((_dmempage) - (struct dmempage *)(_dregion)->memmap + \ + (_dregion)->dpage_start_pfn) + +static inline int dmempage_count(struct dmempage *dmempage) +{ + return atomic_read(&dmempage->_refcount); +} + +static inline void set_dmempage_count(struct dmempage *dmempage, int v) +{ + 
atomic_set(&dmempage->_refcount, v); +} + +static inline void dmempage_ref_inc(struct dmempage *dmempage) +{ + atomic_inc(&dmempage->_refcount); +} + +static inline int dmempage_ref_dec_and_test(struct dmempage *dmempage) +{ + return atomic_dec_and_test(&dmempage->_refcount); +} + +static inline int put_dmempage_testzero(struct dmempage *dmempage) +{ + VM_BUG_ON(dmempage_count(dmempage) == 0); + return dmempage_ref_dec_and_test(dmempage); +} + int dmem_register_mce_notifier(struct notifier_block *nb) { int ret; @@ -559,10 +598,25 @@ static int __init dmem_late_init(void) } late_initcall(dmem_late_init); +static void *dmem_memmap_alloc(unsigned long dpages) +{ + unsigned long size; + + size = dpages * sizeof(struct dmempage); + return vzalloc(size); +} + +static void dmem_memmap_free(void *memmap) +{ + if (memmap) + vfree(memmap); +} + static int dmem_alloc_region_init(struct dmem_region *dregion, unsigned long *dpages) { unsigned long start, end, *bitmap; + void *memmap; start = DMEM_PAGE_UP(dregion->reserved_start_addr); end = DMEM_PAGE_DOWN(dregion->reserved_end_addr); @@ -575,7 +629,14 @@ static int dmem_alloc_region_init(struct dmem_region *dregion, if (!bitmap) return -ENOMEM; + memmap = dmem_memmap_alloc(*dpages); + if (!memmap) { + dmem_bitmap_free(*dpages, bitmap, &dregion->static_bitmap); + return -ENOMEM; + } + dregion->bitmap = bitmap; + dregion->memmap = memmap; dregion->next_free_pos = 0; dregion->dpage_start_pfn = start; dregion->dpage_end_pfn = end; @@ -650,7 +711,9 @@ static void dmem_alloc_region_uinit(struct dmem_region *dregion) dmem_uinit_check_alloc_bitmap(dregion); dmem_bitmap_free(dpages, bitmap, &dregion->static_bitmap); + dmem_memmap_free(dregion->memmap); dregion->bitmap = NULL; + dregion->memmap = NULL; } static void __dmem_alloc_uinit(void) @@ -793,6 +856,16 @@ int dmem_alloc_init(unsigned long dpage_shift) return dpage_to_phys(dregion->dpage_start_pfn + pos); } +static void prep_new_dmempage(unsigned long phys, unsigned int nr, + struct 
dmem_region *dregion) +{ + struct dmempage *dmempage = pfn_to_dmempage(PHYS_PFN(phys), dregion); + unsigned int i; + + for (i = 0; i < nr; i++, dmempage++) + set_dmempage_count(dmempage, 1); +} + /* * allocate dmem pages from the nodelist * @@ -839,6 +912,7 @@ int dmem_alloc_init(unsigned long dpage_shift) if (addr) { dnode_count_free_dpages(dnode, -(long)(*result_nr)); + prep_new_dmempage(addr, *result_nr, dregion); break; } } @@ -993,6 +1067,41 @@ static struct dmem_region *find_dmem_region(phys_addr_t phys_addr, return NULL; } +static unsigned int free_dmempages_prepare(struct dmempage *dmempage, + unsigned int dpages_nr) +{ + unsigned int i, ret = 0; + + for (i = 0; i < dpages_nr; i++, dmempage++) + if (put_dmempage_testzero(dmempage)) + ret++; + + return ret; +} + +void __dmem_free_pages(struct dmempage *dmempage, + unsigned int dpages_nr, + struct dmem_region *dregion, + struct dmem_node *pdnode) +{ + phys_addr_t dpage = dmempage_to_dpage(dmempage, dregion); + u64 pos; + unsigned long err_dpages; + + trace_dmem_free_pages(dpage_to_phys(dpage), dpages_nr); + WARN_ON(!dmem_pool.dpage_shift); + + pos = dpage - dregion->dpage_start_pfn; + dregion->next_free_pos = min(dregion->next_free_pos, pos); + + /* it is not possible to span multiple regions */ + WARN_ON(dpage + dpages_nr - 1 >= dregion->dpage_end_pfn); + + err_dpages = dmem_alloc_bitmap_clear(dregion, dpage, dpages_nr); + + dnode_count_free_dpages(pdnode, dpages_nr - err_dpages); +} + /* * free dmem page to the dmem pool * @addr: the physical addree will be freed @@ -1002,27 +1111,26 @@ void dmem_free_pages(phys_addr_t addr, unsigned int dpages_nr) { struct dmem_region *dregion; struct dmem_node *pdnode = NULL; - phys_addr_t dpage = phys_to_dpage(addr); - u64 pos; - unsigned long err_dpages; + struct dmempage *dmempage; + unsigned int nr; mutex_lock(&dmem_pool.lock); - trace_dmem_free_pages(addr, dpages_nr); - WARN_ON(!dmem_pool.dpage_shift); - dregion = find_dmem_region(addr, &pdnode); WARN_ON(!dregion || 
!dregion->bitmap || !pdnode); - pos = dpage - dregion->dpage_start_pfn; - dregion->next_free_pos = min(dregion->next_free_pos, pos); - - /* it is not possible to span multiple regions */ - WARN_ON(dpage + dpages_nr - 1 >= dregion->dpage_end_pfn); + dmempage = pfn_to_dmempage(PHYS_PFN(addr), dregion); - err_dpages = dmem_alloc_bitmap_clear(dregion, dpage, dpages_nr); + nr = free_dmempages_prepare(dmempage, dpages_nr); + if (nr == dpages_nr) + __dmem_free_pages(dmempage, dpages_nr, dregion, pdnode); + else if (nr) + while (dpages_nr--, dmempage++) { + if (dmempage_count(dmempage)) + continue; + __dmem_free_pages(dmempage, 1, dregion, pdnode); + } - dnode_count_free_dpages(pdnode, dpages_nr - err_dpages); mutex_unlock(&dmem_pool.lock); } EXPORT_SYMBOL(dmem_free_pages); @@ -1073,3 +1181,16 @@ bool is_dmem_pfn(unsigned long pfn) return !!find_dmem_region(__pfn_to_phys(pfn), &dnode); } EXPORT_SYMBOL(is_dmem_pfn); + +void get_dmem_pfn(unsigned long pfn) +{ + struct dmem_region *dregion = find_dmem_region(PFN_PHYS(pfn), NULL); + struct dmempage *dmempage; + + VM_BUG_ON(!dregion || !dregion->memmap); + + dmempage = pfn_to_dmempage(pfn, dregion); + VM_BUG_ON(dmempage_count(dmempage) + 127u <= 127u); + dmempage_ref_inc(dmempage); +} +EXPORT_SYMBOL(get_dmem_pfn); From patchwork Mon Dec 7 11:31:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955473 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by 
smtp.lore.kernel.org (Postfix) with ESMTP id D2EDEC4167B for ; Mon, 7 Dec 2020 11:35:44 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7141D233A0 for ; Mon, 7 Dec 2020 11:35:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7141D233A0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 031C98D0011; Mon, 7 Dec 2020 06:35:44 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id F24CF8D0001; Mon, 7 Dec 2020 06:35:43 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DC97C8D0011; Mon, 7 Dec 2020 06:35:43 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0124.hostedemail.com [216.40.44.124]) by kanga.kvack.org (Postfix) with ESMTP id BDAA68D0001 for ; Mon, 7 Dec 2020 06:35:43 -0500 (EST) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 848901EF1 for ; Mon, 7 Dec 2020 11:35:43 +0000 (UTC) X-FDA: 77566281366.26.coal15_2100d1d273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin26.hostedemail.com (Postfix) with ESMTP id 5539D1804B660 for ; Mon, 7 Dec 2020 11:35:43 +0000 (UTC) X-HE-Tag: coal15_2100d1d273de X-Filterd-Recvd-Size: 4660 Received: from mail-pf1-f174.google.com (mail-pf1-f174.google.com [209.85.210.174]) by imf44.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:35:42 +0000 (UTC) Received: by mail-pf1-f174.google.com with SMTP id t8so9610944pfg.8 for ; Mon, 07 Dec 2020 03:35:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references 
:mime-version:content-transfer-encoding; bh=01Z5fklCLxTkAgFCaz6nAUEBM28bmnYK87yE7XDoghk=; b=Ow37ASksRvhh9CbdtCHIUcexMF92+y/fcxUo4cp1Hcu1OVAkm4VgeqQuRQn3wk6HYw HBIvMgI3NhxdwdBCKwt5Iq4O8LCN4Ax1wiG4DsCvRan0q4fRQKL8bzXsSjED1ZZnfPtW i2fYicNCYucP2yFhvKqZO390HegOf8okFuF473TH47NWnU5wevHxKx8xWUmYQMv1OtGt 0lIXeFTqGZWaY2z+qKMSdfJnMGbGjjHGIm7bp+KVSSw1iX8dSHyykquckNwiMQ3vazJb w8osDFx9fg+nq686wIqW3v7PfEnSUS4MCwgVJ3Vx8c8qTcTONTPefKk36TylgEAY+cPI pfmg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=01Z5fklCLxTkAgFCaz6nAUEBM28bmnYK87yE7XDoghk=; b=Dnc/yPz2jXGSISwWZDmLH9mt85R0AHHC1IewwTOkxOPhLT4ih2NpbU0AYbq09i1b69 Z9xX3Rrg7177/lUtsOpAFZ7YSotlw68YDNh+JrI+vLWVcVD2xHkBBWLQB5uv7Xl+i5MH 9HiueDxQ7vhDOffpwT+Gw23XfwRaatx2eymyOy128UPRjmllkKkrpHBS88ye0X52DG33 anIjU+7DTkFq5YzgAT97s/qGcuiOsg7IohHdEVG5WKWQ1HVtcPuu8hywxpkeNM/262vq tfly9qsnFFg8B3C3STQisXctCRZssVg89C6EFGTb/2kUqbGGwI6ZNjY5fN5NM6uXfRle I8iQ== X-Gm-Message-State: AOAM531Mmz9/pFwsXf2rbYMae4EZ37k5j6qenD9zIICSGWa8t2LgWZL2 jzVVwTg+DpyElj6GzvGQadzucruOTfo= X-Google-Smtp-Source: ABdhPJyc0QahNAcq2tXM9yV1P4tXgagCj+zgpJi+Q0nk6M3KkIQ/sHcL91OhO1SYZDhkmBDra3QOpQ== X-Received: by 2002:a63:a551:: with SMTP id r17mr3001298pgu.13.1607340942035; Mon, 07 Dec 2020 03:35:42 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.35.38 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:35:41 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, 
xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang , Chen Zhuo Subject: [RFC V2 36/37] vfio: support dmempage refcount for vfio Date: Mon, 7 Dec 2020 19:31:29 +0800 Message-Id: <0e5dd1479a55d8af7adfe44390f8e45186295dce.1607332046.git.yuleixzhang@tencent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang Call get_dmem_pfn() and put_dmem_pfn() each time the vfio module references or releases a dmem page. Signed-off-by: Chen Zhuo Signed-off-by: Yulei Zhang --- drivers/vfio/vfio_iommu_type1.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index c465d1a..4856a89 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -39,6 +39,7 @@ #include #include #include +#include #define DRIVER_VERSION "0.2" #define DRIVER_AUTHOR "Alex Williamson " @@ -411,7 +412,10 @@ static int put_pfn(unsigned long pfn, int prot) unpin_user_pages_dirty_lock(&page, 1, prot & IOMMU_WRITE); return 1; - } + } else if (is_dmem_pfn(pfn)) + put_dmem_pfn(pfn); + + /* Dmem page is not counted against user. 
*/ return 0; } @@ -477,6 +481,9 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, if (!ret && !is_invalid_reserved_pfn(*pfn)) ret = -EFAULT; + + if (!ret && is_dmem_pfn(*pfn)) + get_dmem_pfn(*pfn); } done: mmap_read_unlock(mm); From patchwork Mon Dec 7 11:31:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yulei zhang X-Patchwork-Id: 11955477 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C8F8AC433FE for ; Mon, 7 Dec 2020 11:35:47 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6A8DA233A0 for ; Mon, 7 Dec 2020 11:35:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6A8DA233A0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id F03EF8D0012; Mon, 7 Dec 2020 06:35:46 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E8C8F8D0001; Mon, 7 Dec 2020 06:35:46 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D7A3F8D0012; Mon, 7 Dec 2020 06:35:46 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0135.hostedemail.com [216.40.44.135]) by kanga.kvack.org (Postfix) with ESMTP id 
C2C6B8D0001 for ; Mon, 7 Dec 2020 06:35:46 -0500 (EST) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 958BE8249980 for ; Mon, 7 Dec 2020 11:35:46 +0000 (UTC) X-FDA: 77566281492.05.anger95_5900a7c273de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin05.hostedemail.com (Postfix) with ESMTP id 7C78E18014A32 for ; Mon, 7 Dec 2020 11:35:46 +0000 (UTC) X-HE-Tag: anger95_5900a7c273de X-Filterd-Recvd-Size: 6108 Received: from mail-pg1-f196.google.com (mail-pg1-f196.google.com [209.85.215.196]) by imf43.hostedemail.com (Postfix) with ESMTP for ; Mon, 7 Dec 2020 11:35:46 +0000 (UTC) Received: by mail-pg1-f196.google.com with SMTP id e2so317828pgi.5 for ; Mon, 07 Dec 2020 03:35:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=8dMrBUV0z9hLgybDV1qAgRrexcK6XcuJtlWOIrJYhH0=; b=QSXG0JhfnoAGvVBFx63sba2FOkjDtHFQBx/E4eY2XASWrdpLT0Ew5Lxo9DejGl9h++ 5V2/EM/2abVeCbQe6r6uVAcXu38nGBRW5kLnKtZtrVY4TlxWPSKHSY/zlUjObVbTs+wL x9eXcsTw9I5f8s7ws9VtyPX5JLLnZOlJuXOxZXXyCPUjoqPY9XoS104KOPsvqi2TP9Qr dlnqXrw19GS6AKRnSVxl8hUOIkMKM9yyU4zUfq/dJvLXMP8hln1Ay82xzQkUBHKSwNDx ejDThi/5DYZ0I6J/39QFbRF3Y/8kvDD3Eiy1+tKCQcQLGzMn2NiZ6exaemi55aBwIr5D oy1Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=8dMrBUV0z9hLgybDV1qAgRrexcK6XcuJtlWOIrJYhH0=; b=k5koD0QHr1KtfB0PP345BQrU3St8paqBmcMbJ8jwevgNsUKAowy1PxvhzfzZK2ecB4 hUXhjfDOrK9LJK3EHaq1x6Y6S5asZ9GJsIMovPvAbqJKCWEFWwzvh+eBIQ1XfQwAT5AL 5Lb0RIyNcgFoe25LN7i+bbHlhagaohL4w7mSGXpNfTSHTXJQwrIiQA3KgzSTSpmcHvL8 dCv9GAXwVxTtQ2gYIjVtc52b+zpysHS8XQx4dWQh0tT2CXtMx99oDDJDlYnPyniw4ouZ z/OjrwpgJsacQX15SE8zo0AZugBDt9RXxPaSMjiu7diV9Iq296tQ2hNwLHEGDi1hFub0 
HAhw== X-Gm-Message-State: AOAM532Qxx5w3ZvinUWWgmQwY0mZK8Aem3CHBp+UxEsbG9m65f130lnQ +b7GBSvDqommAvzmh1HkPqIslsxVNp4= X-Google-Smtp-Source: ABdhPJxr+/fznfTo2BDgJytsbTQDDtDhOhsgkc3XiXicQdpt2zupWpmvamPhl7LZ72PAGPtdf/AUSg== X-Received: by 2002:a05:6a00:7c5:b029:19e:2965:7a6 with SMTP id n5-20020a056a0007c5b029019e296507a6mr2120711pfu.60.1607340945089; Mon, 07 Dec 2020 03:35:45 -0800 (PST) Received: from localhost.localdomain ([203.205.141.39]) by smtp.gmail.com with ESMTPSA id d4sm14219822pfo.127.2020.12.07.03.35.42 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Dec 2020 03:35:44 -0800 (PST) From: yulei.kernel@gmail.com X-Google-Original-From: yuleixzhang@tencent.com To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com, viro@zeniv.linux.org.uk, pbonzini@redhat.com Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com, xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com, Yulei Zhang Subject: [RFC V2 37/37] Add documentation for dmemfs Date: Mon, 7 Dec 2020 19:31:30 +0800 Message-Id: <6a3a71f75dad1fa440677fc1bcdc170f178be1d8.1607332046.git.yuleixzhang@tencent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: References: MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yulei Zhang Introduce dmemfs.rst to document the basic usage of dmemfs. 
Signed-off-by: Yulei Zhang --- Documentation/filesystems/dmemfs.rst | 58 ++++++++++++++++++++++++++++++++++++ Documentation/filesystems/index.rst | 1 + 2 files changed, 59 insertions(+) create mode 100644 Documentation/filesystems/dmemfs.rst diff --git a/Documentation/filesystems/dmemfs.rst b/Documentation/filesystems/dmemfs.rst new file mode 100644 index 00000000..f13ed0c --- /dev/null +++ b/Documentation/filesystems/dmemfs.rst @@ -0,0 +1,58 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================================== +The Direct Memory Filesystem - DMEMFS +===================================== + + +.. Table of contents + + - Overview + - Compilation + - Usage + +Overview +======== + +Dmemfs (Direct Memory filesystem) is a filesystem based on device +memory or reserved memory. This kind of memory is special in that it +is not managed by the kernel and has no 'struct page'. Therefore +it can save extra memory on the host system for various uses, +especially for guest virtual machines. + +It uses the kernel boot parameter ``dmem=`` to reserve system +memory when the host system boots up; the details can be found +in /Documentation/admin-guide/kernel-parameters.txt. 
+ +Compilation +=========== + +The filesystem is enabled by turning on the kernel configuration +options:: + + CONFIG_DMEM_FS - Direct Memory filesystem support + CONFIG_DMEM - Allow reservation of memory for dmem + + +Additionally, the following can be turned on to aid debugging:: + + CONFIG_DMEM_DEBUG_FS - Enable debug information for dmem + +Usage +======== + +Dmemfs supports mapping ``4K``, ``2M`` and ``1G`` pages to +userspace, for example :: + + # mount -t dmemfs none -o pagesize=4K /mnt/ + +Then the backing storage can be created with a size of 4G :: + + # truncate /mnt/dmemfs-uuid --size 4G + +To use it as backing storage for a virtual machine started with qemu, just +specify a memory-backend-file object on the qemu command line like this :: + + # -object memory-backend-file,id=ram-node0,mem-path=/mnt/dmemfs-uuid \ + share=yes,size=4G,host-nodes=0,policy=preferred -numa node,nodeid=0,memdev=ram-node0 + diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst index 98f59a8..23e944b 100644 --- a/Documentation/filesystems/index.rst +++ b/Documentation/filesystems/index.rst @@ -120,3 +120,4 @@ Documentation for filesystem implementations. xfs-delayed-logging-design xfs-self-describing-metadata zonefs + dmemfs