From patchwork Wed Dec 11 09:29:17 2019
X-Patchwork-Submitter: Mircea CIRJALIU - MELIU
X-Patchwork-Id: 11284561
From: Mircea CIRJALIU - MELIU
To: "linux-mm@kvack.org", "linux-kernel@vger.kernel.org", Jerome Glisse, Paolo Bonzini, "aarcange@redhat.com"
Subject: [RFC PATCH v1 1/4] mm/remote_mapping: mirror a process address space
Date: Wed, 11 Dec 2019 09:29:17 +0000

Use a device to inspect another process's address space via page table mirroring.
Pass the PID of a source process to the device via ioctl(), then use mmap() to inspect that process's address space as if it were an ordinary file. The mmap() file offset is interpreted as the source virtual address, and the mapping must be MAP_SHARED.
Process address space mirroring is limited to anon VMAs.
The device mirrors page tables on demand (on faults) and invalidates them by listening for MMU notifier events.

Signed-off-by: Mircea Cirjaliu
---
 include/linux/remote_mapping.h      |  33 ++
 include/uapi/linux/remote_mapping.h |  12 +
 mm/Kconfig                          |   9 +
 mm/Makefile                         |   1 +
 mm/remote_mapping.c                 | 615 ++++++++++++++++++++++++++++++++++++
 5 files changed, 670 insertions(+)
 create mode 100644 include/linux/remote_mapping.h
 create mode 100644 include/uapi/linux/remote_mapping.h
 create mode 100644 mm/remote_mapping.c

diff --git a/include/linux/remote_mapping.h b/include/linux/remote_mapping.h
new file mode 100644
index 0000000..ad0995d
--- /dev/null
+++ b/include/linux/remote_mapping.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _REMOTE_MAPPING_H
+#define _REMOTE_MAPPING_H
+
+#include
+#include
+#include
+#include
+#include
+
+struct remote_file_context {
+	struct srcu_struct mm_srcu;
+	spinlock_t mm_lock;
+	struct mm_struct __rcu *mm;
+	struct mmu_notifier mn;
+
+	// interval tree for mapped ranges
+	struct rb_root_cached rb_root;
+	struct rw_semaphore tree_lock;
+};
+
+// describes a mapped range
+// mirror VMA points here
+struct remote_vma_context {
+	// all information about the mapped interval is found in the VMA
+	struct vm_area_struct *vma;
+
+	// interval tree link
+	struct rb_node target_rb;
+	unsigned long rb_subtree_last;
+};
+
+#endif /* _REMOTE_MAPPING_H */
diff --git a/include/uapi/linux/remote_mapping.h b/include/uapi/linux/remote_mapping.h
new file mode 100644
index 0000000..eb0eec3
--- /dev/null
+++ b/include/uapi/linux/remote_mapping.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef __UAPI_REMOTE_MAPPING_H__
+#define __UAPI_REMOTE_MAPPING_H__
+
+#include
+#include
+
+#define REMOTE_PROC_MAP _IOW('r', 0x01, int)
+// TODO: also ioctl for pidfd
+
+#endif /* __UAPI_REMOTE_MAPPING_H__ */
diff --git a/mm/Kconfig b/mm/Kconfig
index ab80933..c10dd5c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -739,4 +739,13 @@ config ARCH_HAS_HUGEPD
 config MAPPING_DIRTY_HELPERS
 	bool
 
+config REMOTE_MAPPING
+	bool "Remote memory mapping"
+	depends on MMU && MMU_NOTIFIER
+	default n
+
+	help
+	  Allows a given process to map pages of another process in its own
+	  address space.
+
 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index 1937cc2..595f1a8c 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -108,3 +108,4 @@ obj-$(CONFIG_ZONE_DEVICE) += memremap.o
 obj-$(CONFIG_HMM_MIRROR) += hmm.o
 obj-$(CONFIG_MEMFD_CREATE) += memfd.o
 obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o
+obj-$(CONFIG_REMOTE_MAPPING) += remote_mapping.o
diff --git a/mm/remote_mapping.c b/mm/remote_mapping.c
new file mode 100644
index 0000000..358b1f5
--- /dev/null
+++ b/mm/remote_mapping.c
@@ -0,0 +1,615 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Remote memory mapping.
+ *
+ * Copyright (C) 2017-2018 Bitdefender S.R.L.
+ * + * Author: + * Mircea Cirjaliu + */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "internal.h" + +#include + +#define ASSERT(exp) BUG_ON(!(exp)) + + +static inline unsigned long ctx_start(struct remote_vma_context *ctx) +{ + return ctx->vma->vm_pgoff << PAGE_SHIFT; +} + +static inline unsigned long ctx_end(struct remote_vma_context *ctx) +{ + return (ctx->vma->vm_pgoff << PAGE_SHIFT) + + (ctx->vma->vm_end - ctx->vma->vm_start); +} + +static inline unsigned long range_start(struct remote_vma_context *ctx) +{ + return ctx_start(ctx) + 1; +} + +static inline unsigned long range_last(struct remote_vma_context *ctx) +{ + return ctx_end(ctx) - 1; +} + +INTERVAL_TREE_DEFINE(struct remote_vma_context, target_rb, unsigned long, + rb_subtree_last, range_start, range_last, + static inline, range_interval_tree) + +#define range_tree_foreach(ctx, root, start, last) \ + for (ctx = range_interval_tree_iter_first(root, start, last);\ + ctx; ctx = range_interval_tree_iter_next(ctx, start, last)) + +static inline bool +range_interval_tree_overlaps(struct remote_vma_context *ctx, + struct remote_file_context *fctx) +{ + struct remote_vma_context *iter; + + range_tree_foreach(iter, &fctx->rb_root, ctx_start(ctx), ctx_end(ctx)) + return true; + + return false; +} + +static int +mirror_invalidate_range_start(struct mmu_notifier *mn, + const struct mmu_notifier_range *range) +{ + struct remote_file_context *fctx = + container_of(mn, struct remote_file_context, mn); + struct remote_vma_context *ctx; + unsigned long src_start, src_end; + unsigned long vma_start, vma_end; + + /* quick filter - we only map pages from anon VMAs */ + if (!vma_is_anonymous(range->vma)) + return 0; + 
+	/*
+	 * If ctx + VMA were found here, then the VMA + its address space
+	 * haven't been unmapped. See comments in mirror_vm_close().
+	 */
+	down_read(&fctx->tree_lock);
+
+	range_tree_foreach(ctx, &fctx->rb_root, range->start, range->end) {
+		pr_debug("%s: %lx-%lx found %lx-%lx\n",
+			 __func__, range->start, range->end,
+			 ctx_start(ctx), ctx_end(ctx));
+
+		// intersect these intervals (source process address range)
+		src_start = max(range->start, ctx_start(ctx));
+		src_end = min(range->end, ctx_end(ctx));
+
+		// translate to destination process address range
+		vma_start = ctx->vma->vm_start + (src_start - ctx_start(ctx));
+		vma_end = ctx->vma->vm_end + (src_end - ctx_end(ctx));
+
+		zap_vma_ptes(ctx->vma, vma_start, vma_end - vma_start);
+	}
+
+	up_read(&fctx->tree_lock);
+
+	return 0;
+}
+
+/* get notified when source MM is shutting down, so we avoid faulting in vain */
+static void
+mirror_release(struct mmu_notifier *mn, struct mm_struct *mm)
+{
+	struct remote_file_context *fctx =
+		container_of(mn, struct remote_file_context, mn);
+
+	spin_lock(&fctx->mm_lock);
+	rcu_assign_pointer(fctx->mm, NULL);
+	spin_unlock(&fctx->mm_lock);
+
+	/* delay address space closing until local faults finish */
+	synchronize_srcu(&fctx->mm_srcu);
+}
+
+static const struct mmu_notifier_ops mirror_notifier_ops = {
+	.invalidate_range_start = mirror_invalidate_range_start,
+	.release = mirror_release,
+};
+
+
+static void remote_file_context_init(struct remote_file_context *ctx)
+{
+	ctx->mm = NULL;
+	ctx->mn.ops = &mirror_notifier_ops;
+	init_srcu_struct(&ctx->mm_srcu);
+	spin_lock_init(&ctx->mm_lock);
+
+	ctx->rb_root = RB_ROOT_CACHED;
+	init_rwsem(&ctx->tree_lock);
+}
+
+static struct remote_file_context *remote_file_context_alloc(void)
+{
+	struct remote_file_context *ctx;
+
+	ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
+	if (ctx)
+		remote_file_context_init(ctx);
+
+	return ctx;
+}
+
+static void remote_file_context_free(struct remote_file_context *ctx)
+{
+	kfree(ctx);
+}
+
+
+static void remote_vma_context_init(struct remote_vma_context *ctx)
+{
+	ctx->vma = NULL;
+}
+
+static struct remote_vma_context *remote_vma_context_alloc(void)
+{
+	struct remote_vma_context *ctx;
+
+	ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
+	if (ctx)
+		remote_vma_context_init(ctx);
+
+	return ctx;
+}
+
+static void remote_vma_context_free(struct remote_vma_context *ctx)
+{
+	kfree(ctx);
+}
+
+
+static struct page *mm_remote_get_page(struct mm_struct *req_mm,
+	unsigned long req_hva, unsigned int flags)
+{
+	struct page *req_page = NULL;
+	long nrpages;
+
+	might_sleep();
+
+	flags |= FOLL_ANON | FOLL_MIGRATION;
+
+	/* get host page corresponding to requested address */
+	nrpages = get_user_pages_remote(NULL, req_mm, req_hva, 1,
+		flags, &req_page, NULL, NULL);
+	if (unlikely(nrpages == 0)) {
+		pr_err("no page for req_hva %016lx\n", req_hva);
+		return ERR_PTR(-ENOENT);
+	} else if (IS_ERR_VALUE(nrpages)) {
+		pr_err("get_user_pages_remote() failed: %d\n", (int)nrpages);
+		return ERR_PTR(nrpages);
+	}
+
+	/* limit introspection to anon memory (this also excludes zero-page) */
+	if (!PageAnon(req_page)) {
+		put_page(req_page);
+		pr_err("page at req_hva %016lx not anon\n", req_hva);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return req_page;
+}
+
+static int mirror_dev_open(struct inode *inode, struct file *file)
+{
+	struct remote_file_context *fctx;
+
+	fctx = remote_file_context_alloc();
+	if (!fctx)
+		return -ENOMEM;
+	file->private_data = fctx;
+
+	return 0;
+}
+
+static int do_remote_proc_map(struct file *file, int pid)
+{
+	struct task_struct *req_task;
+	struct mm_struct *req_mm;
+	struct remote_file_context *fctx = file->private_data;
+	int result;
+
+	/* this function may race with mirror_release() notifier */
+	spin_lock(&fctx->mm_lock);
+	if (fctx->mm) {
+		spin_unlock(&fctx->mm_lock);
+		return -EALREADY;
+	}
+	spin_unlock(&fctx->mm_lock);
+
+	// find task
+	req_task = find_get_task_by_vpid(pid);
+	if (!req_task)
+		return -ESRCH;
+
+	// find + get mm
+	req_mm = get_task_mm(req_task);
+	put_task_struct(req_task);
+	if (!req_mm)
+		return -EINVAL;
+
+	/* there should be no mirror VMA faults at the moment */
+	spin_lock(&fctx->mm_lock);
+	rcu_assign_pointer(fctx->mm, req_mm);
+	spin_unlock(&fctx->mm_lock);
+
+	// register MMU notifier
+	result = mmu_notifier_register(&fctx->mn, req_mm);
+	if (result) {
+		mmput(req_mm);
+		pr_err("unable to register MMU notifier\n");
+
+		return result;
+	}
+
+	mmput(req_mm);
+
+	return 0;
+}
+
+static long mirror_dev_ioctl(struct file *file, unsigned int ioctl,
+	unsigned long arg)
+{
+	long result;
+
+	switch (ioctl) {
+	case REMOTE_PROC_MAP: {
+		int pid = (int)arg;
+
+		result = do_remote_proc_map(file, pid);
+		break;
+	}
+
+	default:
+		pr_err("ioctl %d not implemented\n", ioctl);
+		result = -ENOTTY;
+	}
+
+	return result;
+}
+
+static int mirror_dev_release(struct inode *inode, struct file *file)
+{
+	struct remote_file_context *fctx = file->private_data;
+	struct mm_struct *src_mm = NULL;
+
+	/* this function may race with mirror_release() notifier */
+	spin_lock(&fctx->mm_lock);
+	if (fctx->mm) {
+		mmgrab(fctx->mm);
+		src_mm = fctx->mm;
+	}
+	spin_unlock(&fctx->mm_lock);
+
+	/* attempt unregistering if pointer found to be valid */
+	if (src_mm) {
+		mmu_notifier_unregister(&fctx->mn, src_mm);
+		mmdrop(src_mm);
+	}
+
+	/*
+	 * the synchronization inside mmu_notifier_unregister() makes sure no
+	 * notifier will run after the call
+	 */
+	remote_file_context_free(fctx);
+
+	return 0;
+}
+
+/*
+ * We end up here if the local PMD is NULL.
+ * Doesn't matter if the address is aligned to huge page boundary or not.
+ * We look for a huge page mapped at the target equivalent address and try to
+ * map it in our page tables without splitting it.
+ */
+static vm_fault_t
+mirror_vm_hugefault(struct vm_fault *vmf, enum page_entry_size pe_size)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct file *file = vma->vm_file;
+	struct remote_file_context *fctx = file->private_data;
+	unsigned long req_address;
+	unsigned int gup_flags;
+	struct page *req_page;
+	vm_fault_t result;
+	struct mm_struct *src_mm;
+	int idx;
+
+	pr_debug("%s: pe_size %d, address %016lx\n",
+		 __func__, pe_size, vmf->address);
+
+	/* No support for anonymous transparent PUD pages yet */
+	if (pe_size == PE_SIZE_PUD)
+		return VM_FAULT_FALLBACK;
+
+	idx = srcu_read_lock(&fctx->mm_srcu);
+
+	/* check if source mm still exists */
+	src_mm = srcu_dereference(fctx->mm, &fctx->mm_srcu);
+	if (!src_mm) {
+		result = VM_FAULT_SIGBUS;
+		goto out;
+	}
+
+	/* avoid a near-deadlock situation */
+	if (!down_read_trylock(&src_mm->mmap_sem)) {
+		srcu_read_unlock(&fctx->mm_srcu, idx);
+		up_read(&current->mm->mmap_sem);
+
+		return VM_FAULT_RETRY;
+	}
+
+	/* set GUP flags depending on the VMA */
+	gup_flags = FOLL_HUGE;
+	if (vma->vm_flags & VM_WRITE)
+		gup_flags |= FOLL_WRITE | FOLL_FORCE;
+
+	req_address = vmf->pgoff << PAGE_SHIFT;
+	req_page = mm_remote_get_page(src_mm, req_address, gup_flags);
+
+	/* check for validity of the page */
+	if (IS_ERR_OR_NULL(req_page)) {
+		up_read(&src_mm->mmap_sem);
+
+		if (PTR_ERR(req_page) == -ERESTARTSYS) {
+			srcu_read_unlock(&fctx->mm_srcu, idx);
+			up_read(&current->mm->mmap_sem);
+
+			return VM_FAULT_RETRY;
+		}
+
+		result = VM_FAULT_FALLBACK;
+		goto out;
+	}
+
+	/* shouldn't reach this case, but check anyway */
+	if (unlikely(!PageCompound(req_page))) {
+		result = VM_FAULT_FALLBACK;
+		goto out_page;
+	}
+
+	result = vmf_insert_pfn_pmd(vmf, page_to_pfn_t(req_page),
+				    vmf->flags & FAULT_FLAG_WRITE);
+
+out_page:
+	put_page(req_page);
+	up_read(&src_mm->mmap_sem);
+
+out:
+	srcu_read_unlock(&fctx->mm_srcu, idx);
+
+	return result;
+}
+
+static vm_fault_t mirror_vm_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct file *file = vma->vm_file;
+	struct remote_file_context *fctx = file->private_data;
+	unsigned long req_address;
+	unsigned int gup_flags;
+	struct page *req_page;
+	vm_fault_t result;
+	struct mm_struct *src_mm;
+	int idx;
+
+	pr_debug("%s: address %016lx\n", __func__, vmf->address);
+
+	idx = srcu_read_lock(&fctx->mm_srcu);
+
+	/* check if source mm still exists */
+	src_mm = srcu_dereference(fctx->mm, &fctx->mm_srcu);
+	if (!src_mm) {
+		result = VM_FAULT_SIGBUS;
+		goto out;
+	}
+
+	/* avoid a near-deadlock situation */
+	if (!down_read_trylock(&src_mm->mmap_sem)) {
+		srcu_read_unlock(&fctx->mm_srcu, idx);
+		up_read(&current->mm->mmap_sem);
+
+		return VM_FAULT_RETRY;
+	}
+
+	/* set GUP flags depending on the VMA */
+	gup_flags = FOLL_SPLIT;
+	if (vma->vm_flags & VM_WRITE)
+		gup_flags |= FOLL_WRITE | FOLL_FORCE;
+
+	req_address = vmf->pgoff << PAGE_SHIFT;
+	req_page = mm_remote_get_page(src_mm, req_address, gup_flags);
+
+	/* check for validity of the page */
+	if (IS_ERR_OR_NULL(req_page)) {
+		up_read(&src_mm->mmap_sem);
+
+		if (PTR_ERR(req_page) == -ERESTARTSYS ||
+		    PTR_ERR(req_page) == -EBUSY) {
+			srcu_read_unlock(&fctx->mm_srcu, idx);
+			up_read(&current->mm->mmap_sem);
+
+			return VM_FAULT_RETRY;
+		}
+
+		result = VM_FAULT_SIGBUS;
+		goto out;
+	}
+
+	result = vmf_insert_pfn(vmf->vma, vmf->address, page_to_pfn(req_page));
+
+//out_page:
+	put_page(req_page);
+	up_read(&src_mm->mmap_sem);
+
+out:
+	srcu_read_unlock(&fctx->mm_srcu, idx);
+
+	return result;
+}
+
+/*
+ * This is called in remove_vma() at the end of __do_munmap() after the address
+ * space has been unmapped and the page tables have been freed.
+ */
+static void mirror_vm_close(struct vm_area_struct *vma)
+{
+	struct remote_vma_context *ctx = vma->vm_private_data;
+	struct remote_file_context *fctx = vma->vm_file->private_data;
+
+	pr_debug("%s: %016lx - %016lx (%lu bytes)\n",
+		 __func__, vma->vm_start, vma->vm_end,
+		 vma->vm_end - vma->vm_start);
+
+	/* will wait for any running invalidate notifiers to finish */
+	down_write(&fctx->tree_lock);
+	range_interval_tree_remove(ctx, &fctx->rb_root);
+	up_write(&fctx->tree_lock);
+
+	remote_vma_context_free(ctx);
+	vma->vm_private_data = NULL;
+}
+
+// this will prevent partial unmap of destination VMA
+static int mirror_vm_split(struct vm_area_struct *area, unsigned long addr)
+{
+	return -EINVAL;
+}
+
+static const struct vm_operations_struct mirror_vm_ops = {
+	.close = mirror_vm_close,
+	.fault = mirror_vm_fault,
+	.huge_fault = mirror_vm_hugefault,
+	.split = mirror_vm_split,
+};
+
+
+static int mirror_dev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct remote_file_context *fctx = file->private_data;
+	struct remote_vma_context *ctx;
+
+	pr_debug("%s: %016lx - %016lx (%lu bytes)\n",
+		 __func__, vma->vm_start, vma->vm_end,
+		 vma->vm_end - vma->vm_start);
+
+	if (!(vma->vm_flags & VM_SHARED))
+		return -EINVAL;
+
+	/* prepare the context */
+	ctx = remote_vma_context_alloc();
+	if (!ctx)
+		return -ENOMEM;
+
+	vma->vm_private_data = ctx;
+	ctx->vma = vma;
+
+	down_write(&fctx->tree_lock);
+	if (range_interval_tree_overlaps(ctx, fctx)) {
+		up_write(&fctx->tree_lock);
+
+		pr_err("part of range already mirrored\n");
+		remote_vma_context_free(ctx);
+		return -EALREADY;
+	}
+
+	range_interval_tree_insert(ctx, &fctx->rb_root);
+	up_write(&fctx->tree_lock);
+
+	/* set basic VMA properties */
+	vma->vm_flags |= VM_DONTCOPY | VM_DONTDUMP | VM_PFNMAP;
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+	vma->vm_ops = &mirror_vm_ops;
+
+	return 0;
+}
+
+/*
+ * We must have the same alignment relative to a huge page boundary
+ * as the target VMA or requested address
+ */
+static unsigned long
+mirror_get_unmapped_area(struct file *file, const unsigned long addr0,
+			 const unsigned long len, const unsigned long pgoff,
+			 const unsigned long flags)
+{
+	struct vm_unmapped_area_info info;
+	unsigned long address = pgoff << PAGE_SHIFT;
+	bool huge_align = !(address & ~HPAGE_PMD_MASK);
+
+	pr_debug("%s: len %lu, pgoff 0x%016lx, %s alignment.\n",
+		 __func__, len, pgoff, huge_align ? "PMD" : "page");
+
+	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
+	info.length = len;
+	info.low_limit = PAGE_SIZE;
+	info.high_limit = get_mmap_base(0);
+	info.align_mask = ~HPAGE_PMD_MASK;
+	info.align_offset = address & ~HPAGE_PMD_MASK;
+
+	address = vm_unmapped_area(&info);
+
+	pr_debug("%s: address 0x%016lx\n", __func__, address);
+
+	return address;
+}
+
+static const struct file_operations mirror_ops = {
+	.open = mirror_dev_open,
+	.unlocked_ioctl = mirror_dev_ioctl,
+	.compat_ioctl = mirror_dev_ioctl,
+	.get_unmapped_area = mirror_get_unmapped_area,
+	.llseek = no_llseek,
+	.mmap = mirror_dev_mmap,
+	.release = mirror_dev_release,
+};
+
+static struct miscdevice mirror_dev = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "mirror-proc",
+	.fops = &mirror_ops,
+};
+
+builtin_misc_device(mirror_dev);
+

From patchwork Wed Dec 11 09:29:21 2019
X-Patchwork-Submitter: Mircea CIRJALIU - MELIU
X-Patchwork-Id: 11284567
From: Mircea CIRJALIU - MELIU
To: "linux-mm@kvack.org", "linux-kernel@vger.kernel.org", Jerome Glisse, Paolo Bonzini, "aarcange@redhat.com"
Subject: [RFC PATCH v1 2/4] mm: also set VMA in khugepaged range invalidation
Date: Wed, 11 Dec 2019 09:29:21 +0000

MMU notifier clients may need the VMA for extra information. The remote mapping feature needs this change: it inspects anon VMAs with huge mappings, and its invalidate_range_start() callback passes range->vma to vma_is_anonymous().
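For context, the consumer of this field is the remote mapping notifier from patch 1/4. Its invalidate_range_start() callback first calls vma_is_anonymous(range->vma), so a NULL VMA would be dereferenced. It then intersects the invalidated source range with each mirrored range and zaps the matching part of the mirror VMA.

The C sketch below models only that intersect-and-translate arithmetic, in userspace. It assumes that a mirror VMA shows the source range starting at vm_pgoff << PAGE_SHIFT, as the patch's ctx_start() does. `struct mirror_range` and `mirror_zap_range()` are hypothetical names used only for illustration; they are not part of the patch.

```c
/*
 * Hypothetical userspace model of one mirror mapping (not kernel code).
 * The mirror VMA [vm_start, vm_end) in the local process shows the
 * source range that starts at src_base (vm_pgoff << PAGE_SHIFT).
 */
struct mirror_range {
	unsigned long vm_start;	/* first mirror address */
	unsigned long vm_end;	/* one past the last mirror address */
	unsigned long src_base;	/* source address shown at vm_start */
};

/*
 * Intersect the invalidated source range [start, end) with the mirrored
 * source range, then translate the intersection into mirror addresses.
 * Returns 0 and sets [*zap_start, *zap_end) on overlap, -1 otherwise.
 */
static int mirror_zap_range(const struct mirror_range *m,
			    unsigned long start, unsigned long end,
			    unsigned long *zap_start, unsigned long *zap_end)
{
	unsigned long src_end = m->src_base + (m->vm_end - m->vm_start);
	unsigned long lo = start > m->src_base ? start : m->src_base;
	unsigned long hi = end < src_end ? end : src_end;

	if (lo >= hi)
		return -1;	/* ranges do not overlap: nothing to zap */

	*zap_start = m->vm_start + (lo - m->src_base);
	*zap_end = m->vm_start + (hi - m->src_base);
	return 0;
}
```

Example: a 4 MiB mirror at 0x40000000 shows source 0x10000000. Assuming 2 MiB huge pages, a khugepaged collapse of [0x10200000, 0x10400000) translates to [0x40200000, 0x40400000) in the mirror. The patch computes the end as vm_end + (src_end - ctx_end). That is algebraically the same as the vm_start + (src_end - ctx_start) form used here.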
Signed-off-by: Mircea Cirjaliu
---
 mm/khugepaged.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b679908..11c65f3 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1028,7 +1028,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 
 	anon_vma_lock_write(vma->anon_vma);
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm,
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
 				address, address + HPAGE_PMD_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 

From patchwork Wed Dec 11 09:29:20 2019
X-Patchwork-Submitter: Mircea CIRJALIU - MELIU
X-Patchwork-Id: 11284565
From: Mircea CIRJALIU - MELIU
To: "linux-mm@kvack.org", "linux-kernel@vger.kernel.org", Jerome Glisse, Paolo Bonzini, "aarcange@redhat.com"
Subject: [RFC PATCH v1 3/4] thp: fix huge page zapping for special PMDs
Date: Wed, 11 Dec 2019 09:29:20 +0000

When zap_vma_ptes() is called on a VM_PFNMAP VMA that contains huge
mappings, pmd_page() returns a pointer that does not refer to a valid
struct page, and zap_huge_pmd() then passes it to page_remove_rmap().
Use vm_normal_page_pmd() instead and, as zap_pte_range() does, check the
returned page before using it.

Signed-off-by: Mircea Cirjaliu
---
 mm/huge_memory.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 41a0fbd..92ce487 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1804,7 +1804,11 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		int flush_needed = 1;
 
 		if (pmd_present(orig_pmd)) {
-			page = pmd_page(orig_pmd);
+			page = vm_normal_page_pmd(vma, addr, orig_pmd);
+			if (unlikely(!page)) {
+				spin_unlock(ptl);
+				return 1;
+			}
 			page_remove_rmap(page, true);
 			VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
 			VM_BUG_ON_PAGE(!PageHead(page), page);

From patchwork Wed Dec 11 09:29:19 2019
X-Patchwork-Submitter: Mircea CIRJALIU - MELIU
X-Patchwork-Id: 11284563
From: Mircea CIRJALIU - MELIU
To: "linux-mm@kvack.org", "linux-kernel@vger.kernel.org", Jerome Glisse, Paolo Bonzini, "aarcange@redhat.com"
Subject: [RFC PATCH v1 4/4] mm/gup: flag to limit follow_page() to transhuge pages
Date: Wed, 11 Dec 2019 09:29:19 +0000

Sometimes the caller needs to look up the transhuge page mapped at a given
address. Rather than being handed a normal page and having to test it,
the caller can pass the new FOLL_HUGE flag, which saves some cycles by
filtering out PTE mappings. With FOLL_HUGE set, follow_pmd_mask() returns
ERR_PTR(-EFAULT) instead of descending to follow_page_pte(), and
check_vma_flags() rejects VMAs for which transparent hugepages are not
enabled. FOLL_HUGE must not be combined with FOLL_SPLIT.
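The effect of FOLL_HUGE can be modelled in userspace. The following is a rough sketch under stated assumptions, not the kernel implementation:

- Only the FOLL_HUGE value (0x40000) is taken from the patch.
- MOCK_FOLL_SPLIT is an arbitrary bit chosen for the model.
- struct mock_page, mock_check_vma_flags() and mock_follow_pmd() are hypothetical stand-ins.
- A bool replaces the pmd_trans_huge() test.
- assert() plays the role of VM_BUG_ON().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define FOLL_HUGE       0x40000 /* value from this patch */
#define MOCK_FOLL_SPLIT 0x80    /* arbitrary bit chosen for the model */

struct mock_page { int id; };

/* Models the check_vma_flags() addition: FOLL_HUGE on a VMA where
 * transparent hugepages are not enabled is rejected up front. */
static int mock_check_vma_flags(bool thp_enabled, unsigned int gup_flags)
{
	if ((gup_flags & FOLL_HUGE) && !thp_enabled)
		return -EFAULT;
	return 0;
}

/* Models the follow_pmd_mask() change. pmd_is_trans_huge stands in for
 * pmd_trans_huge(); pte_page and huge_page stand in for what
 * follow_page_pte() and follow_trans_huge_pmd() would return.
 * Returns 0 and stores the page in *out, or a negative errno. */
static int mock_follow_pmd(bool pmd_is_trans_huge, unsigned int flags,
			   struct mock_page *pte_page,
			   struct mock_page *huge_page,
			   struct mock_page **out)
{
	/* Same condition as the VM_BUG_ON() added to follow_page_mask(). */
	assert((flags & (MOCK_FOLL_SPLIT | FOLL_HUGE)) !=
	       (MOCK_FOLL_SPLIT | FOLL_HUGE));

	*out = NULL;
	if (!pmd_is_trans_huge) {
		if (flags & FOLL_HUGE)
			return -EFAULT; /* filter out PTE mappings */
		*out = pte_page;
		return 0;
	}
	*out = huge_page;
	return 0;
}
```

In this model, a PTE-mapped address yields -EFAULT only when FOLL_HUGE is set. A transhuge PMD mapping is returned with or without the flag.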
Signed-off-by: Mircea Cirjaliu
---
 include/linux/mm.h |  1 +
 mm/gup.c           | 13 +++++++++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c97ea3b..64bbf83 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2579,6 +2579,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_ANON	0x8000	/* don't do file mappings */
 #define FOLL_LONGTERM	0x10000	/* mapping lifetime is indefinite: see below */
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
+#define FOLL_HUGE	0x40000	/* only return huge mappings */
 
 /*
  * NOTE on FOLL_LONGTERM:
diff --git a/mm/gup.c b/mm/gup.c
index 7646bf9..a776bdc 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -361,9 +361,11 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 		if (page)
 			return page;
 	}
-	if (likely(!pmd_trans_huge(pmdval)))
+	if (likely(!pmd_trans_huge(pmdval))) {
+		if (flags & FOLL_HUGE)
+			return ERR_PTR(-EFAULT);
 		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
-
+	}
 	if ((flags & FOLL_NUMA) && pmd_protnone(pmdval))
 		return no_page_table(vma, flags);
 
@@ -382,6 +384,8 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	}
 	if (unlikely(!pmd_trans_huge(*pmd))) {
 		spin_unlock(ptl);
+		if (flags & FOLL_HUGE)
+			return ERR_PTR(-EFAULT);
 		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
 	}
 	if (flags & (FOLL_SPLIT | FOLL_SPLIT_PMD)) {
@@ -513,6 +517,8 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
 	struct page *page;
 	struct mm_struct *mm = vma->vm_mm;
 
+	VM_BUG_ON((flags & (FOLL_SPLIT | FOLL_HUGE)) == (FOLL_SPLIT | FOLL_HUGE));
+
 	ctx->page_mask = 0;
 
 	/* make this handle hugepd */
@@ -685,6 +691,9 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	if (gup_flags & FOLL_ANON && !vma_is_anonymous(vma))
 		return -EFAULT;
 
+	if (gup_flags & FOLL_HUGE && !transparent_hugepage_enabled(vma))
+		return -EFAULT;
+
 	if (write) {
 		if (!(vm_flags & VM_WRITE)) {
 			if (!(gup_flags & FOLL_FORCE))