From: Peng Liang
Subject: [RFC 0/1] memfd: Support mapping to zero page on reading
Date: Wed, 22 Dec 2021 20:33:59 +0800
Message-ID: <20211222123400.1659635-1-liangpeng10@huawei.com>
X-Patchwork-Id: 12691519

Hi all,

We have recently been working on implementing CRIU [1] for QEMU, based on
Steven's work [2]. It uses memfd to allocate guest memory so that the memory
can be restored (inherited) in the new QEMU process. However, a read from
memfd-backed memory allocates a new page, whereas a read from anonymous
memory maps the zero page. For QEMU, this means memfd may cause all guest
memory to be allocated during migration, because QEMU reads every page while
migrating. That can lead to OOM when memory is over-committed, which is
usually the case in public clouds.

This patch tries to add support for mapping the zero page when reading memfd
memory. On a read, memfd maps the zero page instead of allocating a new page,
and the page is COWed when a write occurs.

For now it is just a demo for discussion. There is still a lot of work to do,
e.g.:
1. THP is not supported;
2. shared reading and writing are not supported; only the inherit case works.
   For example:

       task1                      | task2
       1) read from addr          |
                                  | 2) write to addr
       3) read from addr again    |

   then 3) will read 0 instead of the data task2 wrote in 2).

Would something similar be welcome in Linux?

Thanks,
Peng

[1] https://criu.org/Checkpoint/Restore
[2] https://patchwork.kernel.org/project/qemu-devel/cover/1628286241-217457-1-git-send-email-steven.sistare@oracle.com/

Peng Liang (1):
  memfd: Support mapping to zero page on reading memfd

 include/linux/fs.h         |  2 ++
 include/uapi/linux/memfd.h |  1 +
 mm/memfd.c                 |  8 ++++++--
 mm/memory.c                | 37 ++++++++++++++++++++++++++++++++++---
 mm/shmem.c                 | 10 ++++++++--
 5 files changed, 51 insertions(+), 7 deletions(-)
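
The difference between read faults on anonymous memory and on memfd memory
described in this cover letter can be observed from userspace. The following
is only a minimal sketch, not part of the patch: resident_pages(),
struct probe_result and probe_read_faults() are hypothetical names, and it
assumes Linux with /proc mounted and a kernel that provides the memfd_create
syscall. The RSS figure from /proc/self/statm is approximate, so the numbers
are only meaningful for mappings of many pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: resident set size in pages, taken from the second
 * field of /proc/self/statm. Returns -1 on error. */
static long resident_pages(void)
{
	long size, resident;
	FILE *f = fopen("/proc/self/statm", "r");

	if (!f)
		return -1;
	if (fscanf(f, "%ld %ld", &size, &resident) != 2)
		resident = -1;
	fclose(f);
	return resident;
}

struct probe_result {
	int ok;            /* 1 if the mapping was created and measured */
	unsigned long sum; /* sum of the bytes read; 0 for fresh memory */
	long rss_growth;   /* change in resident pages across the reads */
};

/* Map len bytes (anonymous private memory, or a shared memfd mapping when
 * use_memfd is non-zero), read one byte from every page, and report how
 * much the resident set grew while doing so. */
static struct probe_result probe_read_faults(int use_memfd, size_t len)
{
	struct probe_result r = { 0, 0, 0 };
	long page = sysconf(_SC_PAGESIZE);
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
	int fd = -1;
	volatile unsigned char *p;
	long before, after;
	size_t off;

	assert(page > 0);
	if (use_memfd) {
		/* Raw syscall, so the sketch does not need the glibc wrapper. */
		fd = (int)syscall(SYS_memfd_create, "zero-page-probe", 0u);
		if (fd < 0)
			return r;
		if (ftruncate(fd, (off_t)len) < 0) {
			close(fd);
			return r;
		}
		flags = MAP_SHARED;
	}

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
	if (fd >= 0)
		close(fd); /* the mapping keeps the memfd alive */
	if (p == MAP_FAILED)
		return r;

	before = resident_pages();
	for (off = 0; off < len; off += (size_t)page)
		r.sum += p[off]; /* read fault only, never a write */
	after = resident_pages();
	munmap((void *)p, len);

	r.ok = before >= 0 && after >= 0;
	r.rss_growth = after - before;
	return r;
}
```

Calling probe_read_faults(0, len) and probe_read_faults(1, len) with a len of
a few MiB and comparing rss_growth shows the effect: on a kernel without this
patch, the memfd case would be expected to grow by roughly one page per page
read, while the anonymous case should stay close to zero. These are expected
values, not measurements.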