X-Patchwork-Submitter: Yin Fengwei
X-Patchwork-Id: 13093093
From: Yin Fengwei
To: linux-mm@kvack.org, akpm@linux-foundation.org, jack@suse.cz,
    hughd@google.com, kirill.shutemov@linux.intel.com, mhocko@suse.com,
    ak@linux.intel.com, aarcange@redhat.com, npiggin@gmail.com,
    mgorman@techsingularity.net, willy@infradead.org, rppt@kernel.org,
    dave.hansen@intel.com, ying.huang@intel.com, tim.c.chen@intel.com
Cc: fengwei.yin@intel.com
Subject: [RFC PATCH 1/4] mcpage: add size/mask/shift definition for multiple consecutive page
Date: Mon, 9 Jan 2023 15:22:29 +0800
Message-Id: <20230109072232.2398464-2-fengwei.yin@intel.com>
In-Reply-To: <20230109072232.2398464-1-fengwei.yin@intel.com>
References: <20230109072232.2398464-1-fengwei.yin@intel.com>

Huge pages in the current kernel can bring a clear performance
improvement for some workloads, thanks to fewer TLB misses and fewer
page faults. But the limited choice of huge page sizes (2M/1G on
x86_64) also brings extra cost, such as higher memory consumption and
more CPU cycles spent on page zeroing.

The idea of the multiple consecutive page (abbreviated "mcpage") is to
use a collection of physically contiguous 4K pages, rather than a huge
page, for anonymous mappings. The goal is to offer more choices for
trading off the pros and cons of huge pages. Compared with a huge page,
an mcpage gets less benefit from reduced TLB misses and page faults,
but it also pays less of the extra cost in memory consumption and in
the latency introduced by page compaction, page zeroing, etc.

The size of an mcpage is configurable. The default of 16K was picked
arbitrarily. Users should choose the value by tuning their workload
with different mcpage sizes.

To get physically contiguous pages, a high-order page is allocated
(the order is calculated from the mcpage size) and then split. As a
result, each sub-page of an mcpage is just a normal 4K page, and the
current kernel page management infrastructure applies to mcpages
without any change.

To reduce the number of page faults, multiple page table entries are
populated in one page fault, using the pfns of the mcpage's sub-pages.
This also costs a little extra memory.

Update Kconfig to allow the user to define the mcpage order, and define
macros for the mcpage mask/shift/nr/size.

In this RFC patch, only Kconfig is used to set the mcpage order, to
show the idea. A runtime parameter will be used instead if this becomes
an official patch in the future.

Signed-off-by: Yin Fengwei
---
 include/linux/mm_types.h | 11 +++++++++++
 mm/Kconfig               | 19 +++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3b8475007734..fa561c7b6290 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -71,6 +71,17 @@ struct mem_cgroup;
 #define _struct_page_alignment	__aligned(sizeof(unsigned long))
 #endif
 
+#ifdef CONFIG_MCPAGE_ORDER
+#define MCPAGE_ORDER		CONFIG_MCPAGE_ORDER
+#else
+#define MCPAGE_ORDER		0
+#endif
+
+#define MCPAGE_SIZE		(1 << (MCPAGE_ORDER + PAGE_SHIFT))
+#define MCPAGE_MASK		(~(MCPAGE_SIZE - 1))
+#define MCPAGE_SHIFT		(MCPAGE_ORDER + PAGE_SHIFT)
+#define MCPAGE_NR		(1 << (MCPAGE_ORDER))
+
 struct page {
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
diff --git a/mm/Kconfig b/mm/Kconfig
index ff7b209dec05..c202dc99ab6d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -650,6 +650,25 @@ config HUGETLB_PAGE_SIZE_VARIABLE
 	  Note that the pageblock_order cannot exceed MAX_ORDER - 1 and will be
 	  clamped down to MAX_ORDER - 1.
 
+config MCPAGE
+	bool "multiple consecutive page"
+	default n
+	help
+	  Enable multiple consecutive pages: an mcpage is a collection of
+	  physically contiguous sub-pages. When an mcpage is mapped to user
+	  space, all of its sub-pages are mapped in a single page fault.
+	  This trades off the pros and cons of huge pages: less unnecessary
+	  memory zeroing and lower memory consumption than a huge page,
+	  but no TLB benefit.
+
+config MCPAGE_ORDER
+	int "multiple consecutive page order"
+	default 2
+	depends on X86_64 && MCPAGE
+	help
+	  The order of an mcpage. Choose the value carefully by tuning
+	  your workload.
+
 config CONTIG_ALLOC
 	def_bool (MEMORY_ISOLATION && COMPACTION) || CMA
 
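
For illustration only (not part of the patch): a user-space sketch of what
the four MCPAGE_* macros evaluate to with the default order of 2, assuming
PAGE_SHIFT is 12 (4K base pages, as on x86_64). The helpers mcpage_base(),
mcpage_subpage_index() and mcpage_check() are hypothetical names made up
for this example and do not appear anywhere in the series.

```c
#include <assert.h>

/* Assumed values: 4K base pages and the Kconfig default order of 2. */
#define PAGE_SHIFT	12
#define MCPAGE_ORDER	2

/* Same definitions as in the mm_types.h hunk. */
#define MCPAGE_SIZE	(1 << (MCPAGE_ORDER + PAGE_SHIFT))
#define MCPAGE_MASK	(~(MCPAGE_SIZE - 1))
#define MCPAGE_SHIFT	(MCPAGE_ORDER + PAGE_SHIFT)
#define MCPAGE_NR	(1 << (MCPAGE_ORDER))

/* Hypothetical helper: start address of the mcpage that contains addr. */
unsigned long mcpage_base(unsigned long addr)
{
	return addr & MCPAGE_MASK;
}

/* Hypothetical helper: index of the 4K sub-page of addr inside its mcpage. */
unsigned long mcpage_subpage_index(unsigned long addr)
{
	return (addr & ~MCPAGE_MASK) >> PAGE_SHIFT;
}

/* Checks the values the macros take under the assumptions above. */
void mcpage_check(void)
{
	assert(MCPAGE_SIZE == 16384);	/* 16K, the default named in the changelog */
	assert(MCPAGE_SHIFT == 14);
	assert(MCPAGE_NR == 4);		/* four 4K sub-pages per mcpage */
	assert(mcpage_base(0x40005123UL) == 0x40004000UL);
	assert(mcpage_subpage_index(0x40005123UL) == 1);
}
```

Calling mcpage_check() from a small main() is enough to exercise it. With
order 0 (the fallback when CONFIG_MCPAGE_ORDER is not defined) the same
macros reduce to a single 4K page: MCPAGE_SIZE equals the base page size
and MCPAGE_NR is 1.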