From patchwork Tue Oct 27 06:32:11 2020
From: Huang Ying <ying.huang@intel.com>
To: Peter Zijlstra
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, Andrew Morton, Michal Hocko, Rik van Riel, Mel Gorman, Ingo Molnar, Dave Hansen, Dan Williams
Subject: [RFC -V4 0/6] autonuma: Optimize memory placement for memory tiering system
Date: Tue, 27 Oct 2020 14:32:11 +0800
Message-Id: <20201027063217.211096-1-ying.huang@intel.com>

With the advent of various new memory types, some machines will have
multiple types of memory, e.g. DRAM and PMEM (persistent memory). The
memory subsystem of these machines can be called a memory tiering
system, because the performance of the different types of memory is
usually different.

After commit c221c0b0308f ("device-dax: "Hotplug" persistent memory
for use like normal RAM"), PMEM can be used as cost-effective volatile
memory in separate NUMA nodes. In a typical memory tiering system,
there are CPUs, DRAM, and PMEM in each physical NUMA node. The CPUs
and the DRAM will be put in one logical node, while the PMEM will be
put in another (fake) logical node.

To optimize overall system performance, hot pages should be placed in
the DRAM node. To do that, we need to identify the hot pages in the
PMEM node and migrate them to the DRAM node via NUMA migration. The
original AutoNUMA already provides a set of mechanisms to identify the
pages recently accessed by the CPUs of a node and migrate those pages
to that node. We can reuse these mechanisms to optimize page placement
in the memory tiering system; this has been implemented in this
patchset (an illustrative sketch of the hot page selection idea
follows the benchmark results below).

On the other hand, cold pages should be placed in the PMEM node, so we
also need to identify the cold pages in the DRAM node and migrate them
to the PMEM node. The following patchset,

  [RFC][PATCH 0/9] [v4][RESEND] Migrate Pages in lieu of discard
  https://lore.kernel.org/linux-mm/20201007161736.ACC6E387@viggo.jf.intel.com/

implements a mechanism to demote cold DRAM pages to the PMEM node
under memory pressure. Based on that, cold DRAM pages can be demoted
to the PMEM node proactively, which frees space in the DRAM node for
hot PMEM pages to be promoted to. This has been implemented in this
patchset too.

The patchset is based on the above not-yet-merged page demotion
patchset. This is part of a larger patch set. If you want to apply
these or play with them, I'd suggest using the tree below:

https://github.com/hying-caritas/linux/commits/autonuma-r4

We have tested the solution with the pmbench memory accessing
benchmark, using an 80:20 read/write ratio and a normal access address
distribution, on a 2-socket Intel server with Optane DC Persistent
Memory. The test results of the base kernel and the step-by-step
optimizations are as follows:

              Throughput      Promotion      DRAM bandwidth
              access/s        MB/s           MB/s
              -----------     ----------     --------------
  Base         74238178.0                     4291.7
  Patch 1     146050652.3      359.4         11248.6
  Patch 2     146300787.1      355.2         11237.2
  Patch 3     162536383.0      211.7         11890.4
  Patch 4     157187775.0      105.9         10412.3
  Patch 5     164028415.2       73.3         10810.6
  Patch 6     162666229.4       74.6         10715.1

The whole patchset improves the benchmark score by up to 119.1%.
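As a side note for reviewers, here is a minimal user-space sketch (not
kernel code) of the hint page fault latency idea behind the hot page
selection in patch 3: a page is estimated to be hot if the latency
between the AutoNUMA scan that made it inaccessible and the hint page
fault that follows is below a threshold. All names here (struct
page_info, page_is_hot, hot_threshold) and the time unit are
hypothetical, for illustration only.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct page_info {
	uint64_t scan_time;	/* when the scanner made the page inaccessible */
};

/* Hypothetical tunable; uses the same time unit as the timestamps. */
static uint64_t hot_threshold = 1000;

static bool page_is_hot(const struct page_info *page, uint64_t fault_time)
{
	/*
	 * A short scan-to-fault latency means the page was accessed
	 * soon after the scan, i.e. it likely belongs to the hot
	 * working set and is a candidate for promotion to DRAM.
	 */
	return fault_time - page->scan_time < hot_threshold;
}

int main(void)
{
	struct page_info a = { .scan_time = 100 };
	struct page_info b = { .scan_time = 100 };

	/* Hint faults at t=300 and t=5000: only the first is "hot". */
	printf("page a hot: %d\n", page_is_hot(&a, 300));
	printf("page b hot: %d\n", page_is_hot(&b, 5000));
	return 0;
}

In the real patchset the scan time is stored in struct page (see the
v3 changelog below), and the threshold is not fixed but adjusted
automatically by patch 5.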
The basic AutoNUMA-based optimization (patch 1), the hot page
selection algorithm (patch 3), and the automatic threshold adjustment
algorithm (patch 5) contribute most of the performance improvement and
of the reduction in overhead (promotion MB/s).

Changelog:

v4:

- Rebased on the latest page demotion patchset (which is based on
  v5.9-rc6).
- Added a page promotion counter.

v3:

- Moved the rate limit control as late as possible, per Mel Gorman's
  comments.
- Revised the hot page selection implementation to store the page scan
  time in struct page.
- Code cleanup.
- Rebased on the latest page demotion patchset.

v2:

- Addressed comments for v1.
- Rebased on v5.5.

Huang Ying (6):
  autonuma: Optimize page placement for memory tiering system
  autonuma, memory tiering: Skip to scan fast memory
  autonuma, memory tiering: Hot page selection with hint page fault
    latency
  autonuma, memory tiering: Rate limit NUMA migration throughput
  autonuma, memory tiering: Adjust hot threshold automatically
  autonuma, memory tiering: Add page promotion counter

 include/linux/mm.h           |  29 ++++++++
 include/linux/mmzone.h       |  11 ++++
 include/linux/node.h         |   5 ++
 include/linux/sched/sysctl.h |  12 ++++
 kernel/sched/core.c          |   9 +--
 kernel/sched/fair.c          | 124 +++++++++++++++++++++++++++++++++++
 kernel/sysctl.c              |  22 ++++++-
 mm/huge_memory.c             |  41 ++++++++----
 mm/memory.c                  |  11 +++-
 mm/migrate.c                 |  52 +++++++++++++--
 mm/mmzone.c                  |  17 +++++
 mm/mprotect.c                |  19 +++++-
 mm/vmscan.c                  |  15 +++++
 mm/vmstat.c                  |   4 ++
 14 files changed, 345 insertions(+), 26 deletions(-)
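P.S. To picture how the rate limit (patch 4) and the automatic
threshold adjustment (patch 5) could interact, here is a rough
user-space sketch. This cover letter does not describe the exact
algorithm, so the feedback rule below (tighten the threshold when more
candidate pages pass it per period than the rate limit allows, loosen
it when far fewer do) is only an assumption; all names, units, and
constants are hypothetical.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t hot_threshold = 1000;	/* current "hot" latency cutoff */
static const uint64_t rate_limit = 100;	/* max pages to promote per period */

/*
 * Called once per period with the number of pages whose hint fault
 * latency fell under hot_threshold during that period.
 */
static void adjust_hot_threshold(uint64_t candidates)
{
	if (candidates > rate_limit * 11 / 10)
		/* Too many candidates: scale the threshold down. */
		hot_threshold = hot_threshold * rate_limit / candidates;
	else if (candidates < rate_limit * 9 / 10)
		/* Far too few candidates: loosen the threshold a little. */
		hot_threshold += hot_threshold / 10;
}

int main(void)
{
	uint64_t samples[] = { 500, 200, 120, 90, 40 };

	for (int i = 0; i < 5; i++) {
		adjust_hot_threshold(samples[i]);
		printf("candidates=%3" PRIu64 " -> hot_threshold=%" PRIu64 "\n",
		       samples[i], hot_threshold);
	}
	return 0;
}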