From: BiscuitOS Broiler
Date: Thu, 1 Aug 2024 15:56:09 +0800
Subject: [PATCH v2 0/1] mm: introduce MADV_DEMOTE/MADV_PROMOTE
Message-ID: <20240801075610.12351-1-zhang.renze@h3c.com>
Scalable Tiered Memory Control (STMC)

**Background**

Now that artificial intelligence, big data analytics, and machine learning have become mainstream research topics and workloads, demand for high-capacity, high-bandwidth memory has grown rapidly. CXL (Compute Express Link) makes such high-capacity memory possible. However, although CXL Type 3 devices can provide large memory capacities, their access latency is higher than that of traditional DRAM because of hardware architecture limitations.
To benefit from the large capacity of CXL memory while minimizing the impact of its higher latency, Linux introduced the Tiered Memory architecture. In this architecture, CXL memory is exposed as a separate, slower NUMA node, while DRAM is treated as a faster NUMA node. Applications allocate memory from the local node, and Tiered Memory uses memory reclamation to transparently demote physical pages that user processes have not accessed recently to the slower CXL node. When user processes access the demoted memory again, Tiered Memory uses NUMA Balancing to decide, according to its promotion policy, whether to promote those pages back to the fast node. If promotion succeeds, the memory the process accesses resides in DRAM; otherwise, it stays on the CXL node. In this way, Linux trades large memory capacity against latency and tries to keep applications at a reasonable balance between the two.

**Problem**

Although Tiered Memory tries to balance capacity and latency, it can cause the following problems in specific scenarios:

1. In compute-intensive scenarios, if much of the working data resides in slow CXL memory and Tiered Memory cannot promote it to fast DRAM promptly, program performance suffers significantly.
2. Similarly, even when Tiered Memory decides to promote these pages to the fast DRAM node, the promotion can be refused because of the DRAM node's promotion ratio limit. The program then keeps running on slow memory.
3. After an application finishes computing on a large block of fast memory, it may not access that memory again for some time. Even so, the memory is demoted only when the memory reclamation mechanism gets to it.
4. Similarly to point 3, if demotion is slow, these cold pages keep occupying promotion resources and prevent eligible slow pages from being promoted promptly, severely affecting application efficiency.

**Solution**

We propose the **Scalable Tiered Memory Control (STMC)** mechanism, which delegates the decision to promote and demote memory to the application. The principle is simple:

1. When an application is preparing for a computation, it can promote the memory it needs, or ensure that the memory already resides on a fast node.
2. When an application will not use some memory in the near future, it can demote that memory to slow memory immediately, freeing valuable promotion resources.

STMC is implemented through the madvise system call, which gains two new advice values: MADV_DEMOTE and MADV_PROMOTE. MADV_DEMOTE advises the kernel to demote the physical memory to the node where slow memory resides; it fails only if there is no free physical memory on the slow memory node. MADV_PROMOTE advises the kernel to keep the physical memory in fast memory; it fails only if no promotion slots are available on the fast memory node.

STMC brings the following benefits:

1. STMC is a form of on-demand memory management: applications use fast memory as much as possible and actively demote memory they are not using to slow memory, which frees the node's promotion slots and keeps the system running in an optimized Tiered Memory environment.
2. STMC strikes a better balance between large capacity and latency.

**Shortcomings of STMC**

STMC requires the caller to manage memory demotion and promotion. If memory is not demoted promptly after a promotion, the effect resembles a memory leak and can cause short-term promotion bottlenecks.
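The sketch below illustrates how a user-space application might drive the two advice values, and how it might pair them so that every promotion is followed by a demotion, which addresses the leak-like shortcoming described above. It assumes a kernel with this patch applied. The numeric values of MADV_DEMOTE and MADV_PROMOTE are hypothetical placeholders, because the cover letter does not show the uapi definitions, and the helpers `page_span`, `stmc_advise`, and `stmc_run_promoted` are hypothetical names that are not part of the patch. On an unpatched kernel the placeholder advice values are simply rejected with EINVAL.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * HYPOTHETICAL placeholder values. The real MADV_DEMOTE/MADV_PROMOTE
 * numbers come from the patch's uapi headers, which the cover letter
 * does not show. These out-of-range values only let the sketch compile;
 * an unpatched kernel rejects them with EINVAL.
 */
#ifndef MADV_DEMOTE
#define MADV_DEMOTE 1000
#endif
#ifndef MADV_PROMOTE
#define MADV_PROMOTE 1001
#endif

/* Round [buf, buf + len) outward to whole pages, because madvise()
 * requires a page-aligned start address. */
static void page_span(const void *buf, size_t len, void **start, size_t *span)
{
    uintptr_t pg = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t lo = (uintptr_t)buf & ~(pg - 1);
    uintptr_t hi = ((uintptr_t)buf + len + pg - 1) & ~(pg - 1);
    *start = (void *)lo;
    *span = (size_t)(hi - lo);
}

/* Issue one advice over the pages backing buf; returns 0 or -errno. */
static int stmc_advise(const void *buf, size_t len, int advice)
{
    void *start;
    size_t span;

    page_span(buf, len, &start, &span);
    return madvise(start, span, advice) == 0 ? 0 : -errno;
}

/* Hypothetical helper: promote before a hot compute phase and always
 * demote afterwards, so promotion slots are not held indefinitely.
 * Returns the first error seen (0 if both advices succeeded). */
static int stmc_run_promoted(void *buf, size_t len,
                             void (*compute)(void *, size_t))
{
    int rc = stmc_advise(buf, len, MADV_PROMOTE);

    /* Promotion is only advice: run the computation even if it failed,
     * e.g. because no promotion slots were available. */
    compute(buf, len);

    /* Release promotion resources as soon as the hot phase ends. */
    int drc = stmc_advise(buf, len, MADV_DEMOTE);
    return rc != 0 ? rc : drc;
}
```

Rounding the range outward is a deliberate choice in this sketch: unlike MADV_DONTNEED, which discards page contents, promotion and demotion only move pages between nodes, so covering a few extra bytes of neighbouring data on the boundary pages is harmless. The helper also demotes unconditionally after the compute phase, following the cover letter's advice to demote memory as soon as it is no longer needed.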
BiscuitOS Broiler (1):
  mm: introduce MADV_DEMOTE/MADV_PROMOTE

 arch/alpha/include/uapi/asm/mman.h           |   3 +
 arch/mips/include/uapi/asm/mman.h            |   3 +
 arch/parisc/include/uapi/asm/mman.h          |   3 +
 arch/xtensa/include/uapi/asm/mman.h          |   3 +
 include/uapi/asm-generic/mman-common.h       |   3 +
 mm/internal.h                                |   1 +
 mm/madvise.c                                 | 251 +++++++++++++++++++
 mm/vmscan.c                                  |  57 +++++
 tools/include/uapi/asm-generic/mman-common.h |   3 +
 9 files changed, 327 insertions(+)