From patchwork Thu Aug 1 06:50:34 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: BiscuitOS Broiler
X-Patchwork-Id: 13749894
From: BiscuitOS Broiler
Subject: [PATCH 0/1] mm: introduce MADV_DEMOTE/MADV_PROMOTE
Date: Thu, 1 Aug 2024 14:50:34 +0800
Message-ID: <20240801065035.15377-1-zhang.renze@h3c.com>
X-Mailer: git-send-email 2.17.1
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Scalable Tiered Memory Control (STMC)

**Background**

Artificial intelligence, big data analytics, and machine learning have
become mainstream research topics and application scenarios, and the
demand for high-capacity, high-bandwidth memory has grown with them.
CXL (Compute Express Link) makes high-capacity memory possible.
Although CXL Type 3 devices can provide large memory capacities, they
are slower to access than traditional DRAM due to hardware
architecture limitations.

To enjoy the large capacity of CXL memory while minimizing the impact
of its higher latency, Linux introduced the Tiered Memory
architecture. In this architecture, CXL memory is treated as an
independent, slower NUMA node, while DRAM is treated as a relatively
faster NUMA node. Applications allocate memory from the local node,
and Tiered Memory, using the memory reclamation and NUMA balancing
mechanisms, transparently demotes physical pages that user processes
have not accessed recently to the slower CXL NUMA node. When a user
process re-accesses demoted memory, the Tiered Memory mechanism
decides, based on certain logic, whether to promote the demoted pages
back to the fast NUMA node. If the promotion succeeds, the memory
accessed by the process resides in DRAM; otherwise it stays on the CXL
node. In this way Linux trades off large memory capacity against
latency and tries to keep the two in balance for applications.

**Problem**

Although Tiered Memory strives to balance capacity and latency,
specific scenarios can lead to the following issues:

1. In compute-intensive scenarios, if much of the data sits in slow
   CXL memory and Tiered Memory cannot promote it to fast DRAM
   promptly, program performance suffers significantly.

2. As in point 1, Tiered Memory may decide to promote these physical
   pages to the fast DRAM node, but limits on the DRAM node's promote
   ratio prevent the promotion. The program then keeps running from
   slow memory.

3. After an application finishes computing on a large block of fast
   memory, it may not re-access that memory soon. The memory can then
   only wait for the memory reclamation mechanism to demote it.

4. As in point 3, if demotion is slow, these cold pages occupy
   promotion resources and prevent eligible slow pages from being
   promoted immediately, which severely affects application
   efficiency.

**Solution**

We propose the **Scalable Tiered Memory Control (STMC)** mechanism,
which delegates the authority to promote and demote memory to the
application. The principle is simple:

1. When an application is preparing for computation, it can promote
   the memory it needs, or ensure that the memory resides on a fast
   node.

2. When an application will not use the memory for a while, it can
   immediately demote the memory to slow memory, freeing up valuable
   promotion resources.

The STMC mechanism is implemented through the madvise system call,
which gains two new advice options: MADV_DEMOTE and MADV_PROMOTE.
MADV_DEMOTE advises the kernel to demote the physical memory to the
node where slow memory resides; this advice fails only if there is no
free physical memory on the slow memory node. MADV_PROMOTE advises the
kernel to retain the physical memory in fast memory; this advice fails
only if there are no promotion slots available on the fast memory
node.

Benefits brought by STMC include:

1. The STMC mechanism is a variant of on-demand memory management. It
   lets applications use fast memory as much as possible and actively
   demote memory to the slow tier when it is not in use, which frees
   up promotion slots on the node and lets the application run in an
   optimized Tiered Memory environment.

2. The STMC mechanism better balances large capacity and latency.

**Shortcomings of STMC**

The STMC mechanism requires the caller to manage memory demotion and
promotion. If memory is not demoted promptly after a promotion, the
effect is similar to a memory leak: promotion slots stay occupied,
leading to short-term promotion bottlenecks.
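
As an illustration only, the pairing of promotion and demotion that
the caller is responsible for could be wrapped as in the sketch below.
The numeric values of MADV_DEMOTE and MADV_PROMOTE are hypothetical
placeholders, since this cover letter does not state them, and the
helpers stmc_advise(), stmc_compute() and fill_ones() are names
invented for the example. On a kernel without this patch, madvise() is
expected to reject the unknown advice with EINVAL.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * HYPOTHETICAL placeholder values. The real numbers come from the
 * patched mman-common.h; they are not given in the cover letter.
 */
#ifndef MADV_DEMOTE
#define MADV_DEMOTE 1000
#endif
#ifndef MADV_PROMOTE
#define MADV_PROMOTE 1001
#endif

/* Issue one piece of advice; return 0 on success or a negative errno. */
static int stmc_advise(void *addr, size_t len, int advice)
{
    if (madvise(addr, len, advice) == 0)
        return 0;
    return -errno;
}

/*
 * Promote a buffer, run the compute phase, then demote the buffer
 * again so that promotion resources are released promptly. The work
 * runs even if the advice is rejected, because the advice is only a
 * placement hint. Returns the first error seen, or 0.
 */
static int stmc_compute(void *buf, size_t len,
                        void (*work)(void *, size_t))
{
    assert(work != NULL);

    int promoted = stmc_advise(buf, len, MADV_PROMOTE);
    work(buf, len);
    int demoted = stmc_advise(buf, len, MADV_DEMOTE);

    return promoted ? promoted : demoted;
}

/* Example compute phase: touch every byte of the buffer. */
static void fill_ones(void *buf, size_t len)
{
    memset(buf, 1, len);
}
```

A caller would pass a page-aligned region, for example one obtained
from mmap(), as in stmc_compute(buf, len, fill_ones). Keeping the
demotion in the same helper as the promotion is one way to avoid the
leak-like behaviour described above.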

BiscuitOS Broiler (1):
  mm: introduce MADV_DEMOTE/MADV_PROMOTE

 arch/alpha/include/uapi/asm/mman.h           |   3 +
 arch/mips/include/uapi/asm/mman.h            |   3 +
 arch/parisc/include/uapi/asm/mman.h          |   3 +
 arch/xtensa/include/uapi/asm/mman.h          |   3 +
 include/uapi/asm-generic/mman-common.h       |   3 +
 mm/internal.h                                |   1 +
 mm/madvise.c                                 | 251 +++++++++++++++++++
 mm/vmscan.c                                  |  57 +++++
 tools/include/uapi/asm-generic/mman-common.h |   3 +
 9 files changed, 327 insertions(+)

---
2.34.1