From patchwork Thu Dec 12 07:37:11 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13904768
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: axboe@kernel.dk, bala.seshasayee@linux.intel.com, chrisl@kernel.org,
 david@redhat.com, hannes@cmpxchg.org, kanchana.p.sridhar@intel.com,
 kasong@tencent.com, nphamcs@gmail.com, ryan.roberts@arm.com,
 senozhatsky@chromium.org, terrelln@fb.com, usamaarif642@gmail.com,
 v-songbaohua@oppo.com, wajdi.k.feghali@intel.com, willy@infradead.org,
 ying.huang@linux.alibaba.com, yosryahmed@google.com,
 baolin.wang@linux.alibaba.com
Subject: [PATCH RFC] mm: map zero-filled pages to zero_pfn while doing swap-in
Date: Thu, 12 Dec 2024 20:37:11 +1300
Message-Id: <20241212073711.82300-1-21cnbao@gmail.com>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)

From: Barry Song

While developing the zeromap series, Usama observed that certain
workloads may contain over 10% zero-filled pages. This may present
an opportunity to save memory by mapping zero-filled pages to zero_pfn
in do_swap_page(). If a write occurs later, do_wp_page() can
allocate a new page using the Copy-on-Write mechanism.

For workloads with numerous zero-filled pages, this can greatly
reduce the RSS.

For example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SIZE (20 * 1024 * 1024)

int main()
{
	volatile char *buffer = (char *)mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
					     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	volatile char data;

	if (buffer == MAP_FAILED) {
		perror("mmap failed");
		exit(EXIT_FAILURE);
	}

	/* Touch every page so that it is populated with zeros */
	memset(buffer, 0, SIZE);

	/* Ask the kernel to reclaim (swap out) the whole range */
	if (madvise(buffer, SIZE, MADV_PAGEOUT) != 0)
		perror("madvise MADV_PAGEOUT failed");

	/* Read-only access: each page is faulted back in via do_swap_page() */
	for (size_t i = 0; i < SIZE; i++)
		data = buffer[i];
	sleep(1000);

	return 0;
}

~ # ./a.out &

w/o patch:
~ # ps aux | head -n 1; ps aux | grep '[a]\.out'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       101  2.9 10.6  22540 21268 ttyAMA0  S    06:50   0:00 ./a.out

w/ patch:
~ # ps aux | head -n 1; ps aux | grep '[a]\.out'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       141  0.1  0.3  22540   792 ttyAMA0  S    06:38   0:00 ./a.out

Signed-off-by: Barry Song
---
 mm/memory.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 2bacebbf4cf6..b37f0f61d0bc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4294,6 +4294,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	struct swap_info_struct *si = NULL;
 	rmap_t rmap_flags = RMAP_NONE;
 	bool need_clear_cache = false;
+	bool map_zero_pfn = false;
 	bool exclusive = false;
 	swp_entry_t entry;
 	pte_t pte;
@@ -4364,6 +4365,39 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	swapcache = folio;
 
 	if (!folio) {
+		/* Use the zero-page for reads */
+		if (!(vmf->flags & FAULT_FLAG_WRITE) &&
+		    !mm_forbids_zeropage(vma->vm_mm) &&
+		    __swap_count(entry) == 1) {
+			swap_zeromap_batch(entry, 1, &map_zero_pfn);
+			if (map_zero_pfn) {
+				if (swapcache_prepare(entry, 1)) {
+					add_wait_queue(&swapcache_wq, &wait);
+					schedule_timeout_uninterruptible(1);
+					remove_wait_queue(&swapcache_wq, &wait);
+					goto out;
+				}
+				nr_pages = 1;
+				need_clear_cache = true;
+				pte = pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address),
+						vma->vm_page_prot));
+				vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
+						&vmf->ptl);
+				if (unlikely(!vmf->pte || !pte_same(ptep_get(vmf->pte),
+						vmf->orig_pte)))
+					goto unlock;
+
+				page = pfn_to_page(my_zero_pfn(vmf->address));
+				arch_swap_restore(entry, page_folio(page));
+				swap_free_nr(entry, 1);
+				add_mm_counter(vma->vm_mm, MM_SWAPENTS, -1);
+				set_ptes(vma->vm_mm, vmf->address, vmf->pte, pte, 1);
+				arch_do_swap_page_nr(vma->vm_mm, vma, vmf->address, pte, pte, 1);
+				update_mmu_cache_range(vmf, vma, vmf->address, vmf->pte, 1);
+				goto unlock;
+			}
+		}
+
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
 		    __swap_count(entry) == 1) {
 			/* skip swapcache */