From patchwork Tue Aug 6 01:24:09 2024
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13754300
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Barry Song, Kairui Song, Chris Li,
 "Huang, Ying", Hugh Dickins, Kalesh Singh, Ryan Roberts, David Hildenbrand
Subject: [PATCH] mm: attempt to batch free swap entries for zap_pte_range()
Date: Tue, 6 Aug 2024 13:24:09 +1200
Message-Id: <20240806012409.61962-1-21cnbao@gmail.com>
X-Mailer: git-send-email 2.34.1

From: Barry Song

Zhiguo reported that swap
release could be a serious bottleneck during process exits[1]. With
mTHP, we have the opportunity to free swap entries in batches.

Thanks to the work of Chris and Kairui[2], I was able to achieve this
optimization with minimal code changes by building on their efforts.
If swap_count is 1, which is likely true as most anonymous memory is
private, we can free all contiguous swap slots together.

Ran the below test program for measuring the bandwidth of munmap
using zRAM and 64KiB mTHP:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/time.h>

    #define SIZE (1024UL * 1024 * 1024)

    unsigned long long tv_to_ms(struct timeval tv)
    {
        return tv.tv_sec * 1000ULL + tv.tv_usec / 1000;
    }

    int main(void)
    {
        struct timeval tv_b, tv_e;
        void *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        /* mmap() reports failure as MAP_FAILED, not NULL */
        if (p == MAP_FAILED) {
            perror("fail to get memory");
            exit(-1);
        }

        madvise(p, SIZE, MADV_HUGEPAGE);
        memset(p, 0x11, SIZE); /* write to get mem */

        /* push the whole range out to swap before timing munmap */
        madvise(p, SIZE, MADV_PAGEOUT);

        gettimeofday(&tv_b, NULL);
        munmap(p, SIZE);
        gettimeofday(&tv_e, NULL);

        printf("munmap in bandwidth: %llu bytes/ms\n",
               SIZE / (tv_to_ms(tv_e) - tv_to_ms(tv_b)));
        return 0;
    }

The result is as below (munmap bandwidth, bytes/ms):

                mm-unstable   mm-unstable-with-patch
    round1      21053761      63161283
    round2      21053761      63161283
    round3      21053761      63161283
    round4      20648881      67108864
    round5      20648881      67108864

munmap becomes roughly 3X faster.

[1] https://lore.kernel.org/linux-mm/20240731133318.527-1-justinjiang@vivo.com/
[2] https://lore.kernel.org/linux-mm/20240730-swap-allocator-v5-0-cb9c148b9297@kernel.org/

Cc: Kairui Song
Cc: Chris Li
Cc: "Huang, Ying"
Cc: Hugh Dickins
Cc: Kalesh Singh
Cc: Ryan Roberts
Cc: David Hildenbrand
Signed-off-by: Barry Song
---
 mm/swapfile.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index ea023fc25d08..ed872a186e81 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -156,6 +156,25 @@ static bool swap_is_has_cache(struct swap_info_struct *si,
 	return true;
 }
 
+static bool swap_is_last_map(struct swap_info_struct *si,
+		unsigned long offset, int nr_pages,
+		bool *has_cache)
+{
+	unsigned char *map = si->swap_map + offset;
+	unsigned char *map_end = map + nr_pages;
+	bool cached = false;
+
+	do {
+		if ((*map & ~SWAP_HAS_CACHE) != 1)
+			return false;
+		if (*map & SWAP_HAS_CACHE)
+			cached = true;
+	} while (++map < map_end);
+
+	*has_cache = cached;
+	return true;
+}
+
 /*
  * returns number of pages in the folio that backs the swap entry. If positive,
  * the folio was reclaimed. If negative, the folio was not reclaimed. If 0, no
@@ -1469,6 +1488,39 @@ static unsigned char __swap_entry_free(struct swap_info_struct *p,
 	return usage;
 }
 
+static bool try_batch_swap_entries_free(struct swap_info_struct *p,
+		swp_entry_t entry, int nr, bool *any_only_cache)
+{
+	unsigned long offset = swp_offset(entry);
+	struct swap_cluster_info *ci;
+	bool has_cache = false;
+	bool can_batch;
+	int i;
+
+	/* cross into another cluster */
+	if (nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER)
+		return false;
+	ci = lock_cluster_or_swap_info(p, offset);
+	can_batch = swap_is_last_map(p, offset, nr, &has_cache);
+	if (can_batch) {
+		for (i = 0; i < nr; i++)
+			WRITE_ONCE(p->swap_map[offset + i], SWAP_HAS_CACHE);
+	}
+	unlock_cluster_or_swap_info(p, ci);
+
+	/* all swap_maps have count==1 and have no swapcache */
+	if (!can_batch)
+		goto out;
+	if (!has_cache) {
+		spin_lock(&p->lock);
+		swap_entry_range_free(p, entry, nr);
+		spin_unlock(&p->lock);
+	}
+	*any_only_cache = has_cache;
+out:
+	return can_batch;
+}
+
 /*
  * Drop the last HAS_CACHE flag of swap entries, caller have to
  * ensure all entries belong to the same cgroup.
@@ -1797,6 +1849,7 @@ void free_swap_and_cache_nr(swp_entry_t entry, int nr)
 	bool any_only_cache = false;
 	unsigned long offset;
 	unsigned char count;
+	bool batched;
 
 	if (non_swap_entry(entry))
 		return;
@@ -1808,6 +1861,13 @@ void free_swap_and_cache_nr(swp_entry_t entry, int nr)
 	if (WARN_ON(end_offset > si->max))
 		goto out;
 
+	if (nr > 1 && swap_count(data_race(si->swap_map[start_offset])) == 1) {
+		batched = try_batch_swap_entries_free(si, entry, nr,
+						&any_only_cache);
+		if (batched)
+			goto reclaim;
+	}
+
 	/*
 	 * First free all entries in the range.
 	 */
@@ -1821,6 +1881,7 @@ void free_swap_and_cache_nr(swp_entry_t entry, int nr)
 		}
 	}
 
+reclaim:
 	/*
 	 * Short-circuit the below loop if none of the entries had their
 	 * reference drop to zero.