From patchwork Fri Sep 6 00:10:45 2024
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hanchuanhua@oppo.com, baolin.wang@linux.alibaba.com, chrisl@kernel.org,
    david@redhat.com, hannes@cmpxchg.org, hch@infradead.org, hughd@google.com,
    kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org,
    mhocko@suse.com, minchan@kernel.org, nphamcs@gmail.com,
    ryan.roberts@arm.com, ryncsn@gmail.com, senozhatsky@chromium.org,
    shakeel.butt@linux.dev, shy828301@gmail.com, surenb@google.com,
    v-songbaohua@oppo.com, willy@infradead.org, xiang@kernel.org,
    ying.huang@intel.com, yosryahmed@google.com, Usama Arif
Subject: [PATCH v8 1/3] mm: Fix swap_read_folio_zeromap() for large folios with partial zeromap
Date: Fri, 6 Sep 2024 12:10:45 +1200
Message-Id: <20240906001047.1245-2-21cnbao@gmail.com>
In-Reply-To: <20240906001047.1245-1-21cnbao@gmail.com>
References: <20240906001047.1245-1-21cnbao@gmail.com>
From: Barry Song

There could be a corner case where the first entry is non-zeromap, but a
subsequent entry is zeromap. In this case, we must not let
swap_read_folio_zeromap() return false, since that would read stale,
corrupted data from the swap device for the entries that are in the
zeromap and were never written out.

Additionally, iterating with test_bit() is unnecessary; it can be
replaced with bitmap operations, which are more efficient.

We adopt the style of swap_pte_batch() and folio_pte_batch() and
introduce swap_zeromap_batch(), which seems to provide the greatest
flexibility for the caller: the caller can either check whether the
zeromap status of all entries is consistent, or determine the number of
contiguous entries that share the same status.

Since swap_read_folio() can't handle reading a large folio that's
partially zeromap and partially non-zeromap, we've moved the code to
mm/swap.h so that others, like those working on swap-in, can access it.
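As a hedged illustration of the two usage styles the helper enables (the
helper itself is added in the diff below; the wrapper function name
swapin_batch_ok is invented, not part of this series):

  /*
   * Illustrative sketch only: a swap-in path can use the batch helper
   * either to verify consistency or to size a batch of contiguous
   * same-status entries.
   */
  static bool swapin_batch_ok(swp_entry_t entry, int nr_pages)
  {
  	bool is_zeromap;

  	/* Count contiguous entries sharing the first entry's status. */
  	if (swap_zeromap_batch(entry, nr_pages, &is_zeromap) != nr_pages)
  		return false;	/* mixed zeromap status: do not batch */

  	/* All nr_pages entries agree; is_zeromap says which kind they are. */
  	return true;
  }

Passing NULL for is_zeromap, as can_swapin_thp() in patch 3/3 does, is
the consistency-only variant of the same call.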
Fixes: 0ca0c24e3211 ("mm: store zero pages to be swapped out in a bitmap")
Cc: Usama Arif
Cc: Yosry Ahmed
Signed-off-by: Barry Song
Signed-off-by: Barry Song
Reviewed-by: Yosry Ahmed
---
 mm/page_io.c | 32 +++++++-------------------------
 mm/swap.h    | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index 4bc77d1c6bfa..2dfe2273a1f1 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -226,26 +226,6 @@ static void swap_zeromap_folio_clear(struct folio *folio)
 	}
 }
 
-/*
- * Return the index of the first subpage which is not zero-filled
- * according to swap_info_struct->zeromap.
- * If all pages are zero-filled according to zeromap, it will return
- * folio_nr_pages(folio).
- */
-static unsigned int swap_zeromap_folio_test(struct folio *folio)
-{
-	struct swap_info_struct *sis = swp_swap_info(folio->swap);
-	swp_entry_t entry;
-	unsigned int i;
-
-	for (i = 0; i < folio_nr_pages(folio); i++) {
-		entry = page_swap_entry(folio_page(folio, i));
-		if (!test_bit(swp_offset(entry), sis->zeromap))
-			return i;
-	}
-	return i;
-}
-
 /*
  * We may have stale swap cache pages in memory: notice
  * them here and get rid of the unnecessary final write.
@@ -524,19 +504,21 @@ static void sio_read_complete(struct kiocb *iocb, long ret)
 
 static bool swap_read_folio_zeromap(struct folio *folio)
 {
-	unsigned int idx = swap_zeromap_folio_test(folio);
-
-	if (idx == 0)
-		return false;
+	int nr_pages = folio_nr_pages(folio);
+	bool is_zeromap;
+	int nr_zeromap = swap_zeromap_batch(folio->swap, nr_pages, &is_zeromap);
 
 	/*
 	 * Swapping in a large folio that is partially in the zeromap is not
 	 * currently handled. Return true without marking the folio uptodate so
 	 * that an IO error is emitted (e.g. do_swap_page() will sigbus).
 	 */
-	if (WARN_ON_ONCE(idx < folio_nr_pages(folio)))
+	if (WARN_ON_ONCE(nr_zeromap != nr_pages))
 		return true;
 
+	if (!is_zeromap)
+		return false;
+
 	folio_zero_range(folio, 0, folio_size(folio));
 	folio_mark_uptodate(folio);
 	return true;
diff --git a/mm/swap.h b/mm/swap.h
index f8711ff82f84..1cc56a02fb5f 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -80,6 +80,32 @@ static inline unsigned int folio_swap_flags(struct folio *folio)
 {
 	return swp_swap_info(folio->swap)->flags;
 }
+
+/*
+ * Return the count of contiguous swap entries that share the same
+ * zeromap status as the starting entry. If is_zeromap is not NULL,
+ * it will return the zeromap status of the starting entry.
+ */
+static inline int swap_zeromap_batch(swp_entry_t entry, int max_nr,
+		bool *is_zeromap)
+{
+	struct swap_info_struct *sis = swp_swap_info(entry);
+	unsigned long start = swp_offset(entry);
+	unsigned long end = start + max_nr;
+	bool start_entry_zeromap;
+
+	start_entry_zeromap = test_bit(start, sis->zeromap);
+	if (is_zeromap)
+		*is_zeromap = start_entry_zeromap;
+
+	if (max_nr <= 1)
+		return max_nr;
+	if (start_entry_zeromap)
+		return find_next_zero_bit(sis->zeromap, end, start) - start;
+	else
+		return find_next_bit(sis->zeromap, end, start) - start;
+}
+
 #else /* CONFIG_SWAP */
 struct swap_iocb;
 static inline void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
@@ -171,6 +197,13 @@ static inline unsigned int folio_swap_flags(struct folio *folio)
 {
 	return 0;
 }
+
+static inline int swap_zeromap_batch(swp_entry_t entry, int max_nr,
+		bool *has_zeromap)
+{
+	return 0;
+}
+
 #endif /* CONFIG_SWAP */
 
 #endif /* _MM_SWAP_H */
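For readers outside the kernel tree, a minimal runnable userspace
analogue of the batching above may help; a plain uint64_t array stands in
for sis->zeromap, and naive loops stand in for find_next_bit() and
find_next_zero_bit(). All names here are hypothetical, not kernel API:

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  static bool bit_set(const uint64_t *map, unsigned long i)
  {
  	return map[i / 64] >> (i % 64) & 1;
  }

  /* Count contiguous entries sharing the starting entry's status. */
  static int zeromap_batch(const uint64_t *map, unsigned long start,
  			 int max_nr, bool *is_zeromap)
  {
  	bool first = bit_set(map, start);
  	int nr = 1;

  	if (is_zeromap)
  		*is_zeromap = first;
  	while (nr < max_nr && bit_set(map, start + nr) == first)
  		nr++;
  	return nr;
  }

  int main(void)
  {
  	/* offsets 0-2 zero-filled, offset 3 not: bits 0b0111 */
  	uint64_t map[1] = { 0x7 };
  	bool z;
  	int nr = zeromap_batch(map, 0, 4, &z);

  	/* prints "3 zeromap=1": only the first 3 entries share status */
  	printf("%d zeromap=%d\n", nr, z);
  	return 0;
  }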
From patchwork Fri Sep 6 00:10:46 2024
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hanchuanhua@oppo.com, baolin.wang@linux.alibaba.com, chrisl@kernel.org,
    david@redhat.com, hannes@cmpxchg.org, hch@infradead.org, hughd@google.com,
    kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org,
    mhocko@suse.com, minchan@kernel.org, nphamcs@gmail.com,
    ryan.roberts@arm.com, ryncsn@gmail.com, senozhatsky@chromium.org,
    shakeel.butt@linux.dev, shy828301@gmail.com, surenb@google.com,
    v-songbaohua@oppo.com, willy@infradead.org, xiang@kernel.org,
    ying.huang@intel.com, yosryahmed@google.com
Subject: [PATCH v8 2/3] mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
Date: Fri, 6 Sep 2024 12:10:46 +1200
Message-Id: <20240906001047.1245-3-21cnbao@gmail.com>
In-Reply-To: <20240906001047.1245-1-21cnbao@gmail.com>
References: <20240906001047.1245-1-21cnbao@gmail.com>
From: Barry Song

With large folio swap-in, we might need to uncharge multiple entries all
together, so add an nr argument to mem_cgroup_swapin_uncharge_swap().
For the existing two users, just pass nr=1.
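As a hedged sketch of the batched call this enables (mirroring what
patch 3/3 does in do_swap_page(); the fragment below assumes a freshly
allocated, possibly large folio and is not a complete function):

  nr_pages = folio_nr_pages(folio);
  if (folio_test_large(folio))
  	/* round down to the first swap entry of the batch */
  	entry.val = ALIGN_DOWN(entry.val, nr_pages);
  /* uncharge all nr_pages entries in one call instead of nr_pages calls */
  mem_cgroup_swapin_uncharge_swap(entry, nr_pages);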
Signed-off-by: Barry Song
Acked-by: Chris Li
Cc: Shakeel Butt
Cc: Baolin Wang
Cc: Christoph Hellwig
Cc: David Hildenbrand
Cc: Gao Xiang
Cc: "Huang, Ying"
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Kairui Song
Cc: Kairui Song
Cc: Kalesh Singh
Cc: Matthew Wilcox (Oracle)
Cc: Michal Hocko
Cc: Minchan Kim
Cc: Nhat Pham
Cc: Ryan Roberts
Cc: Sergey Senozhatsky
Cc: Suren Baghdasaryan
Cc: Yang Shi
Cc: Yosry Ahmed
Reviewed-by: Yosry Ahmed
---
 include/linux/memcontrol.h | 5 +++--
 mm/memcontrol.c            | 7 ++++---
 mm/memory.c                | 2 +-
 mm/swap_state.c            | 2 +-
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 2ef94c74847d..34d2da05f2f1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -699,7 +699,8 @@ int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
 
 int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
 		gfp_t gfp, swp_entry_t entry);
-void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry);
+
+void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry, unsigned int nr_pages);
 
 void __mem_cgroup_uncharge(struct folio *folio);
 
@@ -1206,7 +1207,7 @@ static inline int mem_cgroup_swapin_charge_folio(struct folio *folio,
 	return 0;
 }
 
-static inline void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry)
+static inline void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry, unsigned int nr)
 {
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bda6f75d22ff..c0d36ca20332 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4559,14 +4559,15 @@ int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
 
 /*
  * mem_cgroup_swapin_uncharge_swap - uncharge swap slot
- * @entry: swap entry for which the page is charged
+ * @entry: the first swap entry for which the pages are charged
+ * @nr_pages: number of pages which will be uncharged
  *
  * Call this function after successfully adding the charged page to swapcache.
  *
  * Note: This function assumes the page for which swap slot is being uncharged
  * is order 0 page.
  */
-void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry)
+void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry, unsigned int nr_pages)
 {
 	/*
 	 * Cgroup1's unified memory+swap counter has been charged with the
@@ -4586,7 +4587,7 @@ void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry)
 		 * let's not wait for it.  The page already received a
 		 * memory+swap charge, drop the swap entry duplicate.
 		 */
-		mem_cgroup_uncharge_swap(entry, 1);
+		mem_cgroup_uncharge_swap(entry, nr_pages);
 	}
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index 42674c0748cb..cdf03b39a92c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4100,7 +4100,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 				ret = VM_FAULT_OOM;
 				goto out_page;
 			}
-			mem_cgroup_swapin_uncharge_swap(entry);
+			mem_cgroup_swapin_uncharge_swap(entry, 1);
 
 			shadow = get_shadow_from_swap_cache(entry);
 			if (shadow)
diff --git a/mm/swap_state.c b/mm/swap_state.c
index a042720554a7..4669f29cf555 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -522,7 +522,7 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	if (add_to_swap_cache(new_folio, entry, gfp_mask & GFP_RECLAIM_MASK, &shadow))
 		goto fail_unlock;
 
-	mem_cgroup_swapin_uncharge_swap(entry);
+	mem_cgroup_swapin_uncharge_swap(entry, 1);
 
 	if (shadow)
 		workingset_refault(new_folio, shadow);
From patchwork Fri Sep 6 00:10:47 2024
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hanchuanhua@oppo.com, baolin.wang@linux.alibaba.com, chrisl@kernel.org,
    david@redhat.com, hannes@cmpxchg.org, hch@infradead.org, hughd@google.com,
    kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org,
    mhocko@suse.com, minchan@kernel.org, nphamcs@gmail.com,
    ryan.roberts@arm.com, ryncsn@gmail.com, senozhatsky@chromium.org,
    shakeel.butt@linux.dev, shy828301@gmail.com, surenb@google.com,
    v-songbaohua@oppo.com, willy@infradead.org, xiang@kernel.org,
    ying.huang@intel.com, yosryahmed@google.com, Usama Arif,
    Kanchana P Sridhar
Subject: [PATCH v8 3/3] mm: support large folios swap-in for sync io devices
Date: Fri, 6 Sep 2024 12:10:47 +1200
Message-Id: <20240906001047.1245-4-21cnbao@gmail.com>
In-Reply-To: <20240906001047.1245-1-21cnbao@gmail.com>
References: <20240906001047.1245-1-21cnbao@gmail.com>

From: Chuanhua Han

Currently, we have mTHP features, but unfortunately, without support for
large folio swap-in, once these large folios are swapped out, they are
lost because mTHP swap is a one-way process. This lack of mTHP swap-in
functionality prevents mTHP from being used on devices like Android that
heavily rely on swap.

This patch introduces mTHP swap-in support, starting with sync devices
such as zRAM. This is probably the simplest and most common use case,
benefiting billions of Android phones and similar devices with minimal
implementation cost. In this straightforward scenario, large folios are
always exclusive, eliminating the need to handle complex rmap and
swapcache issues.

It offers several benefits:

1. Enables bidirectional mTHP swapping, allowing retrieval of mTHP after
   swap-out and swap-in. Large folios in the buddy system are also
   preserved as much as possible, rather than being fragmented due to
   swap-in.
2. Eliminates fragmentation in swap slots and supports successful
   THP_SWPOUT.

   w/o this patch (Refer to the data from Chris's and Kairui's latest
   swap allocator optimization while running ./thp_swap_allocator_test
   w/o "-a" option [1]):

   ./thp_swap_allocator_test
   Iteration 1: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00%
   Iteration 2: swpout inc: 131, swpout fallback inc: 101, Fallback percentage: 43.53%
   Iteration 3: swpout inc: 71, swpout fallback inc: 155, Fallback percentage: 68.58%
   Iteration 4: swpout inc: 55, swpout fallback inc: 168, Fallback percentage: 75.34%
   Iteration 5: swpout inc: 35, swpout fallback inc: 191, Fallback percentage: 84.51%
   Iteration 6: swpout inc: 25, swpout fallback inc: 199, Fallback percentage: 88.84%
   Iteration 7: swpout inc: 23, swpout fallback inc: 205, Fallback percentage: 89.91%
   Iteration 8: swpout inc: 9, swpout fallback inc: 219, Fallback percentage: 96.05%
   Iteration 9: swpout inc: 13, swpout fallback inc: 213, Fallback percentage: 94.25%
   Iteration 10: swpout inc: 12, swpout fallback inc: 216, Fallback percentage: 94.74%
   Iteration 11: swpout inc: 16, swpout fallback inc: 213, Fallback percentage: 93.01%
   Iteration 12: swpout inc: 10, swpout fallback inc: 210, Fallback percentage: 95.45%
   Iteration 13: swpout inc: 16, swpout fallback inc: 212, Fallback percentage: 92.98%
   Iteration 14: swpout inc: 12, swpout fallback inc: 212, Fallback percentage: 94.64%
   Iteration 15: swpout inc: 15, swpout fallback inc: 211, Fallback percentage: 93.36%
   Iteration 16: swpout inc: 15, swpout fallback inc: 200, Fallback percentage: 93.02%
   Iteration 17: swpout inc: 9, swpout fallback inc: 220, Fallback percentage: 96.07%

   w/ this patch (always 0%):
   Iteration 1: swpout inc: 948, swpout fallback inc: 0, Fallback percentage: 0.00%
   Iteration 2: swpout inc: 953, swpout fallback inc: 0, Fallback percentage: 0.00%
   Iteration 3: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00%
   Iteration 4: swpout inc: 952, swpout fallback inc: 0, Fallback percentage: 0.00%
   Iteration 5: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00%
   Iteration 6: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00%
   Iteration 7: swpout inc: 947, swpout fallback inc: 0, Fallback percentage: 0.00%
   Iteration 8: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00%
   Iteration 9: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00%
   Iteration 10: swpout inc: 945, swpout fallback inc: 0, Fallback percentage: 0.00%
   Iteration 11: swpout inc: 947, swpout fallback inc: 0, Fallback percentage: 0.00%
   ...

3. With both mTHP swap-out and swap-in supported, we offer the option to
   enable zsmalloc compression/decompression with larger granularity[2].
   The upcoming optimization in zsmalloc will significantly increase swap
   speed and improve compression efficiency. Tested by running 100
   iterations of swapping 100MiB of anon memory, the swap speed improved
   dramatically:

                time consumption of swapin(ms)   time consumption of swapout(ms)
    lz4 4k                  45274                            90540
    lz4 64k                 22942                            55667
    zstdn 4k                85035                           186585
    zstdn 64k               46558                           118533

   The compression ratio also improved, as evaluated with 1 GiB of data:

    granularity   orig_data_size   compr_data_size
    4KiB-zstd      1048576000       246876055
    64KiB-zstd     1048576000       199763892

   Without mTHP swap-in, the potential optimizations in zsmalloc cannot
   be realized.

4. Even mTHP swap-in itself can reduce swap-in page faults by a factor
   of nr_pages.
   Swapping in content filled with the same data 0x11, w/o and w/ the
   patch for five rounds (since the content is the same, decompression
   will be very fast; this primarily assesses the impact of reduced page
   faults):

    swp in bandwidth(bytes/ms)    w/o        w/
    round1                      624152    1127501
    round2                      631672    1127501
    round3                      620459    1139756
    round4                      606113    1139756
    round5                      624152    1152281
    avg                         621310    1137359    +83%

5. With both mTHP swap-out and swap-in supported, we offer the option to
   enable hardware accelerators (Intel IAA) to do parallel decompression,
   with which Kanchana reported a 7X improvement in zRAM read latency[3].

[1] https://lore.kernel.org/all/20240730-swap-allocator-v5-0-cb9c148b9297@kernel.org/
[2] https://lore.kernel.org/all/20240327214816.31191-1-21cnbao@gmail.com/
[3] https://lore.kernel.org/all/cover.1714581792.git.andre.glover@linux.intel.com/

Signed-off-by: Chuanhua Han
Co-developed-by: Barry Song
Signed-off-by: Barry Song
Cc: Baolin Wang
Cc: Chris Li
Cc: Christoph Hellwig
Cc: David Hildenbrand
Cc: Gao Xiang
Cc: "Huang, Ying"
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Kairui Song
Cc: Kalesh Singh
Cc: Matthew Wilcox (Oracle)
Cc: Michal Hocko
Cc: Minchan Kim
Cc: Nhat Pham
Cc: Ryan Roberts
Cc: Sergey Senozhatsky
Cc: Shakeel Butt
Cc: Suren Baghdasaryan
Cc: Yang Shi
Cc: Yosry Ahmed
Cc: Usama Arif
Cc: Kanchana P Sridhar
---
 mm/memory.c | 261 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 234 insertions(+), 27 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index cdf03b39a92c..d35dd8d99c8a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3985,6 +3985,194 @@ static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
 	return VM_FAULT_SIGBUS;
 }
 
+static struct folio *__alloc_swap_folio(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct folio *folio;
+	swp_entry_t entry;
+
+	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma,
+				vmf->address, false);
+	if (!folio)
+		return NULL;
+
+	entry = pte_to_swp_entry(vmf->orig_pte);
+	if (mem_cgroup_swapin_charge_folio(folio, vma->vm_mm,
+					   GFP_KERNEL, entry)) {
+		folio_put(folio);
+		return NULL;
+	}
+
+	return folio;
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static inline int non_swapcache_batch(swp_entry_t entry, int max_nr)
+{
+	struct swap_info_struct *si = swp_swap_info(entry);
+	pgoff_t offset = swp_offset(entry);
+	int i;
+
+	/*
+	 * While allocating a large folio and doing swap_read_folio(), which
+	 * is the case where the faulting pte doesn't have swapcache, we need
+	 * to ensure all PTEs have no cache as well; otherwise, we might go to
+	 * the swap device while the content is in the swapcache.
+	 */
+	for (i = 0; i < max_nr; i++) {
+		if ((si->swap_map[offset + i] & SWAP_HAS_CACHE))
+			return i;
+	}
+
+	return i;
+}
+
+/*
+ * Check if the PTEs within a range are contiguous swap entries
+ * and have consistent swapcache, zeromap.
+ */
+static bool can_swapin_thp(struct vm_fault *vmf, pte_t *ptep, int nr_pages)
+{
+	unsigned long addr;
+	swp_entry_t entry;
+	int idx;
+	pte_t pte;
+
+	addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
+	idx = (vmf->address - addr) / PAGE_SIZE;
+	pte = ptep_get(ptep);
+
+	if (!pte_same(pte, pte_move_swp_offset(vmf->orig_pte, -idx)))
+		return false;
+	entry = pte_to_swp_entry(pte);
+	if (swap_pte_batch(ptep, nr_pages, pte) != nr_pages)
+		return false;
+
+	/*
+	 * swap_read_folio() can't handle a large folio that is hybridly
+	 * backed by different backends, and those are likely corner cases.
+	 * Similar checks might be added once zswap supports large folios.
+	 */
+	if (unlikely(swap_zeromap_batch(entry, nr_pages, NULL) != nr_pages))
+		return false;
+	if (unlikely(non_swapcache_batch(entry, nr_pages) != nr_pages))
+		return false;
+
+	return true;
+}
+
+static inline unsigned long thp_swap_suitable_orders(pgoff_t swp_offset,
+						     unsigned long addr,
+						     unsigned long orders)
+{
+	int order, nr;
+
+	order = highest_order(orders);
+
+	/*
+	 * To swap in a THP with nr pages, we require that its first swap_offset
+	 * is aligned with that number, as it was when the THP was swapped out.
+	 * This helps filter out most invalid entries.
+	 */
+	while (orders) {
+		nr = 1 << order;
+		if ((addr >> PAGE_SHIFT) % nr == swp_offset % nr)
+			break;
+		order = next_order(&orders, order);
+	}
+
+	return orders;
+}
+
+static struct folio *alloc_swap_folio(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	unsigned long orders;
+	struct folio *folio;
+	unsigned long addr;
+	swp_entry_t entry;
+	spinlock_t *ptl;
+	pte_t *pte;
+	gfp_t gfp;
+	int order;
+
+	/*
+	 * If uffd is active for the vma we need per-page fault fidelity to
+	 * maintain the uffd semantics.
+	 */
+	if (unlikely(userfaultfd_armed(vma)))
+		goto fallback;
+
+	/*
+	 * A large swapped out folio could be partially or fully in zswap. We
+	 * lack handling for such cases, so fallback to swapping in order-0
+	 * folio.
+	 */
+	if (!zswap_never_enabled())
+		goto fallback;
+
+	entry = pte_to_swp_entry(vmf->orig_pte);
+	/*
+	 * Get a list of all the (large) orders below PMD_ORDER that are enabled
+	 * and suitable for swapping THP.
+	 */
+	orders = thp_vma_allowable_orders(vma, vma->vm_flags,
+			TVA_IN_PF | TVA_ENFORCE_SYSFS, BIT(PMD_ORDER) - 1);
+	orders = thp_vma_suitable_orders(vma, vmf->address, orders);
+	orders = thp_swap_suitable_orders(swp_offset(entry),
+					  vmf->address, orders);
+
+	if (!orders)
+		goto fallback;
+
+	pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
+				  vmf->address & PMD_MASK, &ptl);
+	if (unlikely(!pte))
+		goto fallback;
+
+	/*
+	 * For do_swap_page, find the highest order where the aligned range is
+	 * completely swap entries with contiguous swap offsets.
+	 */
+	order = highest_order(orders);
+	while (orders) {
+		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
+		if (can_swapin_thp(vmf, pte + pte_index(addr), 1 << order))
+			break;
+		order = next_order(&orders, order);
+	}
+
+	pte_unmap_unlock(pte, ptl);
+
+	/* Try allocating the highest of the remaining orders. */
+	gfp = vma_thp_gfp_mask(vma);
+	while (orders) {
+		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
+		folio = vma_alloc_folio(gfp, order, vma, addr, true);
+		if (folio) {
+			if (!mem_cgroup_swapin_charge_folio(folio, vma->vm_mm,
+							    gfp, entry))
+				return folio;
+			folio_put(folio);
+		}
+		order = next_order(&orders, order);
+	}
+
+fallback:
+	return __alloc_swap_folio(vmf);
+}
+#else /* !CONFIG_TRANSPARENT_HUGEPAGE */
+static inline bool can_swapin_thp(struct vm_fault *vmf, pte_t *ptep, int nr_pages)
+{
+	return false;
+}
+
+static struct folio *alloc_swap_folio(struct vm_fault *vmf)
+{
+	return __alloc_swap_folio(vmf);
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
 /*
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
@@ -4073,34 +4261,34 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!folio) {
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
 		    __swap_count(entry) == 1) {
-			/*
-			 * Prevent parallel swapin from proceeding with
-			 * the cache flag. Otherwise, another thread may
-			 * finish swapin first, free the entry, and swapout
-			 * reusing the same entry. It's undetectable as
-			 * pte_same() returns true due to entry reuse.
-			 */
-			if (swapcache_prepare(entry, 1)) {
-				/* Relax a bit to prevent rapid repeated page faults */
-				schedule_timeout_uninterruptible(1);
-				goto out;
-			}
-			need_clear_cache = true;
-
 			/* skip swapcache */
-			folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
-						vma, vmf->address, false);
+			folio = alloc_swap_folio(vmf);
 			if (folio) {
 				__folio_set_locked(folio);
 				__folio_set_swapbacked(folio);
 
-				if (mem_cgroup_swapin_charge_folio(folio,
-							vma->vm_mm, GFP_KERNEL,
-							entry)) {
-					ret = VM_FAULT_OOM;
+				nr_pages = folio_nr_pages(folio);
+				if (folio_test_large(folio))
+					entry.val = ALIGN_DOWN(entry.val, nr_pages);
+				/*
+				 * Prevent parallel swapin from proceeding with
+				 * the cache flag. Otherwise, another thread
+				 * may finish swapin first, free the entry, and
+				 * swapout reusing the same entry. It's
+				 * undetectable as pte_same() returns true due
+				 * to entry reuse.
+				 */
+				if (swapcache_prepare(entry, nr_pages)) {
+					/*
+					 * Relax a bit to prevent rapid
+					 * repeated page faults.
+					 */
+					schedule_timeout_uninterruptible(1);
 					goto out_page;
 				}
-				mem_cgroup_swapin_uncharge_swap(entry, 1);
+				need_clear_cache = true;
+
+				mem_cgroup_swapin_uncharge_swap(entry, nr_pages);
 
 				shadow = get_shadow_from_swap_cache(entry);
 				if (shadow)
@@ -4206,6 +4394,24 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		goto out_nomap;
 	}
 
+	/* allocated large folios for SWP_SYNCHRONOUS_IO */
+	if (folio_test_large(folio) && !folio_test_swapcache(folio)) {
+		unsigned long nr = folio_nr_pages(folio);
+		unsigned long folio_start = ALIGN_DOWN(vmf->address, nr * PAGE_SIZE);
+		unsigned long idx = (vmf->address - folio_start) / PAGE_SIZE;
+		pte_t *folio_ptep = vmf->pte - idx;
+		pte_t folio_pte = ptep_get(folio_ptep);
+
+		if (!pte_same(folio_pte, pte_move_swp_offset(vmf->orig_pte, -idx)) ||
+		    swap_pte_batch(folio_ptep, nr, folio_pte) != nr)
+			goto out_nomap;
+
+		page_idx = idx;
+		address = folio_start;
+		ptep = folio_ptep;
+		goto check_folio;
+	}
+
 	nr_pages = 1;
 	page_idx = 0;
 	address = vmf->address;
@@ -4337,11 +4543,12 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		folio_add_lru_vma(folio, vma);
 	} else if (!folio_test_anon(folio)) {
 		/*
-		 * We currently only expect small !anon folios, which are either
-		 * fully exclusive or fully shared. If we ever get large folios
-		 * here, we have to be careful.
+		 * We currently only expect small !anon folios which are either
+		 * fully exclusive or fully shared, or newly allocated large
+		 * folios which are fully exclusive. If we ever get large
+		 * folios within swapcache here, we have to be careful.
 		 */
-		VM_WARN_ON_ONCE(folio_test_large(folio));
+		VM_WARN_ON_ONCE(folio_test_large(folio) && folio_test_swapcache(folio));
 		VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
 		folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
 	} else {
@@ -4384,7 +4591,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 out:
 	/* Clear the swap cache pin for direct swapin after PTL unlock */
 	if (need_clear_cache)
-		swapcache_clear(si, entry, 1);
+		swapcache_clear(si, entry, nr_pages);
 	if (si)
 		put_swap_device(si);
 	return ret;
@@ -4400,7 +4607,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		folio_put(swapcache);
 	}
 	if (need_clear_cache)
-		swapcache_clear(si, entry, 1);
+		swapcache_clear(si, entry, nr_pages);
 	if (si)
 		put_swap_device(si);
 	return ret;
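To make the alignment filter in thp_swap_suitable_orders() above
concrete, here is a small runnable userspace sketch; the page index and
swap offset values are invented, and the per-order printout stands in
for the kernel loop, which stops at the highest aligned order and keeps
it together with all lower enabled orders:

  #include <stdio.h>

  int main(void)
  {
  	unsigned long page_idx = 0x1234;  /* vmf->address >> PAGE_SHIFT */
  	unsigned long swp_off = 0x5678;   /* swp_offset(entry) */

  	/*
  	 * An order-N candidate survives only if the faulting address's
  	 * page index and the swap offset agree modulo (1 << N), i.e. the
  	 * folio would map at the same sub-alignment it was swapped out at.
  	 */
  	for (int order = 4; order >= 0; order--) {
  		unsigned long nr = 1UL << order;

  		printf("order %d: %s\n", order,
  		       (page_idx % nr) == (swp_off % nr) ? "ok" : "misaligned");
  	}
  	/* prints: orders 4 and 3 misaligned; orders 2, 1, 0 ok */
  	return 0;
  }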