From patchwork Tue Sep 24 01:17:07 2024
X-Patchwork-Submitter: "Sridhar, Kanchana P"
X-Patchwork-Id: 13810020
From: Kanchana P Sridhar
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org,
    yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev,
    usamaarif642@gmail.com, shakeel.butt@linux.dev, ryan.roberts@arm.com,
    ying.huang@intel.com, 21cnbao@gmail.com, akpm@linux-foundation.org
Cc: nanhai.zou@intel.com, wajdi.k.feghali@intel.com, vinodh.gopal@intel.com,
    kanchana.p.sridhar@intel.com
Subject: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in zswap_store().
Date: Mon, 23 Sep 2024 18:17:07 -0700
Message-Id: <20240924011709.7037-7-kanchana.p.sridhar@intel.com>
In-Reply-To: <20240924011709.7037-1-kanchana.p.sridhar@intel.com>
References: <20240924011709.7037-1-kanchana.p.sridhar@intel.com>
zswap_store() will now store mTHP and PMD-size THP folios by compressing
them page by page.

This patch provides a sequential implementation of storing an mTHP in
zswap_store() by iterating through each page in the folio to compress
and store it in the zswap zpool.

Towards this goal, zswap_compress() is modified to take a page instead
of a folio as input.

Each page's swap offset is stored as a separate zswap entry.
If an error is encountered during the store of any page in the mTHP,
all previous pages/entries stored will be invalidated. Thus, an mTHP
is either entirely stored in ZSWAP, or entirely not stored in ZSWAP.

This forms the basis for building batching of pages during zswap store
of large folios, by compressing batches of up to, say, 8 pages in an
mTHP in parallel in hardware, with the Intel In-Memory Analytics
Accelerator (Intel IAA).

A new config variable CONFIG_ZSWAP_STORE_THP_DEFAULT_ON (off by default)
will enable/disable zswap storing of (m)THP. The corresponding tunable
zswap module parameter is "mthp_enabled".

This change reuses and adapts the functionality in Ryan Roberts' RFC
patch [1]:

  "[RFC,v1] mm: zswap: Store large folios without splitting"

  [1] https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@arm.com/T/#u

Also, addressed some of the RFC comments from the discussion in [1].

Co-developed-by: Ryan Roberts
Signed-off-by: Kanchana P Sridhar
---
 mm/Kconfig |   8 ++++
 mm/zswap.c | 122 +++++++++++++++++++++++++----------------------
 2 files changed, 66 insertions(+), 64 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 09aebca1cae3..c659fb732ec4 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -59,6 +59,14 @@ config ZSWAP_SHRINKER_DEFAULT_ON
	  reducing the chance that cold pages will reside in the zswap
	  pool and consume memory indefinitely.
 
+config ZSWAP_STORE_THP_DEFAULT_ON
+	bool "Store mTHP and THP folios in zswap"
+	depends on ZSWAP
+	default n
+	help
+	  If selected, zswap will process mTHP and THP folios by
+	  compressing and storing each 4K page in the large folio.
+
 choice
	prompt "Default compressor"
	depends on ZSWAP
diff --git a/mm/zswap.c b/mm/zswap.c
index 8f2e0ab34c84..16ab770546d6 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -127,6 +127,14 @@ static bool zswap_shrinker_enabled = IS_ENABLED(
		CONFIG_ZSWAP_SHRINKER_DEFAULT_ON);
 module_param_named(shrinker_enabled, zswap_shrinker_enabled, bool, 0644);
 
+/*
+ * Enable/disable zswap processing of mTHP folios.
+ * For now, only zswap_store will process mTHP folios.
+ */
+static bool zswap_mthp_enabled = IS_ENABLED(
+		CONFIG_ZSWAP_STORE_THP_DEFAULT_ON);
+module_param_named(mthp_enabled, zswap_mthp_enabled, bool, 0644);
+
 bool zswap_is_enabled(void)
 {
	return zswap_enabled;
@@ -1471,9 +1479,9 @@ static void zswap_delete_stored_offsets(struct xarray *tree,
  * @objcg: The folio's objcg.
  * @pool: The zswap_pool to store the compressed data for the page.
  */
-static bool __maybe_unused zswap_store_page(struct folio *folio, long index,
-					    struct obj_cgroup *objcg,
-					    struct zswap_pool *pool)
+static bool zswap_store_page(struct folio *folio, long index,
+			     struct obj_cgroup *objcg,
+			     struct zswap_pool *pool)
 {
	swp_entry_t swp = folio->swap;
	int type = swp_type(swp);
@@ -1551,51 +1559,63 @@ static bool __maybe_unused zswap_store_page(struct folio *folio, long index,
	return false;
 }
 
+/*
+ * Modified to store mTHP folios. Each page in the mTHP will be compressed
+ * and stored sequentially.
+ */
 bool zswap_store(struct folio *folio)
 {
	long nr_pages = folio_nr_pages(folio);
	swp_entry_t swp = folio->swap;
	pgoff_t offset = swp_offset(swp);
	struct xarray *tree = swap_zswap_tree(swp);
-	struct zswap_entry *entry;
	struct obj_cgroup *objcg = NULL;
	struct mem_cgroup *memcg = NULL;
+	struct zswap_pool *pool;
+	bool ret = false;
+	long index;
 
	VM_WARN_ON_ONCE(!folio_test_locked(folio));
	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
 
-	/* Large folios aren't supported */
-	if (folio_test_large(folio))
+	/* Storing large folios isn't enabled */
+	if (!zswap_mthp_enabled && folio_test_large(folio))
		return false;
 
	if (!zswap_enabled)
-		goto check_old;
+		goto reject;
 
-	/* Check cgroup limits */
+	/*
+	 * Check cgroup limits:
+	 *
+	 * The cgroup zswap limit check is done once at the beginning of an
+	 * mTHP store, and not within zswap_store_page() for each page
+	 * in the mTHP. We do however check the zswap pool limits at the
+	 * start of zswap_store_page(). What this means is, the cgroup
+	 * could go over the limits by at most (HPAGE_PMD_NR - 1) pages.
+	 * However, the per-store-page zswap pool limits check should
+	 * hopefully trigger the cgroup aware and zswap LRU aware global
+	 * reclaim implemented in the shrinker. If this assumption holds,
+	 * the cgroup exceeding the zswap limits could potentially be
+	 * resolved before the next zswap_store, and if it is not, the next
+	 * zswap_store would fail the cgroup zswap limit check at the start.
+	 */
	objcg = get_obj_cgroup_from_folio(folio);
	if (objcg && !obj_cgroup_may_zswap(objcg)) {
		memcg = get_mem_cgroup_from_objcg(objcg);
		if (shrink_memcg(memcg)) {
			mem_cgroup_put(memcg);
-			goto reject;
+			goto put_objcg;
		}
		mem_cgroup_put(memcg);
	}
 
	if (zswap_check_limits())
-		goto reject;
-
-	/* allocate entry */
-	entry = zswap_entry_cache_alloc(GFP_KERNEL, folio_nid(folio));
-	if (!entry) {
-		zswap_reject_kmemcache_fail++;
-		goto reject;
-	}
+		goto put_objcg;
 
-	/* if entry is successfully added, it keeps the reference */
-	entry->pool = zswap_pool_current_get();
-	if (!entry->pool)
-		goto freepage;
+	pool = zswap_pool_current_get();
+	if (!pool)
+		goto put_objcg;
 
	if (objcg) {
		memcg = get_mem_cgroup_from_objcg(objcg);
@@ -1606,60 +1626,34 @@ bool zswap_store(struct folio *folio)
		mem_cgroup_put(memcg);
	}
 
-	if (!zswap_compress(&folio->page, entry))
-		goto put_pool;
-
-	entry->swpentry = swp;
-	entry->objcg = objcg;
-	entry->referenced = true;
-
-	if (!zswap_store_entry(tree, entry))
-		goto store_failed;
-
-	if (objcg) {
-		obj_cgroup_charge_zswap(objcg, entry->length);
-		count_objcg_event(objcg, ZSWPOUT);
-	}
-
	/*
-	 * We finish initializing the entry while it's already in xarray.
-	 * This is safe because:
-	 *
-	 * 1. Concurrent stores and invalidations are excluded by folio lock.
-	 *
-	 * 2. Writeback is excluded by the entry not being on the LRU yet.
-	 *    The publishing order matters to prevent writeback from seeing
-	 *    an incoherent entry.
+	 * Store each page of the folio as a separate entry. If we fail to store
+	 * a page, unwind by removing all the previous pages we stored.
	 */
-	if (entry->length) {
-		INIT_LIST_HEAD(&entry->lru);
-		zswap_lru_add(&zswap_list_lru, entry);
+	for (index = 0; index < nr_pages; ++index) {
+		if (!zswap_store_page(folio, index, objcg, pool))
+			goto put_pool;
	}
 
-	/* update stats */
-	atomic_inc(&zswap_stored_pages);
-	count_vm_event(ZSWPOUT);
-
-	return true;
+	ret = true;
 
-store_failed:
-	zpool_free(entry->pool->zpool, entry->handle);
 put_pool:
-	zswap_pool_put(entry->pool);
-freepage:
-	zswap_entry_cache_free(entry);
-reject:
+	zswap_pool_put(pool);
+put_objcg:
	obj_cgroup_put(objcg);
	if (zswap_pool_reached_full)
		queue_work(shrink_wq, &zswap_shrink_work);
-check_old:
+reject:
	/*
-	 * If the zswap store fails or zswap is disabled, we must invalidate the
-	 * possibly stale entry which was previously stored at this offset.
-	 * Otherwise, writeback could overwrite the new data in the swapfile.
+	 * If the zswap store fails or zswap is disabled, we must invalidate
+	 * the possibly stale entries which were previously stored at the
+	 * offsets corresponding to each page of the folio. Otherwise,
+	 * writeback could overwrite the new data in the swapfile.
	 */
-	zswap_delete_stored_offsets(tree, offset, nr_pages);
-	return false;
+	if (!ret)
+		zswap_delete_stored_offsets(tree, offset, nr_pages);
+
+	return ret;
 }
 
 bool zswap_load(struct folio *folio)