From patchwork Thu Oct 19 11:05:43 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13428659
From: Ryan Roberts <ryan.roberts@arm.com>
To: Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton,
 Johannes Weiner, Domenico Cerasuolo
Cc: Ryan Roberts, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v1] mm: zswap: Store large folios without splitting
Date: Thu, 19 Oct 2023 12:05:43 +0100
Message-Id: <20231019110543.3284654-1-ryan.roberts@arm.com>
X-Mailer: git-send-email 2.25.1
Previously zswap would refuse to store any folio bigger than order-0, and
therefore all of those folios would be sent directly to the swap file. This
is a minor inconvenience since swap can currently only support order-0 and
PMD-sized THP, but with the pending introduction of "small-sized THP", and
corresponding changes to swapfile to support any order of folio, these
large folios will become more prevalent, and without this change zswap
will become unusable. Independently of the "small-sized THP" feature, this
change makes it possible to store existing PMD-sized THPs in zswap.

Modify zswap_store() to allow storing large folios. The function is split
into two parts: zswap_store() does all the per-folio operations (e.g.
checking there is enough space). It then calls a new helper,
zswap_store_page(), for each page in the folio; each page is stored as its
own entry in the zswap pool. (These entries continue to be loaded back
individually as single pages.) If the store fails for any single page, all
previously stored pages of the folio are invalidated.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---

I've tested this on arm64 (M2) with zswap enabled, running
vm-scalability's `usemem` across multiple cores from within a
memory-constrained memcg to force high volumes of swap. I've also run the
mm selftests and observe no regressions (although there is nothing
[z]swap specific there) - does zswap have any specific tests I should run?

This is based on mm-stable, since mm-unstable contains a zswap patch known
to be buggy [1]. I thought it would be best to get comments on the shape,
then do the rebase after that patch has been fixed.

For context, small-sized THP is being discussed here [2], and I'm working
on changes to swapfile to support non-PMD-sized large folios here [3].

[1] https://lore.kernel.org/linux-mm/21606fe5-fb9b-4d37-98ab-38c96819893b@arm.com/
[2] https://lore.kernel.org/linux-mm/20230929114421.3761121-1-ryan.roberts@arm.com/
[3] https://lore.kernel.org/linux-mm/20231017161302.2518826-1-ryan.roberts@arm.com/
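In case it helps review, below is a condensed sketch of the shape of the
new flow. It is not the literal patch (see the diff for that): it reuses
the identifiers from the diff, but elides how objcg and pool are obtained
and all of the duplicate-removal, cgroup-limit and shrink handling.

bool zswap_store(struct folio *folio)
{
	long nr_pages = folio_nr_pages(folio);
	pgoff_t offset = swp_offset(folio->swap);
	struct zswap_tree *tree = zswap_trees[swp_type(folio->swap)];
	struct zswap_entry *entry;
	long i;

	/*
	 * ... obtain objcg and pool; per-folio checks, duplicate removal
	 * and limit/shrink handling, all as before ...
	 */

	/* Store each page as its own entry; unwind them all on failure. */
	for (i = 0; i < nr_pages; i++) {
		if (!zswap_store_page(folio, i, objcg, pool)) {
			spin_lock(&tree->lock);
			for (i--; i >= 0; i--) {
				entry = zswap_rb_search(&tree->rbroot, offset + i);
				if (entry)
					zswap_invalidate_entry(tree, entry);
			}
			spin_unlock(&tree->lock);
			return false;
		}
	}
	return true;
}

Each entry is keyed in the tree at offset + i, so loads continue to find
each page's entry individually, without needing to know whether it
originally came from a large folio.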
Thanks,
Ryan

 mm/zswap.c | 155 +++++++++++++++++++++++++++++++++--------------------
 1 file changed, 98 insertions(+), 57 deletions(-)

base-commit: 158978945f3173b8c1a88f8c5684a629736a57ac
--
2.25.1

diff --git a/mm/zswap.c b/mm/zswap.c
index 37d2b1cb2ecb..51cbfc4e1ef8 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1188,18 +1188,17 @@ static void zswap_fill_page(void *ptr, unsigned long value)
 	memset_l(page, value, PAGE_SIZE / sizeof(unsigned long));
 }
 
-bool zswap_store(struct folio *folio)
+static bool zswap_store_page(struct folio *folio, long index,
+			     struct obj_cgroup *objcg, struct zswap_pool *pool)
 {
 	swp_entry_t swp = folio->swap;
 	int type = swp_type(swp);
-	pgoff_t offset = swp_offset(swp);
-	struct page *page = &folio->page;
+	pgoff_t offset = swp_offset(swp) + index;
+	struct page *page = folio_page(folio, index);
 	struct zswap_tree *tree = zswap_trees[type];
 	struct zswap_entry *entry, *dupentry;
 	struct scatterlist input, output;
 	struct crypto_acomp_ctx *acomp_ctx;
-	struct obj_cgroup *objcg = NULL;
-	struct zswap_pool *pool;
 	struct zpool *zpool;
 	unsigned int dlen = PAGE_SIZE;
 	unsigned long handle, value;
@@ -1208,51 +1207,11 @@ bool zswap_store(struct folio *folio)
 	gfp_t gfp;
 	int ret;
 
-	VM_WARN_ON_ONCE(!folio_test_locked(folio));
-	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
-
-	/* Large folios aren't supported */
-	if (folio_test_large(folio))
-		return false;
-
-	if (!zswap_enabled || !tree)
+	/* entry keeps the references if successfully stored. */
+	if (!zswap_pool_get(pool))
 		return false;
-
-	/*
-	 * If this is a duplicate, it must be removed before attempting to store
-	 * it, otherwise, if the store fails the old page won't be removed from
-	 * the tree, and it might be written back overriding the new data.
-	 */
-	spin_lock(&tree->lock);
-	dupentry = zswap_rb_search(&tree->rbroot, offset);
-	if (dupentry) {
-		zswap_duplicate_entry++;
-		zswap_invalidate_entry(tree, dupentry);
-	}
-	spin_unlock(&tree->lock);
-
-	/*
-	 * XXX: zswap reclaim does not work with cgroups yet. Without a
-	 * cgroup-aware entry LRU, we will push out entries system-wide based on
-	 * local cgroup limits.
-	 */
-	objcg = get_obj_cgroup_from_folio(folio);
-	if (objcg && !obj_cgroup_may_zswap(objcg))
-		goto reject;
-
-	/* reclaim space if needed */
-	if (zswap_is_full()) {
-		zswap_pool_limit_hit++;
-		zswap_pool_reached_full = true;
-		goto shrink;
-	}
-
-	if (zswap_pool_reached_full) {
-		if (!zswap_can_accept())
-			goto shrink;
-		else
-			zswap_pool_reached_full = false;
-	}
+	if (objcg)
+		obj_cgroup_get(objcg);
 
 	/* allocate entry */
 	entry = zswap_entry_cache_alloc(GFP_KERNEL);
@@ -1260,6 +1219,8 @@ bool zswap_store(struct folio *folio)
 		zswap_reject_kmemcache_fail++;
 		goto reject;
 	}
+	entry->objcg = objcg;
+	entry->pool = pool;
 
 	if (zswap_same_filled_pages_enabled) {
 		src = kmap_atomic(page);
@@ -1277,11 +1238,6 @@ bool zswap_store(struct folio *folio)
 	if (!zswap_non_same_filled_pages_enabled)
 		goto freepage;
 
-	/* if entry is successfully added, it keeps the reference */
-	entry->pool = zswap_pool_current_get();
-	if (!entry->pool)
-		goto freepage;
-
 	/* compress */
 	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
 
@@ -1337,7 +1293,6 @@ bool zswap_store(struct folio *folio)
 	entry->length = dlen;
 
 insert_entry:
-	entry->objcg = objcg;
 	if (objcg) {
 		obj_cgroup_charge_zswap(objcg, entry->length);
 		/* Account before objcg ref is moved to tree */
@@ -1373,19 +1328,105 @@ bool zswap_store(struct folio *folio)
 
 put_dstmem:
 	mutex_unlock(acomp_ctx->mutex);
-	zswap_pool_put(entry->pool);
 freepage:
 	zswap_entry_cache_free(entry);
 reject:
 	if (objcg)
 		obj_cgroup_put(objcg);
+	zswap_pool_put(pool);
 	return false;
+}
 
+bool zswap_store(struct folio *folio)
+{
+	long nr_pages = folio_nr_pages(folio);
+	swp_entry_t swp = folio->swap;
+	int type = swp_type(swp);
+	pgoff_t offset = swp_offset(swp);
+	struct zswap_tree *tree = zswap_trees[type];
+	struct zswap_entry *entry;
+	struct obj_cgroup *objcg = NULL;
+	struct zswap_pool *pool;
+	bool ret = false;
+	long i;
+
+	VM_WARN_ON_ONCE(!folio_test_locked(folio));
+	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
+
+	if (!zswap_enabled || !tree)
+		return false;
+
+	/*
+	 * If this is a duplicate, it must be removed before attempting to store
+	 * it, otherwise, if the store fails the old page won't be removed from
+	 * the tree, and it might be written back overriding the new data.
+	 */
+	spin_lock(&tree->lock);
+	for (i = 0; i < nr_pages; i++) {
+		entry = zswap_rb_search(&tree->rbroot, offset + i);
+		if (entry) {
+			zswap_duplicate_entry++;
+			zswap_invalidate_entry(tree, entry);
+		}
+	}
+	spin_unlock(&tree->lock);
+
+	/*
+	 * XXX: zswap reclaim does not work with cgroups yet. Without a
+	 * cgroup-aware entry LRU, we will push out entries system-wide based on
+	 * local cgroup limits.
+	 */
+	objcg = get_obj_cgroup_from_folio(folio);
+	if (objcg && !obj_cgroup_may_zswap(objcg))
+		goto put_objcg;
+
+	/* reclaim space if needed */
+	if (zswap_is_full()) {
+		zswap_pool_limit_hit++;
+		zswap_pool_reached_full = true;
+		goto shrink;
+	}
+
+	if (zswap_pool_reached_full) {
+		if (!zswap_can_accept())
+			goto shrink;
+		else
+			zswap_pool_reached_full = false;
+	}
+
+	pool = zswap_pool_current_get();
+	if (!pool)
+		goto put_objcg;
+
+	/*
+	 * Store each page of the folio as a separate entry. If we fail to store
+	 * a page, unwind by removing all the previous pages we stored.
+	 */
+	for (i = 0; i < nr_pages; i++) {
+		if (!zswap_store_page(folio, i, objcg, pool)) {
+			spin_lock(&tree->lock);
+			for (i--; i >= 0; i--) {
+				entry = zswap_rb_search(&tree->rbroot, offset + i);
+				if (entry)
+					zswap_invalidate_entry(tree, entry);
+			}
+			spin_unlock(&tree->lock);
+			goto put_pool;
+		}
+	}
+
+	ret = true;
+put_pool:
+	zswap_pool_put(pool);
+put_objcg:
+	if (objcg)
+		obj_cgroup_put(objcg);
+	return ret;
 shrink:
 	pool = zswap_pool_last_get();
 	if (pool && !queue_work(shrink_wq, &pool->shrink_work))
 		zswap_pool_put(pool);
-	goto reject;
+	goto put_objcg;
 }
 
 bool zswap_load(struct folio *folio)