From patchwork Sun Nov 17 18:53:32 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vitaly Wool X-Patchwork-Id: 11248603 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A581414E5 for ; Sun, 17 Nov 2019 18:53:51 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 55E9020730 for ; Sun, 17 Nov 2019 18:53:51 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=konsulko.com header.i=@konsulko.com header.b="q/Pwjj0Q" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 55E9020730 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=konsulko.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5BC6C6B000D; Sun, 17 Nov 2019 13:53:50 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 56FD26B000E; Sun, 17 Nov 2019 13:53:50 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 45BB96B0010; Sun, 17 Nov 2019 13:53:50 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 28BE26B000D for ; Sun, 17 Nov 2019 13:53:50 -0500 (EST) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with SMTP id D1C0F180AD806 for ; Sun, 17 Nov 2019 18:53:49 +0000 (UTC) X-FDA: 76166668578.11.skirt34_78a478d3f6725 X-Spam-Summary: 2,0,0,8456a75eef11aac5,d41d8cd98f00b204,vitaly.wool@konsulko.com,::linux-kernel@vger.kernel.org:ddstreet@ieee.org:akpm@linux-foundation.org:sjenning@redhat.com:johannes@sipsolutions.net:joern@purestorage.com:vitaly.wool@konsulko.com,RULES_HIT:1:41:69:355:379:541:800:960:966:968:973:988:989:1260:1311:1314:1345:1437:1515:1605:1730:1747:1777:1792:2194:2196:2198:2199:2200:2201:2393:2553:2559:2562:2637:2895:2897:2902:2987:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3872:3874:4250:4321:4385:4605:5007:6119:6261:6653:7576:7875:7903:8603:9108:9592:10004:11026:11232:11473:11658:11914:12043:12257:12291:12296:12297:12438:12517:12519:12555:12679:12683:12895:12986:13138:13146:13161:13229:13230:13231:13869:13870:13894:14394:21063:21080:21324:21444:21451:21611:21627:21810:21939:21966:30005:30012:30034:30054:30056:30070:30090,0,RBL:209.85.208.195:@konsulko.com:.lbl8.mailshell.net-62.14.0.100 66.201.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache: 0,MSF:no X-HE-Tag: skirt34_78a478d3f6725 X-Filterd-Recvd-Size: 14279 Received: from mail-lj1-f195.google.com (mail-lj1-f195.google.com [209.85.208.195]) by imf49.hostedemail.com (Postfix) with ESMTP for ; Sun, 17 Nov 2019 18:53:48 +0000 (UTC) Received: by mail-lj1-f195.google.com with SMTP id g3so16237915ljl.11 for ; Sun, 17 Nov 2019 10:53:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=konsulko.com; s=google; h=from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=HSO3ZkSXhTwG3Yf4d5TQxyjU1uorl7GYtU8QE5wEutc=; b=q/Pwjj0QoyjOPTxbuXgaHSV6HpaN52Z9dxxdClFdYj7QIq+jdOHkfAERWXaB0ZDdiR fpsaISwORvGdHNNshccMHOOXTfKcPVJGZA92ssmV5gUeUXQuq0rGBIRTZTIbtyFuVlJa JQ0bXNw3yb1UF8EdMiNyCt8aT2Z4ScHt+sQfo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=HSO3ZkSXhTwG3Yf4d5TQxyjU1uorl7GYtU8QE5wEutc=; b=m+32XU9ci+qLam3wFyrIKV7kfi221aKqYvYvldMP/OS9W+aJt2ui67D0FleJE7cZ4f IYZxTf48Q1wCj433n2XHcWVAoeHlluTCIPf7z/a1h2aTT8PCd+97RI2DxNXf6tUL+D8Z am/2cuiRJSeE2V5o6D7Br2jujYuCJZ46c8muwaJ13lZnppnfxhpe4NVQAb9cgApTkJ2i EDd18DFSsn/SPuQMUN2Mo+1Meby7xMr5CxYxEkJU5cV0f8wYkscrulP2lv6aKgHqfmG5 9yFFpj5FJW21s5eGUC4ol4B8qw/YnXhBWtKRcOp1FV8Bd/wtDK4q3QSeTVAOj29vZmoX hQzg== X-Gm-Message-State: APjAAAWTWBC/In27H6BYd7YTsyhy2LvEkRXswDhpk79R5irwGITdyb0N 0ihGsKOW/WGsAQ+xzKJ0TDfIGCR7atA= X-Google-Smtp-Source: APXvYqzZ81jEwbHJAjTG7NWEkgnOIa6j//y/Sz949cDhNnhw6yezKnstYVDiCHZbMDDGjVwh7KmfyQ== X-Received: by 2002:a2e:884c:: with SMTP id z12mr6742171ljj.41.1574016826738; Sun, 17 Nov 2019 10:53:46 -0800 (PST) Received: from taos.konsulko.bg (lan.nucleusys.com. [92.247.61.126]) by smtp.gmail.com with ESMTPSA id d28sm7530928lfn.33.2019.11.17.10.53.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 17 Nov 2019 10:53:45 -0800 (PST) From: vitaly.wool@konsulko.com To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, ddstreet@ieee.org, akpm@linux-foundation.org, sjenning@redhat.com, johannes@sipsolutions.net, joern@purestorage.com, Vitaly Wool Subject: [PATCH] zswap: use B-tree for search Date: Sun, 17 Nov 2019 20:53:32 +0200 Message-Id: <20191117185332.18998-1-vitaly.wool@konsulko.com> X-Mailer: git-send-email 2.20.1 MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Vitaly Wool The current zswap implementation uses red-black trees to store entries and to perform lookups. Although this algorithm obviously has complexity of O(log N) it still takes a while to complete lookup (or, even more for replacement) of an entry, when the amount of entries is huge (100K+). B-trees are known to handle such cases more efficiently (i. e. also with O(log N) complexity but with way lower coefficient) so trying zswap with B-trees was worth a shot. The implementation of B-trees that is currently present in Linux kernel isn't really doing things in the best possible way (i. e. it has recursion) but the testing I've run still shows a very significant performance increase. The usage pattern of B-tree here is not exactly following the guidelines but it is due to the fact that pgoff_t may be both 32 and 64 bits long. Tested on qemu-kvm (-smp 2 -m 1024) with zswap in the following configuration: * zpool: z3fold * max_pool_percent: 100 and the swap size of 1G. Test command: $ stress-ng --io 4 --vm 4 --vm-bytes 1000M --timeout 300s --metrics This, averaged over 20 runs on qemu-kvm (-smp 2 -m 1024) gives the following io bogo ops: * original: 73778.8 * btree: 393999 Signed-off-by: Vitaly Wool --- mm/Kconfig | 1 + mm/zswap.c | 147 +++++++++++++++++++++++------------------------------ 2 files changed, 65 insertions(+), 83 deletions(-) diff --git a/mm/Kconfig b/mm/Kconfig index a5dae9a7eb51..b278026de513 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -527,6 +527,7 @@ config ZSWAP depends on FRONTSWAP && CRYPTO=y select CRYPTO_LZO select ZPOOL + select BTREE help A lightweight compressed cache for swap pages. It takes pages that are in the process of being swapped out and attempts to diff --git a/mm/zswap.c b/mm/zswap.c index 46a322316e52..5f46c4f7a6bf 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -21,7 +21,7 @@ #include #include #include -#include +#include #include #include #include @@ -134,7 +134,6 @@ struct zswap_pool { * This structure contains the metadata for tracking a single compressed * page within zswap. * - * rbnode - links the entry into red-black tree for the appropriate swap type * offset - the swap offset for the entry. Index into the red-black tree. * refcount - the number of outstanding reference to the entry. This is needed * to protect against premature freeing of the entry by code @@ -149,7 +148,6 @@ struct zswap_pool { * value - value of the same-value filled pages which have same content */ struct zswap_entry { - struct rb_node rbnode; pgoff_t offset; int refcount; unsigned int length; @@ -166,11 +164,11 @@ struct zswap_header { /* * The tree lock in the zswap_tree struct protects a few things: - * - the rbtree + * - the tree * - the refcount field of each entry in the tree */ struct zswap_tree { - struct rb_root rbroot; + struct btree_head head; spinlock_t lock; }; @@ -252,7 +250,6 @@ static struct zswap_entry *zswap_entry_cache_alloc(gfp_t gfp) if (!entry) return NULL; entry->refcount = 1; - RB_CLEAR_NODE(&entry->rbnode); return entry; } @@ -262,58 +259,18 @@ static void zswap_entry_cache_free(struct zswap_entry *entry) } /********************************* -* rbtree functions +* btree functions **********************************/ -static struct zswap_entry *zswap_rb_search(struct rb_root *root, pgoff_t offset) -{ - struct rb_node *node = root->rb_node; - struct zswap_entry *entry; +static struct btree_geo *btree_pgofft_geo; - while (node) { - entry = rb_entry(node, struct zswap_entry, rbnode); - if (entry->offset > offset) - node = node->rb_left; - else if (entry->offset < offset) - node = node->rb_right; - else - return entry; - } - return NULL; -} - -/* - * In the case that a entry with the same offset is found, a pointer to - * the existing entry is stored in dupentry and the function returns -EEXIST - */ -static int zswap_rb_insert(struct rb_root *root, struct zswap_entry *entry, - struct zswap_entry **dupentry) +static struct zswap_entry *zswap_search(struct btree_head *head, pgoff_t offset) { - struct rb_node **link = &root->rb_node, *parent = NULL; - struct zswap_entry *myentry; - - while (*link) { - parent = *link; - myentry = rb_entry(parent, struct zswap_entry, rbnode); - if (myentry->offset > entry->offset) - link = &(*link)->rb_left; - else if (myentry->offset < entry->offset) - link = &(*link)->rb_right; - else { - *dupentry = myentry; - return -EEXIST; - } - } - rb_link_node(&entry->rbnode, parent, link); - rb_insert_color(&entry->rbnode, root); - return 0; + return btree_lookup(head, btree_pgofft_geo, &offset); } -static void zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry) +static void zswap_erase(struct btree_head *head, struct zswap_entry *entry) { - if (!RB_EMPTY_NODE(&entry->rbnode)) { - rb_erase(&entry->rbnode, root); - RB_CLEAR_NODE(&entry->rbnode); - } + btree_remove(head, btree_pgofft_geo, &entry->offset); } /* @@ -342,25 +299,40 @@ static void zswap_entry_get(struct zswap_entry *entry) /* caller must hold the tree lock * remove from the tree and free it, if nobody reference the entry */ -static void zswap_entry_put(struct zswap_tree *tree, +static void zswap_entry_put(struct btree_head *head, struct zswap_entry *entry) { int refcount = --entry->refcount; BUG_ON(refcount < 0); if (refcount == 0) { - zswap_rb_erase(&tree->rbroot, entry); + zswap_erase(head, entry); zswap_free_entry(entry); } } +static int zswap_insert_or_replace(struct btree_head *head, + struct zswap_entry *entry) +{ + struct zswap_entry *old; + + do { + old = btree_remove(head, btree_pgofft_geo, &entry->offset); + if (old) { + zswap_duplicate_entry++; + zswap_entry_put(head, old); + } + } while (old); + return btree_insert(head, btree_pgofft_geo, &entry->offset, entry, + GFP_ATOMIC); +} /* caller must hold the tree lock */ -static struct zswap_entry *zswap_entry_find_get(struct rb_root *root, +static struct zswap_entry *zswap_entry_find_get(struct btree_head *head, pgoff_t offset) { struct zswap_entry *entry; - entry = zswap_rb_search(root, offset); + entry = zswap_search(head, offset); if (entry) zswap_entry_get(entry); @@ -861,7 +833,7 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) /* find and ref zswap entry */ spin_lock(&tree->lock); - entry = zswap_entry_find_get(&tree->rbroot, offset); + entry = zswap_entry_find_get(&tree->head, offset); if (!entry) { /* entry was invalidated */ spin_unlock(&tree->lock); @@ -910,7 +882,7 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) spin_lock(&tree->lock); /* drop local reference */ - zswap_entry_put(tree, entry); + zswap_entry_put(&tree->head, entry); /* * There are two possible situations for entry here: @@ -919,8 +891,8 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) * because invalidate happened during writeback * search the tree and free the entry if find entry */ - if (entry == zswap_rb_search(&tree->rbroot, offset)) - zswap_entry_put(tree, entry); + if (entry == zswap_search(&tree->head, offset)) + zswap_entry_put(&tree->head, entry); spin_unlock(&tree->lock); goto end; @@ -934,7 +906,7 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle) */ fail: spin_lock(&tree->lock); - zswap_entry_put(tree, entry); + zswap_entry_put(&tree->head, entry); spin_unlock(&tree->lock); end: @@ -988,7 +960,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, struct page *page) { struct zswap_tree *tree = zswap_trees[type]; - struct zswap_entry *entry, *dupentry; + struct zswap_entry *entry; struct crypto_comp *tfm; int ret; unsigned int hlen, dlen = PAGE_SIZE; @@ -1096,16 +1068,12 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, insert_entry: /* map */ spin_lock(&tree->lock); - do { - ret = zswap_rb_insert(&tree->rbroot, entry, &dupentry); - if (ret == -EEXIST) { - zswap_duplicate_entry++; - /* remove from rbtree */ - zswap_rb_erase(&tree->rbroot, dupentry); - zswap_entry_put(tree, dupentry); - } - } while (ret == -EEXIST); + ret = zswap_insert_or_replace(&tree->head, entry); spin_unlock(&tree->lock); + if (ret < 0) { + zswap_reject_alloc_fail++; + goto freepage; + } /* update stats */ atomic_inc(&zswap_stored_pages); @@ -1138,7 +1106,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, /* find */ spin_lock(&tree->lock); - entry = zswap_entry_find_get(&tree->rbroot, offset); + entry = zswap_entry_find_get(&tree->head, offset); if (!entry) { /* entry was written back */ spin_unlock(&tree->lock); @@ -1168,7 +1136,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, freeentry: spin_lock(&tree->lock); - zswap_entry_put(tree, entry); + zswap_entry_put(&tree->head, entry); spin_unlock(&tree->lock); return 0; @@ -1182,36 +1150,41 @@ static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset) /* find */ spin_lock(&tree->lock); - entry = zswap_rb_search(&tree->rbroot, offset); + entry = zswap_search(&tree->head, offset); if (!entry) { /* entry was written back */ spin_unlock(&tree->lock); return; } - /* remove from rbtree */ - zswap_rb_erase(&tree->rbroot, entry); + /* remove from tree */ + zswap_erase(&tree->head, entry); /* drop the initial reference from entry creation */ - zswap_entry_put(tree, entry); + zswap_entry_put(&tree->head, entry); spin_unlock(&tree->lock); } +void do_free_entry(void *elem, unsigned long opaque, unsigned long *key, + size_t index, void *func2) +{ + struct zswap_entry *entry = elem; + zswap_free_entry(entry); +} + /* frees all zswap entries for the given swap type */ static void zswap_frontswap_invalidate_area(unsigned type) { struct zswap_tree *tree = zswap_trees[type]; - struct zswap_entry *entry, *n; if (!tree) return; /* walk the tree and free everything */ spin_lock(&tree->lock); - rbtree_postorder_for_each_entry_safe(entry, n, &tree->rbroot, rbnode) - zswap_free_entry(entry); - tree->rbroot = RB_ROOT; + btree_visitor(&tree->head, btree_pgofft_geo, 0, do_free_entry, NULL); + btree_destroy(&tree->head); spin_unlock(&tree->lock); kfree(tree); zswap_trees[type] = NULL; @@ -1226,8 +1199,11 @@ static void zswap_frontswap_init(unsigned type) pr_err("alloc failed, zswap disabled for swap type %d\n", type); return; } - - tree->rbroot = RB_ROOT; + if (btree_init(&tree->head) < 0) { + pr_err("couldn't init the tree head\n"); + kfree(tree); + return; + } spin_lock_init(&tree->lock); zswap_trees[type] = tree; } @@ -1302,6 +1278,11 @@ static int __init init_zswap(void) zswap_init_started = true; + if (sizeof(pgoff_t) == 8) + btree_pgofft_geo = &btree_geo64; + else + btree_pgofft_geo = &btree_geo32; + if (zswap_entry_cache_create()) { pr_err("entry cache creation failed\n"); goto cache_fail;