From patchwork Wed Jan 18 00:18:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "T.J. Alumbaugh" X-Patchwork-Id: 13105285 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9DD20C00A5A for ; Wed, 18 Jan 2023 00:18:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3A18F6B0078; Tue, 17 Jan 2023 19:18:48 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 3038B6B007B; Tue, 17 Jan 2023 19:18:48 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 156856B007D; Tue, 17 Jan 2023 19:18:48 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 04EA46B0078 for ; Tue, 17 Jan 2023 19:18:48 -0500 (EST) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id BCA54140933 for ; Wed, 18 Jan 2023 00:18:47 +0000 (UTC) X-FDA: 80366009094.22.F524066 Received: from mail-ot1-f73.google.com (mail-ot1-f73.google.com [209.85.210.73]) by imf18.hostedemail.com (Postfix) with ESMTP id 273981C0002 for ; Wed, 18 Jan 2023 00:18:45 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=XayJErSb; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf18.hostedemail.com: domain of 35TrHYwgKCJwP6HQI76QCKKCHA.8KIHEJQT-IIGR68G.KNC@flex--talumbau.bounces.google.com designates 209.85.210.73 as permitted sender) smtp.mailfrom=35TrHYwgKCJwP6HQI76QCKKCHA.8KIHEJQT-IIGR68G.KNC@flex--talumbau.bounces.google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1674001126; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=fBsOlKlcuwYg1TO5fnMZqwWlWQzMMR0DkSccvt40dYk=; b=Ij0QV7ttdt0Z6c1AlvxuaDxCz0QDNhSGI7SyeczMqGYdFnNHn9H0WfooKUwa1O6k3BUFhc OvhydQdOIPcHTXFSBZD13uW6vPzCNFS0kLiM4nRpyOhEdbF8TDJK+1+jd7TE7fa8k293A6 b98F98c6rktXoP/LqA42tPkf9h4YH1Y= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=XayJErSb; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf18.hostedemail.com: domain of 35TrHYwgKCJwP6HQI76QCKKCHA.8KIHEJQT-IIGR68G.KNC@flex--talumbau.bounces.google.com designates 209.85.210.73 as permitted sender) smtp.mailfrom=35TrHYwgKCJwP6HQI76QCKKCHA.8KIHEJQT-IIGR68G.KNC@flex--talumbau.bounces.google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1674001126; a=rsa-sha256; cv=none; b=3j/Xmbr3jXPQKIuA3t4uLpPA4hvtRct79WgllwuFs3SGDICAQn5gtzgnYjY5rPv9K+/Jio n5cVvwdc3af6EZm0ro9YkPnR0JvcNJYArif89m0uZF9bwEPbPlhq5lRE1x0aTS/Fy4K5ci 6KEO1/DYN3Yyu3hUPV9+90Hu9mNoDOQ= Received: by mail-ot1-f73.google.com with SMTP id j14-20020a056830270e00b006864ab0258cso1356930otu.23 for ; Tue, 17 Jan 2023 16:18:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=fBsOlKlcuwYg1TO5fnMZqwWlWQzMMR0DkSccvt40dYk=; b=XayJErSb1aUIDJQCWLMV0aDRWh/VtUQOtmRqg/ZJxZnU40ttu0e9zM+Xhxtdl++5pZ lPE3P44X2Ej530Kl88sXanY2BnvXQKxEhrYGsF3QszLz1C4W4SMucbXVkGn/GRTAgsn2 fFc12zaIEm71Lua2HdCr1jxbKNOSRWe49p+eLsRiILK9qxazoqtwDNvvc2+AMNizUPm2 e+/XjaknM6u/pigZ5eKwgASGZNUD+/mH4qvaYvqulYP8AIKpFrvSNg+xpxnOuBFuUz7C HVMdu1disoW9P4vfqufI7AfwND1y3wI1M6ClDfOdPy2+sDpqlvuwiGDN617BzYG4SlL3 kYvg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=fBsOlKlcuwYg1TO5fnMZqwWlWQzMMR0DkSccvt40dYk=; b=SapKv0cKHsTjouiCoWf2eOupQW11pyz2zRlheyIch0KKWIeCV6hRcWRF7gyvjXNvka 0wIQEElGBRsthUAFzAr7uKdmDVjpou+sx8cx3fTYfdfbcYslOFU2Oua0iDgHaFOlbcoJ UyGvW3zZVDWs34xq1M7ReT+/ZInQMhmMyrOFAMZ8Oty8VaTd8KUupLjeWzSh+gBIeeHu 1uehQJNXC1pwCXyhlt7ia4FmgkoKNahaKRjgi39g3tKoejRRL5Q2BsQj60xqkdBb02oA Br9MLOKFV0zTdYpZ/FFTKxixqxaoeIrcHZdzKq65b6hxaGToI8yHdrhVAwXrOmwFRvrM ukgA== X-Gm-Message-State: AFqh2koLoe0Aw54k0kzRUm1N4G53uGX5k6c+GcGktim6ZuCRM+BlXwJ8 iMonSAOKSrpmbsQBzKT72Ug5qvTvZ4fn0w== X-Google-Smtp-Source: AMrXdXvyo6DW2fTIellOLWSjvwbbsAvYsTAnHGBC++acHS1loujh0p6QFwQu+rlsujM/lM0FuAgtv4WyxwJreA== X-Received: from talumbau.c.googlers.com ([fda3:e722:ac3:cc00:2b:ff92:c0a8:90d]) (user=talumbau job=sendgmr) by 2002:a05:6808:6d8:b0:35b:58e9:8890 with SMTP id m24-20020a05680806d800b0035b58e98890mr272274oih.243.1674001125047; Tue, 17 Jan 2023 16:18:45 -0800 (PST) Date: Wed, 18 Jan 2023 00:18:23 +0000 In-Reply-To: <20230118001827.1040870-1-talumbau@google.com> Mime-Version: 1.0 References: <20230118001827.1040870-1-talumbau@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20230118001827.1040870-4-talumbau@google.com> Subject: [PATCH mm-unstable v1 3/7] mm: multi-gen LRU: section for Bloom filters From: "T.J. Alumbaugh" To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-mm@google.com, "T.J. Alumbaugh" X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 273981C0002 X-Stat-Signature: abekh7eo7t7b11bubxy7gccynfiujmig X-HE-Tag: 1674001125-111991 X-HE-Meta: U2FsdGVkX1/fDk2MZ+b7gNJt9NzT/8S87ePcZidprrA13LCYBcoRN5MVN20j85RWDeQcjuXO+lYeKUm8/iYvUl+YnKSKABoB2gKu/yBikpZuLaSg811v/66nZDnl4qH/xwlTu4VZgo22zRRt/PRw0eH5KDSOoy5hc/E8E/YoxkbEHqGceT1xM2OEj3cQ+osyY0ieQMlp1xOKkkd4i46Yiy/LspkL0c1nyD+mFF8697Eatdob5tgRRsWquKT8FDIOgfu1MMcGBlOkSU8/sM3P3isIK/JFo0f4IMp64/Zi4lpM2VorVlTUS7BYDBemKk+ZTU6aN8DHfHGAV0MG87HgmvhyDKDhW95crGuJyOLkhLJloHUjig5rRARoimU+KGgeO+eToaK1gERDMu7nmXViqsDbomr/GjtYb3+piPSM5NaRgiIGSprzNEM9a2L/k4LjHeFSEWX4EiHmMfv+ruZGHoUkownrUTghB0wh6+oErFbBYCpsGzC9iybZAKyLed/XnthQpWdyn1wfpiBa6LLO7yzpDrl2tJ/MUcr2tMm+SbsUWQIczhNERqc3RQBKJdZBYnDB2R/iTxjWsBLVejwZXJnwzjZyUV53wCbkIjbZ/JFCRYuJUu7gJ0Uq+QoT6WED/mpoaf34iB3+3YIewVifGn73P2LaFRrX6uxbBB+a8bDAfhJZuLKtGSLmK57ZTNLKGT7wnj8IlVBo898FwocsrP4xhCgewsta+n4GMV0Twf5oXON5tJtm14R6OtS5HMZzol9jmVVq1bEs22TH5AATTMJiEHYoi7YjS5qlhweRBzaJbOCg8zGGGbT8bkfqpaRETKrUjn25kjjoN3e79IlBUFzl2MVBg543X6S6C/SN6SE0GznVomcdh43EFH7CsdTD2QAuK6Zbaz7v44ywzA8fV9xymOc1AJO54J8OWbWdBjmMMPZyrbePnx/BQouH2LHbiDaiSP2gejO1BJsdyCY 0xcUlm1B Crdfi/O9CD/gLwGo6IrASlqgrR/Rtx5Y5NqJcIu7tliX3Y5XWZAHHE6hIYdeYBLwmWo+hcRFYbEknPj5nB3e5ZiGTBk4WAraLyCNI9Dm3XVTB7gDQ0HpSfKUM/KZoXvUuHtEtvXpKTWZ0RQvxmHAC2Ael5dkECHHKbdKUiBpPqtZTRgV3NuK1szWm9Efm9muBI9fZKoE9UVOo3f7OdV4HiPZDLoPQMIfi86cokpknNSRkSOjyp/Ehykcoj239H7oO7y+SKjn0Drprz3/k0Tg5P2g95JiOJBrLHJ1S423+1ulVHHu7Swk6NagGUmzBEPznMoc7ikubLMGgH+PzP8B0mb+O6OQmZu5MnK1l37GQJJipMvBlUETadcZm2MUSegrvh6C9 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Move Bloom filters code into a dedicated section. Improve the design doc to explain Bloom filter usage and connection between aging and eviction in their use. Signed-off-by: T.J. Alumbaugh --- Documentation/mm/multigen_lru.rst | 16 +++ mm/vmscan.c | 180 +++++++++++++++--------------- 2 files changed, 108 insertions(+), 88 deletions(-) diff --git a/Documentation/mm/multigen_lru.rst b/Documentation/mm/multigen_lru.rst index bd988a142bc2..770b5d539856 100644 --- a/Documentation/mm/multigen_lru.rst +++ b/Documentation/mm/multigen_lru.rst @@ -170,6 +170,22 @@ promotes hot pages. If the scan was done cacheline efficiently, it adds the PMD entry pointing to the PTE table to the Bloom filter. This forms a feedback loop between the eviction and the aging. +Bloom Filters +------------- +Bloom filters are a space and memory efficient data structure for set +membership test, i.e., test if an element is not in the set or may be +in the set. + +In the eviction path, specifically, in ``lru_gen_look_around()``, if a +PMD has a sufficient number of hot pages, its address is placed in the +filter. In the aging path, set membership means that the PTE range +will be scanned for young pages. + +Note that Bloom filters are probabilistic on set membership. If a test +is false positive, the cost is an additional scan of a range of PTEs, +which may yield hot pages anyway. Parameters of the filter itself can +control the false positive rate in the limit. + Summary ------- The multi-gen LRU can be disassembled into the following parts: diff --git a/mm/vmscan.c b/mm/vmscan.c index eb9263bf6806..1be9120349f8 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3233,6 +3233,98 @@ static bool __maybe_unused seq_is_valid(struct lruvec *lruvec) get_nr_gens(lruvec, LRU_GEN_ANON) <= MAX_NR_GENS; } +/****************************************************************************** + * Bloom filters + ******************************************************************************/ + +/* + * Bloom filters with m=1<<15, k=2 and the false positive rates of ~1/5 when + * n=10,000 and ~1/2 when n=20,000, where, conventionally, m is the number of + * bits in a bitmap, k is the number of hash functions and n is the number of + * inserted items. + * + * Page table walkers use one of the two filters to reduce their search space. + * To get rid of non-leaf entries that no longer have enough leaf entries, the + * aging uses the double-buffering technique to flip to the other filter each + * time it produces a new generation. For non-leaf entries that have enough + * leaf entries, the aging carries them over to the next generation in + * walk_pmd_range(); the eviction also report them when walking the rmap + * in lru_gen_look_around(). + * + * For future optimizations: + * 1. It's not necessary to keep both filters all the time. The spare one can be + * freed after the RCU grace period and reallocated if needed again. + * 2. And when reallocating, it's worth scaling its size according to the number + * of inserted entries in the other filter, to reduce the memory overhead on + * small systems and false positives on large systems. + * 3. Jenkins' hash function is an alternative to Knuth's. + */ +#define BLOOM_FILTER_SHIFT 15 + +static inline int filter_gen_from_seq(unsigned long seq) +{ + return seq % NR_BLOOM_FILTERS; +} + +static void get_item_key(void *item, int *key) +{ + u32 hash = hash_ptr(item, BLOOM_FILTER_SHIFT * 2); + + BUILD_BUG_ON(BLOOM_FILTER_SHIFT * 2 > BITS_PER_TYPE(u32)); + + key[0] = hash & (BIT(BLOOM_FILTER_SHIFT) - 1); + key[1] = hash >> BLOOM_FILTER_SHIFT; +} + +static bool test_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item) +{ + int key[2]; + unsigned long *filter; + int gen = filter_gen_from_seq(seq); + + filter = READ_ONCE(lruvec->mm_state.filters[gen]); + if (!filter) + return true; + + get_item_key(item, key); + + return test_bit(key[0], filter) && test_bit(key[1], filter); +} + +static void update_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item) +{ + int key[2]; + unsigned long *filter; + int gen = filter_gen_from_seq(seq); + + filter = READ_ONCE(lruvec->mm_state.filters[gen]); + if (!filter) + return; + + get_item_key(item, key); + + if (!test_bit(key[0], filter)) + set_bit(key[0], filter); + if (!test_bit(key[1], filter)) + set_bit(key[1], filter); +} + +static void reset_bloom_filter(struct lruvec *lruvec, unsigned long seq) +{ + unsigned long *filter; + int gen = filter_gen_from_seq(seq); + + filter = lruvec->mm_state.filters[gen]; + if (filter) { + bitmap_clear(filter, 0, BIT(BLOOM_FILTER_SHIFT)); + return; + } + + filter = bitmap_zalloc(BIT(BLOOM_FILTER_SHIFT), + __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN); + WRITE_ONCE(lruvec->mm_state.filters[gen], filter); +} + /****************************************************************************** * mm_struct list ******************************************************************************/ @@ -3352,94 +3444,6 @@ void lru_gen_migrate_mm(struct mm_struct *mm) } #endif -/* - * Bloom filters with m=1<<15, k=2 and the false positive rates of ~1/5 when - * n=10,000 and ~1/2 when n=20,000, where, conventionally, m is the number of - * bits in a bitmap, k is the number of hash functions and n is the number of - * inserted items. - * - * Page table walkers use one of the two filters to reduce their search space. - * To get rid of non-leaf entries that no longer have enough leaf entries, the - * aging uses the double-buffering technique to flip to the other filter each - * time it produces a new generation. For non-leaf entries that have enough - * leaf entries, the aging carries them over to the next generation in - * walk_pmd_range(); the eviction also report them when walking the rmap - * in lru_gen_look_around(). - * - * For future optimizations: - * 1. It's not necessary to keep both filters all the time. The spare one can be - * freed after the RCU grace period and reallocated if needed again. - * 2. And when reallocating, it's worth scaling its size according to the number - * of inserted entries in the other filter, to reduce the memory overhead on - * small systems and false positives on large systems. - * 3. Jenkins' hash function is an alternative to Knuth's. - */ -#define BLOOM_FILTER_SHIFT 15 - -static inline int filter_gen_from_seq(unsigned long seq) -{ - return seq % NR_BLOOM_FILTERS; -} - -static void get_item_key(void *item, int *key) -{ - u32 hash = hash_ptr(item, BLOOM_FILTER_SHIFT * 2); - - BUILD_BUG_ON(BLOOM_FILTER_SHIFT * 2 > BITS_PER_TYPE(u32)); - - key[0] = hash & (BIT(BLOOM_FILTER_SHIFT) - 1); - key[1] = hash >> BLOOM_FILTER_SHIFT; -} - -static void reset_bloom_filter(struct lruvec *lruvec, unsigned long seq) -{ - unsigned long *filter; - int gen = filter_gen_from_seq(seq); - - filter = lruvec->mm_state.filters[gen]; - if (filter) { - bitmap_clear(filter, 0, BIT(BLOOM_FILTER_SHIFT)); - return; - } - - filter = bitmap_zalloc(BIT(BLOOM_FILTER_SHIFT), - __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN); - WRITE_ONCE(lruvec->mm_state.filters[gen], filter); -} - -static void update_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item) -{ - int key[2]; - unsigned long *filter; - int gen = filter_gen_from_seq(seq); - - filter = READ_ONCE(lruvec->mm_state.filters[gen]); - if (!filter) - return; - - get_item_key(item, key); - - if (!test_bit(key[0], filter)) - set_bit(key[0], filter); - if (!test_bit(key[1], filter)) - set_bit(key[1], filter); -} - -static bool test_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item) -{ - int key[2]; - unsigned long *filter; - int gen = filter_gen_from_seq(seq); - - filter = READ_ONCE(lruvec->mm_state.filters[gen]); - if (!filter) - return true; - - get_item_key(item, key); - - return test_bit(key[0], filter) && test_bit(key[1], filter); -} - static void reset_mm_stats(struct lruvec *lruvec, struct lru_gen_mm_walk *walk, bool last) { int i;