From patchwork Sat Jul 25 12:00:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684963 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 445EA138C for ; Sat, 25 Jul 2020 12:02:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3612B206C1 for ; Sat, 25 Jul 2020 12:02:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726768AbgGYMCk (ORCPT ); Sat, 25 Jul 2020 08:02:40 -0400 Received: from mx2.suse.de ([195.135.220.15]:52026 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726583AbgGYMCk (ORCPT ); Sat, 25 Jul 2020 08:02:40 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 3450DAB55; Sat, 25 Jul 2020 12:02:47 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Jean Delvare , Coly Li , Kent Overstreet Subject: [PATCH 01/25] bcache: Fix typo in Kconfig name Date: Sat, 25 Jul 2020 20:00:15 +0800 Message-Id: <20200725120039.91071-2-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Jean Delvare registraion -> registration Signed-off-by: Jean Delvare Fixes: 0c8d3fceade2 ("bcache: configure the asynchronous registertion to be experimental") Reviewed-by: Coly Li Cc: Jens Axboe Cc: Kent Overstreet --- drivers/md/bcache/Kconfig | 2 +- drivers/md/bcache/super.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/md/bcache/Kconfig b/drivers/md/bcache/Kconfig index bf7dd96db9b3..d1ca4d059c20 100644 --- a/drivers/md/bcache/Kconfig +++ b/drivers/md/bcache/Kconfig @@ -27,7 +27,7 @@ config BCACHE_CLOSURES_DEBUG interface to list them, which makes it possible to see asynchronous operations that get stuck. -config BCACHE_ASYNC_REGISTRAION +config BCACHE_ASYNC_REGISTRATION bool "Asynchronous device registration (EXPERIMENTAL)" depends on BCACHE help diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 2014016f9a60..38d79f66fde5 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -2782,7 +2782,7 @@ static int __init bcache_init(void) static const struct attribute *files[] = { &ksysfs_register.attr, &ksysfs_register_quiet.attr, -#ifdef CONFIG_BCACHE_ASYNC_REGISTRAION +#ifdef CONFIG_BCACHE_ASYNC_REGISTRATION &ksysfs_register_async.attr, #endif &ksysfs_pendings_cleanup.attr, From patchwork Sat Jul 25 12:00:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684965 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4A7A7138C for ; Sat, 25 Jul 2020 12:02:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 38BE6206EB for ; Sat, 25 Jul 2020 12:02:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726777AbgGYMCn (ORCPT ); Sat, 25 Jul 2020 08:02:43 -0400 Received: from mx2.suse.de ([195.135.220.15]:52068 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726583AbgGYMCm (ORCPT ); Sat, 25 Jul 2020 08:02:42 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 5EA59AFBE; Sat, 25 Jul 2020 12:02:50 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li , stable@vger.kernel.org Subject: [PATCH 02/25] bcache: allocate meta data pages as compound pages Date: Sat, 25 Jul 2020 20:00:16 +0800 Message-Id: <20200725120039.91071-3-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org There are some meta data of bcache are allocated by multiple pages, and they are used as bio bv_page for I/Os to the cache device. for example cache_set->uuids, cache->disk_buckets, journal_write->data, bset_tree->data. For such meta data memory, all the allocated pages should be treated as a single memory block. Then the memory management and underlying I/O code can treat them more clearly. This patch adds __GFP_COMP flag to all the location allocating >0 order pages for the above mentioned meta data. Then their pages are treated as compound pages now. Signed-off-by: Coly Li Cc: stable@vger.kernel.org --- drivers/md/bcache/bset.c | 2 +- drivers/md/bcache/btree.c | 2 +- drivers/md/bcache/journal.c | 4 ++-- drivers/md/bcache/super.c | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/md/bcache/bset.c b/drivers/md/bcache/bset.c index 4995fcaefe29..67a2c47f4201 100644 --- a/drivers/md/bcache/bset.c +++ b/drivers/md/bcache/bset.c @@ -322,7 +322,7 @@ int bch_btree_keys_alloc(struct btree_keys *b, b->page_order = page_order; - t->data = (void *) __get_free_pages(gfp, b->page_order); + t->data = (void *) __get_free_pages(__GFP_COMP|gfp, b->page_order); if (!t->data) goto err; diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 6548a601edf0..dd116c83de80 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -785,7 +785,7 @@ int bch_btree_cache_alloc(struct cache_set *c) mutex_init(&c->verify_lock); c->verify_ondisk = (void *) - __get_free_pages(GFP_KERNEL, ilog2(bucket_pages(c))); + __get_free_pages(GFP_KERNEL|__GFP_COMP, ilog2(bucket_pages(c))); c->verify_data = mca_bucket_alloc(c, &ZERO_KEY, GFP_KERNEL); diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index 90aac4e2333f..d8586b6ccb76 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -999,8 +999,8 @@ int bch_journal_alloc(struct cache_set *c) j->w[1].c = c; if (!(init_fifo(&j->pin, JOURNAL_PIN, GFP_KERNEL)) || - !(j->w[0].data = (void *) __get_free_pages(GFP_KERNEL, JSET_BITS)) || - !(j->w[1].data = (void *) __get_free_pages(GFP_KERNEL, JSET_BITS))) + !(j->w[0].data = (void *) __get_free_pages(GFP_KERNEL|__GFP_COMP, JSET_BITS)) || + !(j->w[1].data = (void *) __get_free_pages(GFP_KERNEL|__GFP_COMP, JSET_BITS))) return -ENOMEM; return 0; diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 38d79f66fde5..6db698b1739a 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1776,7 +1776,7 @@ void bch_cache_set_unregister(struct cache_set *c) } #define alloc_bucket_pages(gfp, c) \ - ((void *) __get_free_pages(__GFP_ZERO|gfp, ilog2(bucket_pages(c)))) + ((void *) __get_free_pages(__GFP_ZERO|__GFP_COMP|gfp, ilog2(bucket_pages(c)))) struct cache_set *bch_cache_set_alloc(struct cache_sb *sb) { From patchwork Sat Jul 25 12:00:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684967 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 42084913 for ; Sat, 25 Jul 2020 12:02:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 320BD206C1 for ; Sat, 25 Jul 2020 12:02:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726942AbgGYMCq (ORCPT ); Sat, 25 Jul 2020 08:02:46 -0400 Received: from mx2.suse.de ([195.135.220.15]:52110 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726583AbgGYMCq (ORCPT ); Sat, 25 Jul 2020 08:02:46 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 97C25AB55; Sat, 25 Jul 2020 12:02:53 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Xu Wang , Coly Li Subject: [PATCH 03/25] bcache: journel: use for_each_clear_bit() to simplify the code Date: Sat, 25 Jul 2020 20:00:17 +0800 Message-Id: <20200725120039.91071-4-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Xu Wang Using for_each_clear_bit() to simplify the code. Signed-off-by: Xu Wang Signed-off-by: Coly Li --- drivers/md/bcache/journal.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index d8586b6ccb76..77fbfd52edcf 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -217,10 +217,7 @@ int bch_journal_read(struct cache_set *c, struct list_head *list) */ pr_debug("falling back to linear search\n"); - for (l = find_first_zero_bit(bitmap, ca->sb.njournal_buckets); - l < ca->sb.njournal_buckets; - l = find_next_zero_bit(bitmap, ca->sb.njournal_buckets, - l + 1)) + for_each_clear_bit(l, bitmap, ca->sb.njournal_buckets) if (read_bucket(l)) goto bsearch; From patchwork Sat Jul 25 12:00:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684969 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D5E3A138C for ; Sat, 25 Jul 2020 12:02:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C7848206F6 for ; Sat, 25 Jul 2020 12:02:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726956AbgGYMCt (ORCPT ); Sat, 25 Jul 2020 08:02:49 -0400 Received: from mx2.suse.de ([195.135.220.15]:52136 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726583AbgGYMCt (ORCPT ); Sat, 25 Jul 2020 08:02:49 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 36817B084; Sat, 25 Jul 2020 12:02:56 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Xu Wang , Coly Li Subject: [PATCH 04/25] bcache: writeback: Remove unneeded variable i Date: Sat, 25 Jul 2020 20:00:18 +0800 Message-Id: <20200725120039.91071-5-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Xu Wang Remove unneeded variable i in bch_dirty_init_thread(). Signed-off-by: Xu Wang Signed-off-by: Coly Li --- drivers/md/bcache/writeback.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c index 1cf1e5016cb9..71801c086b82 100644 --- a/drivers/md/bcache/writeback.c +++ b/drivers/md/bcache/writeback.c @@ -825,10 +825,8 @@ static int bch_dirty_init_thread(void *arg) struct btree_iter iter; struct bkey *k, *p; int cur_idx, prev_idx, skip_nr; - int i; k = p = NULL; - i = 0; cur_idx = prev_idx = 0; bch_btree_iter_init(&c->root->keys, &iter, NULL); From patchwork Sat Jul 25 12:00:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684971 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6C822913 for ; Sat, 25 Jul 2020 12:02:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5D00B206F6 for ; Sat, 25 Jul 2020 12:02:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726972AbgGYMCv (ORCPT ); Sat, 25 Jul 2020 08:02:51 -0400 Received: from mx2.suse.de ([195.135.220.15]:52154 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726583AbgGYMCv (ORCPT ); Sat, 25 Jul 2020 08:02:51 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id CBE54AFBE; Sat, 25 Jul 2020 12:02:58 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, "Gustavo A. R. Silva" , Coly Li Subject: [PATCH 05/25] bcache: movinggc: Use struct_size() helper in kzalloc() Date: Sat, 25 Jul 2020 20:00:19 +0800 Message-Id: <20200725120039.91071-6-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: "Gustavo A. R. Silva" Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. This code was detected with the help of Coccinelle and, audited and fixed manually. Signed-off-by: Gustavo A. R. Silva Signed-off-by: Coly Li --- drivers/md/bcache/movinggc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c index 7891fb512736..b7dd2d75f58c 100644 --- a/drivers/md/bcache/movinggc.c +++ b/drivers/md/bcache/movinggc.c @@ -145,8 +145,8 @@ static void read_moving(struct cache_set *c) continue; } - io = kzalloc(sizeof(struct moving_io) + sizeof(struct bio_vec) - * DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS), + io = kzalloc(struct_size(io, bio.bio.bi_inline_vecs, + DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS)), GFP_KERNEL); if (!io) goto err; From patchwork Sat Jul 25 12:00:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684973 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 07926913 for ; Sat, 25 Jul 2020 12:02:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id ED5F62070C for ; Sat, 25 Jul 2020 12:02:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726973AbgGYMCy (ORCPT ); Sat, 25 Jul 2020 08:02:54 -0400 Received: from mx2.suse.de ([195.135.220.15]:52180 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726583AbgGYMCy (ORCPT ); Sat, 25 Jul 2020 08:02:54 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 636D3AB55; Sat, 25 Jul 2020 12:03:01 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, "Gustavo A. R. Silva" , Coly Li Subject: [PATCH 06/25] bcache: Use struct_size() in kzalloc() Date: Sat, 25 Jul 2020 20:00:20 +0800 Message-Id: <20200725120039.91071-7-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: "Gustavo A. R. Silva" Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. This code was detected with the help of Coccinelle and, audited and fixed manually. Signed-off-by: Gustavo A. R. Silva Signed-off-by: Coly Li --- drivers/md/bcache/writeback.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c index 71801c086b82..5397a2c5d6cc 100644 --- a/drivers/md/bcache/writeback.c +++ b/drivers/md/bcache/writeback.c @@ -459,10 +459,8 @@ static void read_dirty(struct cached_dev *dc) for (i = 0; i < nk; i++) { w = keys[i]; - io = kzalloc(sizeof(struct dirty_io) + - sizeof(struct bio_vec) * - DIV_ROUND_UP(KEY_SIZE(&w->key), - PAGE_SECTORS), + io = kzalloc(struct_size(io, bio.bi_inline_vecs, + DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS)), GFP_KERNEL); if (!io) goto err; From patchwork Sat Jul 25 12:00:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684975 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 387F5913 for ; Sat, 25 Jul 2020 12:02:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2AE8A20759 for ; Sat, 25 Jul 2020 12:02:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726989AbgGYMC5 (ORCPT ); Sat, 25 Jul 2020 08:02:57 -0400 Received: from mx2.suse.de ([195.135.220.15]:52232 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726583AbgGYMC4 (ORCPT ); Sat, 25 Jul 2020 08:02:56 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 3CEF7AB55; Sat, 25 Jul 2020 12:03:04 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li , Ken Raeburn , stable@vger.kernel.org Subject: [PATCH 07/25] bcache: avoid nr_stripes overflow in bcache_device_init() Date: Sat, 25 Jul 2020 20:00:21 +0800 Message-Id: <20200725120039.91071-8-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org For some block devices which large capacity (e.g. 8TB) but small io_opt size (e.g. 8 sectors), in bcache_device_init() the stripes number calcu- lated by, DIV_ROUND_UP_ULL(sectors, d->stripe_size); might be overflow to the unsigned int bcache_device->nr_stripes. This patch uses the uint64_t variable to store DIV_ROUND_UP_ULL() and after the value is checked to be available in unsigned int range, sets it to bache_device->nr_stripes. Then the overflow is avoided. Reported-and-tested-by: Ken Raeburn Signed-off-by: Coly Li Link: https://bugzilla.redhat.com/show_bug.cgi?id=1783075 Cc: stable@vger.kernel.org --- drivers/md/bcache/super.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 6db698b1739a..05ad1cd9f329 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -826,19 +826,19 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size, struct request_queue *q; const size_t max_stripes = min_t(size_t, INT_MAX, SIZE_MAX / sizeof(atomic_t)); - size_t n; + uint64_t n; int idx; if (!d->stripe_size) d->stripe_size = 1 << 31; - d->nr_stripes = DIV_ROUND_UP_ULL(sectors, d->stripe_size); - - if (!d->nr_stripes || d->nr_stripes > max_stripes) { - pr_err("nr_stripes too large or invalid: %u (start sector beyond end of disk?)\n", - (unsigned int)d->nr_stripes); + n = DIV_ROUND_UP_ULL(sectors, d->stripe_size); + if (!n || n > max_stripes) { + pr_err("nr_stripes too large or invalid: %llu (start sector beyond end of disk?)\n", + n); return -ENOMEM; } + d->nr_stripes = n; n = d->nr_stripes * sizeof(atomic_t); d->stripe_sectors_dirty = kvzalloc(n, GFP_KERNEL); From patchwork Sat Jul 25 12:00:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684977 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 15E81138C for ; Sat, 25 Jul 2020 12:03:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 08A412070C for ; Sat, 25 Jul 2020 12:03:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726994AbgGYMDA (ORCPT ); Sat, 25 Jul 2020 08:03:00 -0400 Received: from mx2.suse.de ([195.135.220.15]:52272 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726583AbgGYMC7 (ORCPT ); Sat, 25 Jul 2020 08:02:59 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 5373DAEAF; Sat, 25 Jul 2020 12:03:07 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li , Ken Raeburn , stable@vger.kernel.org Subject: [PATCH 08/25] bcache: fix overflow in offset_to_stripe() Date: Sat, 25 Jul 2020 20:00:22 +0800 Message-Id: <20200725120039.91071-9-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org offset_to_stripe() returns the stripe number (in type unsigned int) from an offset (in type uint64_t) by the following calculation, do_div(offset, d->stripe_size); For large capacity backing device (e.g. 18TB) with small stripe size (e.g. 4KB), the result is 4831838208 and exceeds UINT_MAX. The actual returned value which caller receives is 536870912, due to the overflow. Indeed in bcache_device_init(), bcache_device->nr_stripes is limited in range [1, INT_MAX]. Therefore all valid stripe numbers in bcache are in range [0, bcache_dev->nr_stripes - 1]. This patch adds a upper limition check in offset_to_stripe(): the max valid stripe number should be less than bcache_device->nr_stripes. If the calculated stripe number from do_div() is equal to or larger than bcache_device->nr_stripe, -EINVAL will be returned. (Normally nr_stripes is less than INT_MAX, exceeding upper limitation doesn't mean overflow, therefore -EOVERFLOW is not used as error code.) This patch also changes nr_stripes' type of struct bcache_device from 'unsigned int' to 'int', and return value type of offset_to_stripe() from 'unsigned int' to 'int', to match their exact data ranges. All locations where bcache_device->nr_stripes and offset_to_stripe() are referenced also get updated for the above type change. Reported-and-tested-by: Ken Raeburn Signed-off-by: Coly Li Link: https://bugzilla.redhat.com/show_bug.cgi?id=1783075 Cc: stable@vger.kernel.org --- drivers/md/bcache/bcache.h | 2 +- drivers/md/bcache/writeback.c | 14 +++++++++----- drivers/md/bcache/writeback.h | 19 +++++++++++++++++-- 3 files changed, 27 insertions(+), 8 deletions(-) diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h index 221e0191b687..80e3c4813fb0 100644 --- a/drivers/md/bcache/bcache.h +++ b/drivers/md/bcache/bcache.h @@ -264,7 +264,7 @@ struct bcache_device { #define BCACHE_DEV_UNLINK_DONE 2 #define BCACHE_DEV_WB_RUNNING 3 #define BCACHE_DEV_RATE_DW_RUNNING 4 - unsigned int nr_stripes; + int nr_stripes; unsigned int stripe_size; atomic_t *stripe_sectors_dirty; unsigned long *full_dirty_stripes; diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c index 5397a2c5d6cc..4f4ad6b3d43a 100644 --- a/drivers/md/bcache/writeback.c +++ b/drivers/md/bcache/writeback.c @@ -521,15 +521,19 @@ void bcache_dev_sectors_dirty_add(struct cache_set *c, unsigned int inode, uint64_t offset, int nr_sectors) { struct bcache_device *d = c->devices[inode]; - unsigned int stripe_offset, stripe, sectors_dirty; + unsigned int stripe_offset, sectors_dirty; + int stripe; if (!d) return; + stripe = offset_to_stripe(d, offset); + if (stripe < 0) + return; + if (UUID_FLASH_ONLY(&c->uuids[inode])) atomic_long_add(nr_sectors, &c->flash_dev_dirty_sectors); - stripe = offset_to_stripe(d, offset); stripe_offset = offset & (d->stripe_size - 1); while (nr_sectors) { @@ -569,12 +573,12 @@ static bool dirty_pred(struct keybuf *buf, struct bkey *k) static void refill_full_stripes(struct cached_dev *dc) { struct keybuf *buf = &dc->writeback_keys; - unsigned int start_stripe, stripe, next_stripe; + unsigned int start_stripe, next_stripe; + int stripe; bool wrapped = false; stripe = offset_to_stripe(&dc->disk, KEY_OFFSET(&buf->last_scanned)); - - if (stripe >= dc->disk.nr_stripes) + if (stripe < 0) stripe = 0; start_stripe = stripe; diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h index b029843ce5b6..3f1230e22de0 100644 --- a/drivers/md/bcache/writeback.h +++ b/drivers/md/bcache/writeback.h @@ -52,10 +52,22 @@ static inline uint64_t bcache_dev_sectors_dirty(struct bcache_device *d) return ret; } -static inline unsigned int offset_to_stripe(struct bcache_device *d, +static inline int offset_to_stripe(struct bcache_device *d, uint64_t offset) { do_div(offset, d->stripe_size); + + /* d->nr_stripes is in range [1, INT_MAX] */ + if (unlikely(offset >= d->nr_stripes)) { + pr_err("Invalid stripe %llu (>= nr_stripes %d).\n", + offset, d->nr_stripes); + return -EINVAL; + } + + /* + * Here offset is definitly smaller than INT_MAX, + * return it as int will never overflow. + */ return offset; } @@ -63,7 +75,10 @@ static inline bool bcache_dev_stripe_dirty(struct cached_dev *dc, uint64_t offset, unsigned int nr_sectors) { - unsigned int stripe = offset_to_stripe(&dc->disk, offset); + int stripe = offset_to_stripe(&dc->disk, offset); + + if (stripe < 0) + return false; while (1) { if (atomic_read(dc->disk.stripe_sectors_dirty + stripe)) From patchwork Sat Jul 25 12:00:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684979 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0491B913 for ; Sat, 25 Jul 2020 12:03:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EB4FB20714 for ; Sat, 25 Jul 2020 12:03:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727013AbgGYMDD (ORCPT ); Sat, 25 Jul 2020 08:03:03 -0400 Received: from mx2.suse.de ([195.135.220.15]:52300 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726997AbgGYMDD (ORCPT ); Sat, 25 Jul 2020 08:03:03 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 96049AB55; Sat, 25 Jul 2020 12:03:09 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li Subject: [PATCH 09/25] bcache: add read_super_common() to read major part of super block Date: Sat, 25 Jul 2020 20:00:23 +0800 Message-Id: <20200725120039.91071-10-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Later patches will introduce feature set bits to on-disk super block and increase super block version. Current code in read_super() which reads common part of super block for version BCACHE_SB_VERSION_CDEV and version BCACHE_SB_VERSION_CDEV_WITH_UUID will be shared with the new version. Therefore this patch moves the reusable part into read_super_common(), this preparation patch will make later patches more simplier and only focus on new feature set bits. Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 111 +++++++++++++++++++++----------------- 1 file changed, 63 insertions(+), 48 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 05ad1cd9f329..b5b81b92b2ef 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -59,6 +59,67 @@ struct workqueue_struct *bch_journal_wq; /* Superblock */ +static const char *read_super_common(struct cache_sb *sb, struct block_device *bdev, + struct cache_sb_disk *s) +{ + const char *err; + unsigned int i; + + sb->nbuckets = le64_to_cpu(s->nbuckets); + sb->bucket_size = le16_to_cpu(s->bucket_size); + + sb->nr_in_set = le16_to_cpu(s->nr_in_set); + sb->nr_this_dev = le16_to_cpu(s->nr_this_dev); + + err = "Too many buckets"; + if (sb->nbuckets > LONG_MAX) + goto err; + + err = "Not enough buckets"; + if (sb->nbuckets < 1 << 7) + goto err; + + err = "Bad block/bucket size"; + if (!is_power_of_2(sb->block_size) || + sb->block_size > PAGE_SECTORS || + !is_power_of_2(sb->bucket_size) || + sb->bucket_size < PAGE_SECTORS) + goto err; + + err = "Invalid superblock: device too small"; + if (get_capacity(bdev->bd_disk) < + sb->bucket_size * sb->nbuckets) + goto err; + + err = "Bad UUID"; + if (bch_is_zero(sb->set_uuid, 16)) + goto err; + + err = "Bad cache device number in set"; + if (!sb->nr_in_set || + sb->nr_in_set <= sb->nr_this_dev || + sb->nr_in_set > MAX_CACHES_PER_SET) + goto err; + + err = "Journal buckets not sequential"; + for (i = 0; i < sb->keys; i++) + if (sb->d[i] != sb->first_bucket + i) + goto err; + + err = "Too many journal buckets"; + if (sb->first_bucket + sb->keys > sb->nbuckets) + goto err; + + err = "Invalid superblock: first bucket comes before end of super"; + if (sb->first_bucket * sb->bucket_size < 16) + goto err; + + err = NULL; +err: + return err; +} + + static const char *read_super(struct cache_sb *sb, struct block_device *bdev, struct cache_sb_disk **res) { @@ -133,55 +194,9 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev, break; case BCACHE_SB_VERSION_CDEV: case BCACHE_SB_VERSION_CDEV_WITH_UUID: - sb->nbuckets = le64_to_cpu(s->nbuckets); - sb->bucket_size = le16_to_cpu(s->bucket_size); - - sb->nr_in_set = le16_to_cpu(s->nr_in_set); - sb->nr_this_dev = le16_to_cpu(s->nr_this_dev); - - err = "Too many buckets"; - if (sb->nbuckets > LONG_MAX) - goto err; - - err = "Not enough buckets"; - if (sb->nbuckets < 1 << 7) - goto err; - - err = "Bad block/bucket size"; - if (!is_power_of_2(sb->block_size) || - sb->block_size > PAGE_SECTORS || - !is_power_of_2(sb->bucket_size) || - sb->bucket_size < PAGE_SECTORS) - goto err; - - err = "Invalid superblock: device too small"; - if (get_capacity(bdev->bd_disk) < - sb->bucket_size * sb->nbuckets) - goto err; - - err = "Bad UUID"; - if (bch_is_zero(sb->set_uuid, 16)) - goto err; - - err = "Bad cache device number in set"; - if (!sb->nr_in_set || - sb->nr_in_set <= sb->nr_this_dev || - sb->nr_in_set > MAX_CACHES_PER_SET) - goto err; - - err = "Journal buckets not sequential"; - for (i = 0; i < sb->keys; i++) - if (sb->d[i] != sb->first_bucket + i) - goto err; - - err = "Too many journal buckets"; - if (sb->first_bucket + sb->keys > sb->nbuckets) - goto err; - - err = "Invalid superblock: first bucket comes before end of super"; - if (sb->first_bucket * sb->bucket_size < 16) + err = read_super_common(sb, bdev, s); + if (err) goto err; - break; default: err = "Unsupported superblock version"; From patchwork Sat Jul 25 12:00:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684981 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D8329913 for ; Sat, 25 Jul 2020 12:03:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C0209206F6 for ; Sat, 25 Jul 2020 12:03:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727009AbgGYMDF (ORCPT ); Sat, 25 Jul 2020 08:03:05 -0400 Received: from mx2.suse.de ([195.135.220.15]:52316 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726997AbgGYMDE (ORCPT ); Sat, 25 Jul 2020 08:03:04 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 28B9AAB55; Sat, 25 Jul 2020 12:03:12 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li , Hannes Reinecke Subject: [PATCH 10/25] bcache: add more accurate error information in read_super_common() Date: Sat, 25 Jul 2020 20:00:24 +0800 Message-Id: <20200725120039.91071-11-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org The improperly set bucket or block size will trigger error in read_super_common(). For large bucket size, a more accurate error message for invalid bucket or block size is necessary. This patch disassembles the combined if() checks into multiple single if() check, and provide more accurate error message for each check failure condition. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke --- drivers/md/bcache/super.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index b5b81b92b2ef..fd8c9ee4d4a6 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -79,11 +79,20 @@ static const char *read_super_common(struct cache_sb *sb, struct block_device * if (sb->nbuckets < 1 << 7) goto err; - err = "Bad block/bucket size"; - if (!is_power_of_2(sb->block_size) || - sb->block_size > PAGE_SECTORS || - !is_power_of_2(sb->bucket_size) || - sb->bucket_size < PAGE_SECTORS) + err = "Bad block size (not power of 2)"; + if (!is_power_of_2(sb->block_size)) + goto err; + + err = "Bad block size (larger than page size)"; + if (sb->block_size > PAGE_SECTORS) + goto err; + + err = "Bad bucket size (not power of 2)"; + if (!is_power_of_2(sb->bucket_size)) + goto err; + + err = "Bad bucket size (smaller than page size)"; + if (sb->bucket_size < PAGE_SECTORS) goto err; err = "Invalid superblock: device too small"; From patchwork Sat Jul 25 12:00:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684983 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7E1EA913 for ; Sat, 25 Jul 2020 12:03:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 66E6C206F6 for ; Sat, 25 Jul 2020 12:03:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726997AbgGYMDH (ORCPT ); Sat, 25 Jul 2020 08:03:07 -0400 Received: from mx2.suse.de ([195.135.220.15]:52340 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726583AbgGYMDH (ORCPT ); Sat, 25 Jul 2020 08:03:07 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id AE2E0AEAF; Sat, 25 Jul 2020 12:03:14 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li , Hannes Reinecke Subject: [PATCH 11/25] bcache: disassemble the big if() checks in bch_cache_set_alloc() Date: Sat, 25 Jul 2020 20:00:25 +0800 Message-Id: <20200725120039.91071-12-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org In bch_cache_set_alloc() there is a big if() checks combined by 11 items together. When this big if() statement fails, it is difficult to tell exactly which item fails indeed. This patch disassembles this big if() checks into 11 single if() checks, which makes code debug more easier. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke --- drivers/md/bcache/super.c | 52 ++++++++++++++++++++++++++++----------- 1 file changed, 37 insertions(+), 15 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index fd8c9ee4d4a6..7ad6bbabfc66 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1865,21 +1865,43 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb) iter_size = (sb->bucket_size / sb->block_size + 1) * sizeof(struct btree_iter_set); - if (!(c->devices = kcalloc(c->nr_uuids, sizeof(void *), GFP_KERNEL)) || - mempool_init_slab_pool(&c->search, 32, bch_search_cache) || - mempool_init_kmalloc_pool(&c->bio_meta, 2, - sizeof(struct bbio) + sizeof(struct bio_vec) * - bucket_pages(c)) || - mempool_init_kmalloc_pool(&c->fill_iter, 1, iter_size) || - bioset_init(&c->bio_split, 4, offsetof(struct bbio, bio), - BIOSET_NEED_BVECS|BIOSET_NEED_RESCUER) || - !(c->uuids = alloc_bucket_pages(GFP_KERNEL, c)) || - !(c->moving_gc_wq = alloc_workqueue("bcache_gc", - WQ_MEM_RECLAIM, 0)) || - bch_journal_alloc(c) || - bch_btree_cache_alloc(c) || - bch_open_buckets_alloc(c) || - bch_bset_sort_state_init(&c->sort, ilog2(c->btree_pages))) + c->devices = kcalloc(c->nr_uuids, sizeof(void *), GFP_KERNEL); + if (!c->devices) + goto err; + + if (mempool_init_slab_pool(&c->search, 32, bch_search_cache)) + goto err; + + if (mempool_init_kmalloc_pool(&c->bio_meta, 2, + sizeof(struct bbio) + + sizeof(struct bio_vec) * bucket_pages(c))) + goto err; + + if (mempool_init_kmalloc_pool(&c->fill_iter, 1, iter_size)) + goto err; + + if (bioset_init(&c->bio_split, 4, offsetof(struct bbio, bio), + BIOSET_NEED_BVECS|BIOSET_NEED_RESCUER)) + goto err; + + c->uuids = alloc_bucket_pages(GFP_KERNEL, c); + if (!c->uuids) + goto err; + + c->moving_gc_wq = alloc_workqueue("bcache_gc", WQ_MEM_RECLAIM, 0); + if (!c->moving_gc_wq) + goto err; + + if (bch_journal_alloc(c)) + goto err; + + if (bch_btree_cache_alloc(c)) + goto err; + + if (bch_open_buckets_alloc(c)) + goto err; + + if (bch_bset_sort_state_init(&c->sort, ilog2(c->btree_pages))) goto err; c->congested_read_threshold_us = 2000; From patchwork Sat Jul 25 12:00:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684985 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 77DA9913 for ; Sat, 25 Jul 2020 12:03:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 68EDD206F6 for ; Sat, 25 Jul 2020 12:03:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727015AbgGYMDK (ORCPT ); Sat, 25 Jul 2020 08:03:10 -0400 Received: from mx2.suse.de ([195.135.220.15]:52382 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726583AbgGYMDK (ORCPT ); Sat, 25 Jul 2020 08:03:10 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id CC295AEAF; Sat, 25 Jul 2020 12:03:17 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li , Hannes Reinecke Subject: [PATCH 12/25] bcache: fix super block seq numbers comparision in register_cache_set() Date: Sat, 25 Jul 2020 20:00:26 +0800 Message-Id: <20200725120039.91071-13-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org In register_cache_set(), c is pointer to struct cache_set, and ca is pointer to struct cache, if ca->sb.seq > c->sb.seq, it means this registering cache has up to date version and other members, the in- memory version and other members should be updated to the newer value. But current implementation makes a cache set only has a single cache device, so the above assumption works well except for a special case. The execption is when a cache device new created and both ca->sb.seq and c->sb.seq are 0, because the super block is never flushed out yet. In the location for the following if() check, 2156 if (ca->sb.seq > c->sb.seq) { 2157 c->sb.version = ca->sb.version; 2158 memcpy(c->sb.set_uuid, ca->sb.set_uuid, 16); 2159 c->sb.flags = ca->sb.flags; 2160 c->sb.seq = ca->sb.seq; 2161 pr_debug("set version = %llu\n", c->sb.version); 2162 } c->sb.version is not initialized yet and valued 0. When ca->sb.seq is 0, the if() check will fail (because both values are 0), and the cache set version, set_uuid, flags and seq won't be updated. The above problem is hiden for current code, because the bucket size is compatible among different super block version. And the next time when running cache set again, ca->sb.seq will be larger than 0 and cache set super block version will be updated properly. But if the large bucket feature is enabled, sb->bucket_size is the low 16bits of the bucket size. For a power of 2 value, when the actual bucket size exceeds 16bit width, sb->bucket_size will always be 0. Then read_super_common() will fail because the if() check to is_power_of_2(sb->bucket_size) is false. This is how the long time hidden bug is triggered. This patch modifies the if() check to the following way, 2156 if (ca->sb.seq > c->sb.seq || c->sb.seq == 0) { Then cache set's version, set_uuid, flags and seq will always be updated corectly including for a new created cache device. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke --- drivers/md/bcache/super.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 7ad6bbabfc66..d5ed8f31e123 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -2146,7 +2146,14 @@ static const char *register_cache_set(struct cache *ca) sysfs_create_link(&c->kobj, &ca->kobj, buf)) goto err; - if (ca->sb.seq > c->sb.seq) { + /* + * A special case is both ca->sb.seq and c->sb.seq are 0, + * such condition happens on a new created cache device whose + * super block is never flushed yet. In this case c->sb.version + * and other members should be updated too, otherwise we will + * have a mistaken super block version in cache set. + */ + if (ca->sb.seq > c->sb.seq || c->sb.seq == 0) { c->sb.version = ca->sb.version; memcpy(c->sb.set_uuid, ca->sb.set_uuid, 16); c->sb.flags = ca->sb.flags; From patchwork Sat Jul 25 12:00:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684987 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 85D82138C for ; Sat, 25 Jul 2020 12:03:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 73A20206F6 for ; Sat, 25 Jul 2020 12:03:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727026AbgGYMDP (ORCPT ); Sat, 25 Jul 2020 08:03:15 -0400 Received: from mx2.suse.de ([195.135.220.15]:52408 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726583AbgGYMDP (ORCPT ); Sat, 25 Jul 2020 08:03:15 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 69464AEAF; Sat, 25 Jul 2020 12:03:20 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li , Hannes Reinecke Subject: [PATCH 13/25] bcache: increase super block version for cache device and backing device Date: Sat, 25 Jul 2020 20:00:27 +0800 Message-Id: <20200725120039.91071-14-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org The new added super block version BCACHE_SB_VERSION_BDEV_WITH_FEATURES (5) BCACHE_SB_VERSION_CDEV_WITH_FEATURES value (6), is for the feature set bits. Devices have super block version equal to the new version will have three new members for feature set bits in the on-disk super block, __le64 feature_compat; __le64 feature_incompat; __le64 feature_ro_compat; They are used for further new features which may introduce on-disk format change, and avoid unncessary super block version increase. The very basic features handling code skeleton is also initialized in this patch. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke --- drivers/md/bcache/features.h | 78 ++++++++++++++++++++++++++++++++++++ drivers/md/bcache/super.c | 32 +++++++++++++-- include/uapi/linux/bcache.h | 29 ++++++++++---- 3 files changed, 128 insertions(+), 11 deletions(-) create mode 100644 drivers/md/bcache/features.h diff --git a/drivers/md/bcache/features.h b/drivers/md/bcache/features.h new file mode 100644 index 000000000000..ae7df37b9862 --- /dev/null +++ b/drivers/md/bcache/features.h @@ -0,0 +1,78 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _BCACHE_FEATURES_H +#define _BCACHE_FEATURES_H + +#include +#include +#include + +#define BCH_FEATURE_COMPAT 0 +#define BCH_FEATURE_RO_COMPAT 1 +#define BCH_FEATURE_INCOMPAT 2 +#define BCH_FEATURE_TYPE_MASK 0x03 + +#define BCH_FEATURE_COMPAT_SUUP 0 +#define BCH_FEATURE_RO_COMPAT_SUUP 0 +#define BCH_FEATURE_INCOMPAT_SUUP 0 + +#define BCH_HAS_COMPAT_FEATURE(sb, mask) \ + ((sb)->feature_compat & (mask)) +#define BCH_HAS_RO_COMPAT_FEATURE(sb, mask) \ + ((sb)->feature_ro_compat & (mask)) +#define BCH_HAS_INCOMPAT_FEATURE(sb, mask) \ + ((sb)->feature_incompat & (mask)) + +/* Feature set definition */ + +#define BCH_FEATURE_COMPAT_FUNCS(name, flagname) \ +static inline int bch_has_feature_##name(struct cache_sb *sb) \ +{ \ + return (((sb)->feature_compat & \ + BCH##_FEATURE_COMPAT_##flagname) != 0); \ +} \ +static inline void bch_set_feature_##name(struct cache_sb *sb) \ +{ \ + (sb)->feature_compat |= \ + BCH##_FEATURE_COMPAT_##flagname; \ +} \ +static inline void bch_clear_feature_##name(struct cache_sb *sb) \ +{ \ + (sb)->feature_compat &= \ + ~BCH##_FEATURE_COMPAT_##flagname; \ +} + +#define BCH_FEATURE_RO_COMPAT_FUNCS(name, flagname) \ +static inline int bch_has_feature_##name(struct cache_sb *sb) \ +{ \ + return (((sb)->feature_ro_compat & \ + BCH##_FEATURE_RO_COMPAT_##flagname) != 0); \ +} \ +static inline void bch_set_feature_##name(struct cache_sb *sb) \ +{ \ + (sb)->feature_ro_compat |= \ + BCH##_FEATURE_RO_COMPAT_##flagname; \ +} \ +static inline void bch_clear_feature_##name(struct cache_sb *sb) \ +{ \ + (sb)->feature_ro_compat &= \ + ~BCH##_FEATURE_RO_COMPAT_##flagname; \ +} + +#define BCH_FEATURE_INCOMPAT_FUNCS(name, flagname) \ +static inline int bch_has_feature_##name(struct cache_sb *sb) \ +{ \ + return (((sb)->feature_incompat & \ + BCH##_FEATURE_INCOMPAT_##flagname) != 0); \ +} \ +static inline void bch_set_feature_##name(struct cache_sb *sb) \ +{ \ + (sb)->feature_incompat |= \ + BCH##_FEATURE_INCOMPAT_##flagname; \ +} \ +static inline void bch_clear_feature_##name(struct cache_sb *sb) \ +{ \ + (sb)->feature_incompat &= \ + ~BCH##_FEATURE_INCOMPAT_##flagname; \ +} + +#endif diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index d5ed8f31e123..62534f44c6dc 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -13,6 +13,7 @@ #include "extents.h" #include "request.h" #include "writeback.h" +#include "features.h" #include #include @@ -194,6 +195,7 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev, sb->data_offset = BDEV_DATA_START_DEFAULT; break; case BCACHE_SB_VERSION_BDEV_WITH_OFFSET: + case BCACHE_SB_VERSION_BDEV_WITH_FEATURES: sb->data_offset = le64_to_cpu(s->data_offset); err = "Bad data offset"; @@ -207,6 +209,14 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev, if (err) goto err; break; + case BCACHE_SB_VERSION_CDEV_WITH_FEATURES: + err = read_super_common(sb, bdev, s); + if (err) + goto err; + sb->feature_compat = le64_to_cpu(s->feature_compat); + sb->feature_incompat = le64_to_cpu(s->feature_incompat); + sb->feature_ro_compat = le64_to_cpu(s->feature_ro_compat); + break; default: err = "Unsupported superblock version"; goto err; @@ -241,7 +251,6 @@ static void __write_super(struct cache_sb *sb, struct cache_sb_disk *out, offset_in_page(out)); out->offset = cpu_to_le64(sb->offset); - out->version = cpu_to_le64(sb->version); memcpy(out->uuid, sb->uuid, 16); memcpy(out->set_uuid, sb->set_uuid, 16); @@ -257,6 +266,13 @@ static void __write_super(struct cache_sb *sb, struct cache_sb_disk *out, for (i = 0; i < sb->keys; i++) out->d[i] = cpu_to_le64(sb->d[i]); + if (sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES) { + out->feature_compat = cpu_to_le64(sb->feature_compat); + out->feature_incompat = cpu_to_le64(sb->feature_incompat); + out->feature_ro_compat = cpu_to_le64(sb->feature_ro_compat); + } + + out->version = cpu_to_le64(sb->version); out->csum = csum_set(out); pr_debug("ver %llu, flags %llu, seq %llu\n", @@ -313,17 +329,20 @@ void bcache_write_super(struct cache_set *c) { struct closure *cl = &c->sb_write; struct cache *ca; - unsigned int i; + unsigned int i, version = BCACHE_SB_VERSION_CDEV_WITH_UUID; down(&c->sb_write_mutex); closure_init(cl, &c->cl); c->sb.seq++; + if (c->sb.version > version) + version = c->sb.version; + for_each_cache(ca, c, i) { struct bio *bio = &ca->sb_bio; - ca->sb.version = BCACHE_SB_VERSION_CDEV_WITH_UUID; + ca->sb.version = version; ca->sb.seq = c->sb.seq; ca->sb.last_mount = c->sb.last_mount; @@ -1831,6 +1850,13 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb) c->sb.bucket_size = sb->bucket_size; c->sb.nr_in_set = sb->nr_in_set; c->sb.last_mount = sb->last_mount; + c->sb.version = sb->version; + if (c->sb.version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES) { + c->sb.feature_compat = sb->feature_compat; + c->sb.feature_ro_compat = sb->feature_ro_compat; + c->sb.feature_incompat = sb->feature_incompat; + } + c->bucket_bits = ilog2(sb->bucket_size); c->block_bits = ilog2(sb->block_size); c->nr_uuids = bucket_bytes(c) / sizeof(struct uuid_entry); diff --git a/include/uapi/linux/bcache.h b/include/uapi/linux/bcache.h index 9a1965c6c3d0..47df2db2e727 100644 --- a/include/uapi/linux/bcache.h +++ b/include/uapi/linux/bcache.h @@ -141,11 +141,13 @@ static inline struct bkey *bkey_idx(const struct bkey *k, unsigned int nr_keys) * Version 3: Cache device with new UUID format * Version 4: Backing device with data offset */ -#define BCACHE_SB_VERSION_CDEV 0 -#define BCACHE_SB_VERSION_BDEV 1 -#define BCACHE_SB_VERSION_CDEV_WITH_UUID 3 -#define BCACHE_SB_VERSION_BDEV_WITH_OFFSET 4 -#define BCACHE_SB_MAX_VERSION 4 +#define BCACHE_SB_VERSION_CDEV 0 +#define BCACHE_SB_VERSION_BDEV 1 +#define BCACHE_SB_VERSION_CDEV_WITH_UUID 3 +#define BCACHE_SB_VERSION_BDEV_WITH_OFFSET 4 +#define BCACHE_SB_VERSION_CDEV_WITH_FEATURES 5 +#define BCACHE_SB_VERSION_BDEV_WITH_FEATURES 6 +#define BCACHE_SB_MAX_VERSION 6 #define SB_SECTOR 8 #define SB_OFFSET (SB_SECTOR << SECTOR_SHIFT) @@ -173,7 +175,12 @@ struct cache_sb_disk { __le64 flags; __le64 seq; - __le64 pad[8]; + + __le64 feature_compat; + __le64 feature_incompat; + __le64 feature_ro_compat; + + __le64 pad[5]; union { struct { @@ -224,7 +231,12 @@ struct cache_sb { __u64 flags; __u64 seq; - __u64 pad[8]; + + __u64 feature_compat; + __u64 feature_incompat; + __u64 feature_ro_compat; + + __u64 pad[5]; union { struct { @@ -262,7 +274,8 @@ struct cache_sb { static inline _Bool SB_IS_BDEV(const struct cache_sb *sb) { return sb->version == BCACHE_SB_VERSION_BDEV - || sb->version == BCACHE_SB_VERSION_BDEV_WITH_OFFSET; + || sb->version == BCACHE_SB_VERSION_BDEV_WITH_OFFSET + || sb->version == BCACHE_SB_VERSION_BDEV_WITH_FEATURES; } BITMASK(CACHE_SYNC, struct cache_sb, flags, 0, 1); From patchwork Sat Jul 25 12:00:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684989 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DD1AE913 for ; Sat, 25 Jul 2020 12:03:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C4F84206EB for ; Sat, 25 Jul 2020 12:03:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727032AbgGYMDP (ORCPT ); Sat, 25 Jul 2020 08:03:15 -0400 Received: from mx2.suse.de ([195.135.220.15]:52442 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727023AbgGYMDP (ORCPT ); Sat, 25 Jul 2020 08:03:15 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 08845AF17; Sat, 25 Jul 2020 12:03:23 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li , Hannes Reinecke Subject: [PATCH 14/25] bcache: move bucket related code into read_super_common() Date: Sat, 25 Jul 2020 20:00:28 +0800 Message-Id: <20200725120039.91071-15-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Setting sb->first_bucket and checking sb->keys indeed are only for cache device, it does not make sense to do them in read_super() for backing device too. This patch moves the related code piece into read_super_common() explicitly for cache device and avoid the confusion. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke --- drivers/md/bcache/super.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 62534f44c6dc..214d50903375 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -66,12 +66,17 @@ static const char *read_super_common(struct cache_sb *sb, struct block_device * const char *err; unsigned int i; + sb->first_bucket= le16_to_cpu(s->first_bucket); sb->nbuckets = le64_to_cpu(s->nbuckets); sb->bucket_size = le16_to_cpu(s->bucket_size); sb->nr_in_set = le16_to_cpu(s->nr_in_set); sb->nr_this_dev = le16_to_cpu(s->nr_this_dev); + err = "Too many journal buckets"; + if (sb->keys > SB_JOURNAL_BUCKETS) + goto err; + err = "Too many buckets"; if (sb->nbuckets > LONG_MAX) goto err; @@ -155,7 +160,6 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev, sb->flags = le64_to_cpu(s->flags); sb->seq = le64_to_cpu(s->seq); sb->last_mount = le32_to_cpu(s->last_mount); - sb->first_bucket = le16_to_cpu(s->first_bucket); sb->keys = le16_to_cpu(s->keys); for (i = 0; i < SB_JOURNAL_BUCKETS; i++) @@ -172,10 +176,6 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev, if (memcmp(sb->magic, bcache_magic, 16)) goto err; - err = "Too many journal buckets"; - if (sb->keys > SB_JOURNAL_BUCKETS) - goto err; - err = "Bad checksum"; if (s->csum != csum_set(s)) goto err; From patchwork Sat Jul 25 12:00:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684991 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D4A15138C for ; Sat, 25 Jul 2020 12:03:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BCE9E206F6 for ; Sat, 25 Jul 2020 12:03:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727034AbgGYMDS (ORCPT ); Sat, 25 Jul 2020 08:03:18 -0400 Received: from mx2.suse.de ([195.135.220.15]:52462 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727023AbgGYMDR (ORCPT ); Sat, 25 Jul 2020 08:03:17 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 94CB2AEAF; Sat, 25 Jul 2020 12:03:25 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li , Hannes Reinecke Subject: [PATCH 15/25] bcache: struct cache_sb is only for in-memory super block now Date: Sat, 25 Jul 2020 20:00:29 +0800 Message-Id: <20200725120039.91071-16-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org We have struct cache_sb_disk for on-disk super block already, it is unnecessary to keep the in-memory super block format exactly mapping to the on-disk struct layout. This patch adds code comments to notice that struct cache_sb is not exactly mapping to cache_sb_disk, and removes the useless member csum and pad[5]. Although struct cache_sb does not belong to uapi, but there are still some on-disk format related macros reference it and it is unncessary to get rid of such dependency now. So struct cache_sb will continue to stay in include/uapi/linux/bache.h for now. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke --- include/uapi/linux/bcache.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/include/uapi/linux/bcache.h b/include/uapi/linux/bcache.h index 47df2db2e727..0ef984ea515a 100644 --- a/include/uapi/linux/bcache.h +++ b/include/uapi/linux/bcache.h @@ -215,8 +215,13 @@ struct cache_sb_disk { __le64 d[SB_JOURNAL_BUCKETS]; /* journal buckets */ }; +/* + * This is for in-memory bcache super block. + * NOTE: cache_sb is NOT exactly mapping to cache_sb_disk, the member + * size, ordering and even whole struct size may be different + * from cache_sb_disk. + */ struct cache_sb { - __u64 csum; __u64 offset; /* sector where this sb was written */ __u64 version; @@ -236,8 +241,6 @@ struct cache_sb { __u64 feature_incompat; __u64 feature_ro_compat; - __u64 pad[5]; - union { struct { /* Cache devices */ @@ -245,7 +248,6 @@ struct cache_sb { __u16 block_size; /* sectors */ __u16 bucket_size; /* sectors */ - __u16 nr_in_set; __u16 nr_this_dev; }; From patchwork Sat Jul 25 12:00:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684993 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5D85A138C for ; Sat, 25 Jul 2020 12:03:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4F051206F6 for ; Sat, 25 Jul 2020 12:03:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727040AbgGYMDV (ORCPT ); Sat, 25 Jul 2020 08:03:21 -0400 Received: from mx2.suse.de ([195.135.220.15]:52496 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727023AbgGYMDV (ORCPT ); Sat, 25 Jul 2020 08:03:21 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 282B9AEAF; Sat, 25 Jul 2020 12:03:28 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li , Hannes Reinecke Subject: [PATCH 16/25] bcache: introduce meta_bucket_pages() related helper routines Date: Sat, 25 Jul 2020 20:00:30 +0800 Message-Id: <20200725120039.91071-17-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Currently the in-memory meta data like c->uuids or c->disk_buckets are allocated by alloc_bucket_pages(). The macro alloc_bucket_pages() calls __get_free_pages() to allocated continuous pages with order indicated by ilog2(bucket_pages(c)), #define alloc_bucket_pages(gfp, c) \ ((void *) __get_free_pages(__GFP_ZERO|gfp, ilog2(bucket_pages(c)))) The maximum order is defined as MAX_ORDER, the default value is 11 (and can be overwritten by CONFIG_FORCE_MAX_ZONEORDER). In bcache code the maximum bucket size width is 16bits, this is restricted both by KEY_SIZE size and bucket_size size from struct cache_sb_disk. The maximum 16bits width and power-of-2 value is (1<<15) in unit of sector (512byte). It means the maximum value of bucket size in bytes is (1<<24) bytes a.k.a 4096 pages. When the bucket size is set to maximum permitted value, ilog2(4096) is 12, which exceeds the default maximum order __get_free_pages() can accepted, the failed pages allocation will fail cache set registration procedure and print a kernel oops message for the exceeded pages order. This patch introduces meta_bucket_pages(), meta_bucket_bytes(), and alloc_bucket_pages() helper routines. meta_bucket_pages() indicates the maximum pages can be allocated to meta data bucket, meta_bucket_bytes() indicates the according maximum bytes, and alloc_bucket_pages() does the pages allocation for meta bucket. Because meta_bucket_pages() chooses the smaller value among the bucket size and MAX_ORDER_NR_PAGES, it still works when MAX_ORDER overwritten by CONFIG_FORCE_MAX_ZONEORDER. Following patches will use these helper routines to decide maximum pages can be allocated for different meta data buckets. If the bucket size is larger than meta_bucket_bytes(), the bcache registration can continue to success, just the space more than meta_bucket_bytes() inside the bucket is wasted. Comparing bcache failed for large bucket size, wasting some space for meta data buckets is acceptable at this moment. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke --- drivers/md/bcache/bcache.h | 20 ++++++++++++++++++++ drivers/md/bcache/super.c | 3 +++ 2 files changed, 23 insertions(+) diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h index 80e3c4813fb0..972f1aff0f70 100644 --- a/drivers/md/bcache/bcache.h +++ b/drivers/md/bcache/bcache.h @@ -762,6 +762,26 @@ struct bbio { #define bucket_bytes(c) ((c)->sb.bucket_size << 9) #define block_bytes(c) ((c)->sb.block_size << 9) +static inline unsigned int meta_bucket_pages(struct cache_sb *sb) +{ + unsigned int n, max_pages; + + max_pages = min_t(unsigned int, + __rounddown_pow_of_two(USHRT_MAX) / PAGE_SECTORS, + MAX_ORDER_NR_PAGES); + + n = sb->bucket_size / PAGE_SECTORS; + if (n > max_pages) + n = max_pages; + + return n; +} + +static inline unsigned int meta_bucket_bytes(struct cache_sb *sb) +{ + return meta_bucket_pages(sb) << PAGE_SHIFT; +} + #define prios_per_bucket(c) \ ((bucket_bytes(c) - sizeof(struct prio_set)) / \ sizeof(struct bucket_disk)) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 214d50903375..d0bdbbc3ff5c 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1821,6 +1821,9 @@ void bch_cache_set_unregister(struct cache_set *c) #define alloc_bucket_pages(gfp, c) \ ((void *) __get_free_pages(__GFP_ZERO|__GFP_COMP|gfp, ilog2(bucket_pages(c)))) +#define alloc_meta_bucket_pages(gfp, sb) \ + ((void *) __get_free_pages(__GFP_ZERO|__GFP_COMP|gfp, ilog2(meta_bucket_pages(sb)))) + struct cache_set *bch_cache_set_alloc(struct cache_sb *sb) { int iter_size; From patchwork Sat Jul 25 12:00:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684995 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6EFC8913 for ; Sat, 25 Jul 2020 12:03:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 56934206EB for ; Sat, 25 Jul 2020 12:03:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727043AbgGYMDX (ORCPT ); Sat, 25 Jul 2020 08:03:23 -0400 Received: from mx2.suse.de ([195.135.220.15]:52510 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727023AbgGYMDX (ORCPT ); Sat, 25 Jul 2020 08:03:23 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id A671FAEAF; Sat, 25 Jul 2020 12:03:30 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li , Hannes Reinecke Subject: [PATCH 17/25] bcache: handle c->uuids properly for bucket size > 8MB Date: Sat, 25 Jul 2020 20:00:31 +0800 Message-Id: <20200725120039.91071-18-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Bcache allocates a whole bucket to store c->uuids on cache device, and allocates continuous pages to store it in-memory. When the bucket size exceeds maximum allocable continuous pages, bch_cache_set_alloc() will fail and cache device registration will fail. This patch allocates c->uuids by alloc_meta_bucket_pages(), and uses ilog2(meta_bucket_pages(c)) to indicate order of c->uuids pages when free it. When writing c->uuids to cache device, its size is decided by meta_bucket_pages(c) * PAGE_SECTORS. Now c->uuids is properly handled for bucket size > 8MB. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke --- drivers/md/bcache/super.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index d0bdbbc3ff5c..a19f1baa8664 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -466,6 +466,7 @@ static int __uuid_write(struct cache_set *c) BKEY_PADDED(key) k; struct closure cl; struct cache *ca; + unsigned int size; closure_init_stack(&cl); lockdep_assert_held(&bch_register_lock); @@ -473,7 +474,8 @@ static int __uuid_write(struct cache_set *c) if (bch_bucket_alloc_set(c, RESERVE_BTREE, &k.key, 1, true)) return 1; - SET_KEY_SIZE(&k.key, c->sb.bucket_size); + size = meta_bucket_pages(&c->sb) * PAGE_SECTORS; + SET_KEY_SIZE(&k.key, size); uuid_io(c, REQ_OP_WRITE, 0, &k.key, &cl); closure_sync(&cl); @@ -1656,7 +1658,7 @@ static void cache_set_free(struct closure *cl) } bch_bset_sort_state_free(&c->sort); - free_pages((unsigned long) c->uuids, ilog2(bucket_pages(c))); + free_pages((unsigned long) c->uuids, ilog2(meta_bucket_pages(&c->sb))); if (c->moving_gc_wq) destroy_workqueue(c->moving_gc_wq); @@ -1862,7 +1864,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb) c->bucket_bits = ilog2(sb->bucket_size); c->block_bits = ilog2(sb->block_size); - c->nr_uuids = bucket_bytes(c) / sizeof(struct uuid_entry); + c->nr_uuids = meta_bucket_bytes(&c->sb) / sizeof(struct uuid_entry); c->devices_max_used = 0; atomic_set(&c->attached_dev_nr, 0); c->btree_pages = bucket_pages(c); @@ -1913,7 +1915,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb) BIOSET_NEED_BVECS|BIOSET_NEED_RESCUER)) goto err; - c->uuids = alloc_bucket_pages(GFP_KERNEL, c); + c->uuids = alloc_meta_bucket_pages(GFP_KERNEL, &c->sb); if (!c->uuids) goto err; From patchwork Sat Jul 25 12:00:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684997 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 394C6913 for ; Sat, 25 Jul 2020 12:03:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2B347206EB for ; Sat, 25 Jul 2020 12:03:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727045AbgGYMD0 (ORCPT ); Sat, 25 Jul 2020 08:03:26 -0400 Received: from mx2.suse.de ([195.135.220.15]:52538 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727023AbgGYMD0 (ORCPT ); Sat, 25 Jul 2020 08:03:26 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 46885AB55; Sat, 25 Jul 2020 12:03:33 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li , Hannes Reinecke Subject: [PATCH 18/25] bcache: handle cache prio_buckets and disk_buckets properly for bucket size > 8MB Date: Sat, 25 Jul 2020 20:00:32 +0800 Message-Id: <20200725120039.91071-19-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Similar to c->uuids, struct cache's prio_buckets and disk_buckets also have the potential memory allocation failure during cache registration if the bucket size > 8MB. ca->prio_buckets can be stored on cache device in multiple buckets, its in-memory space is allocated by kzalloc() interface but normally allocated by alloc_pages() because the size > KMALLOC_MAX_CACHE_SIZE. So allocation of ca->prio_buckets has the MAX_ORDER restriction too. If the bucket size > 8MB, by default the page allocator will fail because the page order > 11 (default MAX_ORDER value). ca->prio_buckets should also use meta_bucket_bytes(), meta_bucket_pages() to decide its memory size and use alloc_meta_bucket_pages() to allocate pages, to avoid the allocation failure during cache set registration when bucket size > 8MB. ca->disk_buckets is a single bucket size memory buffer, it is used to iterate each bucket of ca->prio_buckets, and compose the bio based on memory of ca->disk_buckets, then write ca->disk_buckets memory to cache disk one-by-one for each bucket of ca->prio_buckets. ca->disk_buckets should have in-memory size exact to the meta_bucket_pages(), this is the size that ca->prio_buckets will be stored into each on-disk bucket. This patch fixes the above issues and handle cache's prio_buckets and disk_buckets properly for bucket size larger than 8MB. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke --- drivers/md/bcache/bcache.h | 9 +++++---- drivers/md/bcache/super.c | 10 +++++----- 2 files changed, 10 insertions(+), 9 deletions(-) diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h index 972f1aff0f70..0ebfda284866 100644 --- a/drivers/md/bcache/bcache.h +++ b/drivers/md/bcache/bcache.h @@ -782,11 +782,12 @@ static inline unsigned int meta_bucket_bytes(struct cache_sb *sb) return meta_bucket_pages(sb) << PAGE_SHIFT; } -#define prios_per_bucket(c) \ - ((bucket_bytes(c) - sizeof(struct prio_set)) / \ +#define prios_per_bucket(ca) \ + ((meta_bucket_bytes(&(ca)->sb) - sizeof(struct prio_set)) / \ sizeof(struct bucket_disk)) -#define prio_buckets(c) \ - DIV_ROUND_UP((size_t) (c)->sb.nbuckets, prios_per_bucket(c)) + +#define prio_buckets(ca) \ + DIV_ROUND_UP((size_t) (ca)->sb.nbuckets, prios_per_bucket(ca)) static inline size_t sector_to_bucket(struct cache_set *c, sector_t s) { diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index a19f1baa8664..1c4a4c3557f7 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -563,7 +563,7 @@ static void prio_io(struct cache *ca, uint64_t bucket, int op, bio->bi_iter.bi_sector = bucket * ca->sb.bucket_size; bio_set_dev(bio, ca->bdev); - bio->bi_iter.bi_size = bucket_bytes(ca); + bio->bi_iter.bi_size = meta_bucket_bytes(&ca->sb); bio->bi_end_io = prio_endio; bio->bi_private = ca; @@ -621,7 +621,7 @@ int bch_prio_write(struct cache *ca, bool wait) p->next_bucket = ca->prio_buckets[i + 1]; p->magic = pset_magic(&ca->sb); - p->csum = bch_crc64(&p->magic, bucket_bytes(ca) - 8); + p->csum = bch_crc64(&p->magic, meta_bucket_bytes(&ca->sb) - 8); bucket = bch_bucket_alloc(ca, RESERVE_PRIO, wait); BUG_ON(bucket == -1); @@ -674,7 +674,7 @@ static int prio_read(struct cache *ca, uint64_t bucket) prio_io(ca, bucket, REQ_OP_READ, 0); if (p->csum != - bch_crc64(&p->magic, bucket_bytes(ca) - 8)) { + bch_crc64(&p->magic, meta_bucket_bytes(&ca->sb) - 8)) { pr_warn("bad csum reading priorities\n"); goto out; } @@ -2222,7 +2222,7 @@ void bch_cache_release(struct kobject *kobj) ca->set->cache[ca->sb.nr_this_dev] = NULL; } - free_pages((unsigned long) ca->disk_buckets, ilog2(bucket_pages(ca))); + free_pages((unsigned long) ca->disk_buckets, ilog2(meta_bucket_pages(&ca->sb))); kfree(ca->prio_buckets); vfree(ca->buckets); @@ -2319,7 +2319,7 @@ static int cache_alloc(struct cache *ca) goto err_prio_buckets_alloc; } - ca->disk_buckets = alloc_bucket_pages(GFP_KERNEL, ca); + ca->disk_buckets = alloc_meta_bucket_pages(GFP_KERNEL, &ca->sb); if (!ca->disk_buckets) { err = "ca->disk_buckets alloc failed"; goto err_disk_buckets_alloc; From patchwork Sat Jul 25 12:00:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11684999 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AD85B138C for ; Sat, 25 Jul 2020 12:03:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 96774206F6 for ; Sat, 25 Jul 2020 12:03:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727051AbgGYMD2 (ORCPT ); Sat, 25 Jul 2020 08:03:28 -0400 Received: from mx2.suse.de ([195.135.220.15]:52562 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727023AbgGYMD2 (ORCPT ); Sat, 25 Jul 2020 08:03:28 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id C9627AB55; Sat, 25 Jul 2020 12:03:35 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li , Hannes Reinecke Subject: [PATCH 19/25] bcache: handle cache set verify_ondisk properly for bucket size > 8MB Date: Sat, 25 Jul 2020 20:00:33 +0800 Message-Id: <20200725120039.91071-20-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org In bch_btree_cache_alloc() when CONFIG_BCACHE_DEBUG is configured, allocate memory for c->verify_ondisk may fail if the bucket size > 8MB, which will require __get_free_pages() to allocate continuous pages with order > 11 (the default MAX_ORDER of Linux buddy allocator). Such over size allocation will fail, and cause 2 problems, - When CONFIG_BCACHE_DEBUG is configured, bch_btree_verify() does not work, because c->verify_ondisk is NULL and bch_btree_verify() returns immediately. - bch_btree_cache_alloc() will fail due to c->verify_ondisk allocation failed, then the whole cache device registration fails. And because of this failure, the first problem of bch_btree_verify() has no chance to be triggered. This patch fixes the above problem by two means, 1) If pages allocation of c->verify_ondisk fails, set it to NULL and returns bch_btree_cache_alloc() with -ENOMEM. 2) When calling __get_free_pages() to allocate c->verify_ondisk pages, use ilog2(meta_bucket_pages(&c->sb)) to make sure ilog2() will always generate a pages order <= MAX_ORDER (or CONFIG_FORCE_MAX_ZONEORDER). Then the buddy system won't directly reject the allocation request. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke --- drivers/md/bcache/btree.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index dd116c83de80..79716ac9fb5d 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -738,7 +738,7 @@ void bch_btree_cache_free(struct cache_set *c) if (c->verify_data) list_move(&c->verify_data->list, &c->btree_cache); - free_pages((unsigned long) c->verify_ondisk, ilog2(bucket_pages(c))); + free_pages((unsigned long) c->verify_ondisk, ilog2(meta_bucket_pages(&c->sb))); #endif list_splice(&c->btree_cache_freeable, @@ -785,7 +785,15 @@ int bch_btree_cache_alloc(struct cache_set *c) mutex_init(&c->verify_lock); c->verify_ondisk = (void *) - __get_free_pages(GFP_KERNEL|__GFP_COMP, ilog2(bucket_pages(c))); + __get_free_pages(GFP_KERNEL|__GFP_COMP, ilog2(meta_bucket_pages(&c->sb))); + if (!c->verify_ondisk) { + /* + * Don't worry about the mca_rereserve buckets + * allocated in previous for-loop, they will be + * handled properly in bch_cache_set_unregister(). + */ + return -ENOMEM; + } c->verify_data = mca_bucket_alloc(c, &ZERO_KEY, GFP_KERNEL); From patchwork Sat Jul 25 12:00:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11685001 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2028E913 for ; Sat, 25 Jul 2020 12:03:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1186A206F6 for ; Sat, 25 Jul 2020 12:03:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727079AbgGYMDc (ORCPT ); Sat, 25 Jul 2020 08:03:32 -0400 Received: from mx2.suse.de ([195.135.220.15]:52594 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727023AbgGYMDb (ORCPT ); Sat, 25 Jul 2020 08:03:31 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 6A534AB55; Sat, 25 Jul 2020 12:03:38 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li , Hannes Reinecke Subject: [PATCH 20/25] bcache: handle btree node memory allocation properly for bucket size > 8MB Date: Sat, 25 Jul 2020 20:00:34 +0800 Message-Id: <20200725120039.91071-21-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Currently the bcache internal btree node occupies a whole bucket. When loading the btree node from cache device into memory, mca_data_alloc() will call bch_btree_keys_alloc() to allocate memory for the whole bucket size, ilog2(b->c->btree_pages) is send to bch_btree_keys_alloc() as the parameter 'page_order'. c->btree_pages is set as bucket_pages() in bch_cache_set_alloc(), for bucket size > 8MB, ilog2(b->c->btree_pages) is 12 for 4KB page size. By default the maximum page order __get_free_pages() accepts is MAX_ORDER (11), in this condition bch_btree_keys_alloc() will always fail. Because of other over-page-order allocation failure fails the cache device registration, such btree node allocation failure wasn't observed during runtime. After other blocking page allocation failures for bucket size > 8MB, this btree node allocation issue may trigger potentical risk e.g. infinite dead-loop to retry btree node allocation after failure. This patch fixes the potential problem by setting c->btree_pages to meta_bucket_pages() in bch_cache_set_alloc(). In the condition that bucket size > 8MB, meta_bucket_pages() will always return a number which won't exceed the maximum page order of the buddy allocator. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke --- drivers/md/bcache/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 1c4a4c3557f7..172246ff5ec2 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1867,7 +1867,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb) c->nr_uuids = meta_bucket_bytes(&c->sb) / sizeof(struct uuid_entry); c->devices_max_used = 0; atomic_set(&c->attached_dev_nr, 0); - c->btree_pages = bucket_pages(c); + c->btree_pages = meta_bucket_pages(&c->sb); if (c->btree_pages > BTREE_MAX_PAGES) c->btree_pages = max_t(int, c->btree_pages / 4, BTREE_MAX_PAGES); From patchwork Sat Jul 25 12:00:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11685003 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C4950138C for ; Sat, 25 Jul 2020 12:03:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B22D420714 for ; Sat, 25 Jul 2020 12:03:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727087AbgGYMDf (ORCPT ); Sat, 25 Jul 2020 08:03:35 -0400 Received: from mx2.suse.de ([195.135.220.15]:52616 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727048AbgGYMDf (ORCPT ); Sat, 25 Jul 2020 08:03:35 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id B056DAEAF; Sat, 25 Jul 2020 12:03:40 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li Subject: [PATCH 21/25] bcache: add bucket_size_hi into struct cache_sb_disk for large bucket Date: Sat, 25 Jul 2020 20:00:35 +0800 Message-Id: <20200725120039.91071-22-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org The large bucket feature is to extend bucket_size from 16bit to 32bit. When create cache device on zoned device (e.g. zoned NVMe SSD), making a single bucket cover one or more zones of the zoned device is the simplest way to support zoned device as cache by bcache. But current maximum bucket size is 16MB and a typical zone size of zoned device is 256MB, this is the major motiviation to extend bucket size to a larger bit width. This patch is the basic and first change to support large bucket size, the major changes it makes are, - Add BCH_FEATURE_INCOMPAT_LARGE_BUCKET for the large bucket feature, INCOMPAT means it introduces incompatible on-disk format change. - Add BCH_FEATURE_INCOMPAT_FUNCS(large_bucket, LARGE_BUCKET) routines. - Adds __le16 bucket_size_hi into struct cache_sb_disk at offset 0x8d0 for the on-disk super block format. - For the in-memory super block struct cache_sb, member bucket_size is extended from __u16 to __32. - Add get_bucket_size() to combine the bucket_size and bucket_size_hi from struct cache_sb_disk into an unsigned int value. Since we already have large bucket size helpers meta_bucket_pages(), meta_bucket_bytes() and alloc_meta_bucket_pages(), they make sure when bucket size > 8MB, the memory allocation for bcache meta data bucket won't fail no matter how large the bucket size extended. So these meta data buckets are handled properly when the bucket size width increase from 16bit to 32bit, we don't need to worry about them. Signed-off-by: Coly Li --- drivers/md/bcache/alloc.c | 2 +- drivers/md/bcache/features.c | 22 ++++++++++++++++++++++ drivers/md/bcache/features.h | 9 ++++++--- drivers/md/bcache/movinggc.c | 4 ++-- drivers/md/bcache/super.c | 23 +++++++++++++++++++---- include/uapi/linux/bcache.h | 3 ++- 6 files changed, 52 insertions(+), 11 deletions(-) create mode 100644 drivers/md/bcache/features.c diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c index a1df0d95151c..52035a78d836 100644 --- a/drivers/md/bcache/alloc.c +++ b/drivers/md/bcache/alloc.c @@ -87,7 +87,7 @@ void bch_rescale_priorities(struct cache_set *c, int sectors) { struct cache *ca; struct bucket *b; - unsigned int next = c->nbuckets * c->sb.bucket_size / 1024; + unsigned long next = c->nbuckets * c->sb.bucket_size / 1024; unsigned int i; int r; diff --git a/drivers/md/bcache/features.c b/drivers/md/bcache/features.c new file mode 100644 index 000000000000..ba53944bb390 --- /dev/null +++ b/drivers/md/bcache/features.c @@ -0,0 +1,22 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Feature set bits and string conversion. + * Inspired by ext4's features compat/incompat/ro_compat related code. + * + * Copyright 2020 Coly Li + * + */ +#include +#include "bcache.h" + +struct feature { + int compat; + unsigned int mask; + const char *string; +}; + +static struct feature feature_list[] = { + {BCH_FEATURE_INCOMPAT, BCH_FEATURE_INCOMPAT_LARGE_BUCKET, + "large_bucket"}, + {0, 0, 0 }, +}; diff --git a/drivers/md/bcache/features.h b/drivers/md/bcache/features.h index ae7df37b9862..dca052cf5203 100644 --- a/drivers/md/bcache/features.h +++ b/drivers/md/bcache/features.h @@ -11,9 +11,13 @@ #define BCH_FEATURE_INCOMPAT 2 #define BCH_FEATURE_TYPE_MASK 0x03 +/* Feature set definition */ +/* Incompat feature set */ +#define BCH_FEATURE_INCOMPAT_LARGE_BUCKET 0x0001 /* 32bit bucket size */ + #define BCH_FEATURE_COMPAT_SUUP 0 #define BCH_FEATURE_RO_COMPAT_SUUP 0 -#define BCH_FEATURE_INCOMPAT_SUUP 0 +#define BCH_FEATURE_INCOMPAT_SUUP BCH_FEATURE_INCOMPAT_LARGE_BUCKET #define BCH_HAS_COMPAT_FEATURE(sb, mask) \ ((sb)->feature_compat & (mask)) @@ -22,8 +26,6 @@ #define BCH_HAS_INCOMPAT_FEATURE(sb, mask) \ ((sb)->feature_incompat & (mask)) -/* Feature set definition */ - #define BCH_FEATURE_COMPAT_FUNCS(name, flagname) \ static inline int bch_has_feature_##name(struct cache_sb *sb) \ { \ @@ -75,4 +77,5 @@ static inline void bch_clear_feature_##name(struct cache_sb *sb) \ ~BCH##_FEATURE_INCOMPAT_##flagname; \ } +BCH_FEATURE_INCOMPAT_FUNCS(large_bucket, LARGE_BUCKET); #endif diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c index b7dd2d75f58c..5872d6470470 100644 --- a/drivers/md/bcache/movinggc.c +++ b/drivers/md/bcache/movinggc.c @@ -206,8 +206,8 @@ void bch_moving_gc(struct cache_set *c) mutex_lock(&c->bucket_lock); for_each_cache(ca, c, i) { - unsigned int sectors_to_move = 0; - unsigned int reserve_sectors = ca->sb.bucket_size * + unsigned long sectors_to_move = 0; + unsigned long reserve_sectors = ca->sb.bucket_size * fifo_used(&ca->free[RESERVE_MOVINGGC]); ca->heap.used = 0; diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 172246ff5ec2..b7ef11da026c 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -60,6 +60,17 @@ struct workqueue_struct *bch_journal_wq; /* Superblock */ +static unsigned int get_bucket_size(struct cache_sb *sb, struct cache_sb_disk *s) +{ + unsigned int bucket_size = le16_to_cpu(s->bucket_size); + + if (sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES && + bch_has_feature_large_bucket(sb)) + bucket_size |= le16_to_cpu(s->bucket_size_hi) << 16; + + return bucket_size; +} + static const char *read_super_common(struct cache_sb *sb, struct block_device *bdev, struct cache_sb_disk *s) { @@ -68,7 +79,7 @@ static const char *read_super_common(struct cache_sb *sb, struct block_device * sb->first_bucket= le16_to_cpu(s->first_bucket); sb->nbuckets = le64_to_cpu(s->nbuckets); - sb->bucket_size = le16_to_cpu(s->bucket_size); + sb->bucket_size = get_bucket_size(sb, s); sb->nr_in_set = le16_to_cpu(s->nr_in_set); sb->nr_this_dev = le16_to_cpu(s->nr_this_dev); @@ -210,12 +221,16 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev, goto err; break; case BCACHE_SB_VERSION_CDEV_WITH_FEATURES: - err = read_super_common(sb, bdev, s); - if (err) - goto err; + /* + * Feature bits are needed in read_super_common(), + * convert them firstly. + */ sb->feature_compat = le64_to_cpu(s->feature_compat); sb->feature_incompat = le64_to_cpu(s->feature_incompat); sb->feature_ro_compat = le64_to_cpu(s->feature_ro_compat); + err = read_super_common(sb, bdev, s); + if (err) + goto err; break; default: err = "Unsupported superblock version"; diff --git a/include/uapi/linux/bcache.h b/include/uapi/linux/bcache.h index 0ef984ea515a..52e8bcb33981 100644 --- a/include/uapi/linux/bcache.h +++ b/include/uapi/linux/bcache.h @@ -213,6 +213,7 @@ struct cache_sb_disk { __le16 keys; }; __le64 d[SB_JOURNAL_BUCKETS]; /* journal buckets */ + __le16 bucket_size_hi; }; /* @@ -247,9 +248,9 @@ struct cache_sb { __u64 nbuckets; /* device size */ __u16 block_size; /* sectors */ - __u16 bucket_size; /* sectors */ __u16 nr_in_set; __u16 nr_this_dev; + __u32 bucket_size; /* sectors */ }; struct { /* Backing devices */ From patchwork Sat Jul 25 12:00:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11685005 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C6A74138C for ; Sat, 25 Jul 2020 12:03:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B8A462070C for ; Sat, 25 Jul 2020 12:03:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727089AbgGYMDg (ORCPT ); Sat, 25 Jul 2020 08:03:36 -0400 Received: from mx2.suse.de ([195.135.220.15]:52624 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727086AbgGYMDg (ORCPT ); Sat, 25 Jul 2020 08:03:36 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 0A4C1AB55; Sat, 25 Jul 2020 12:03:43 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li Subject: [PATCH 22/25] bcache: add sysfs file to display feature sets information of cache set Date: Sat, 25 Jul 2020 20:00:36 +0800 Message-Id: <20200725120039.91071-23-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org The following three sysfs files are created to display according feature set information of bcache: /sys/fs/bcache//internal/feature_compat /sys/fs/bcache//internal/feature_ro_compat /sys/fs/bcache//internal/feature_incompat is added by this patch, to display feature sets information of the cache set. Now only an incompat feature 'large_bucket' added in bcache, the sysfs file content is: [large_bucket] string large_bucket means the running bcache drive supports incompat feature 'large_bucket', the wrapping [] means the 'large_bucket' feature is currently enabled on this cache set. This patch is ready to display compat and ro_compat features, in future once bcache code implements such feature sets, the according feature strings will be displayed in their sysfs files too. Signed-off-by: Coly Li --- drivers/md/bcache/Makefile | 2 +- drivers/md/bcache/features.c | 53 ++++++++++++++++++++++++++++++++++++ drivers/md/bcache/features.h | 5 ++++ drivers/md/bcache/sysfs.c | 14 ++++++++++ 4 files changed, 73 insertions(+), 1 deletion(-) diff --git a/drivers/md/bcache/Makefile b/drivers/md/bcache/Makefile index fd714628da6a..5b87e59676b8 100644 --- a/drivers/md/bcache/Makefile +++ b/drivers/md/bcache/Makefile @@ -4,4 +4,4 @@ obj-$(CONFIG_BCACHE) += bcache.o bcache-y := alloc.o bset.o btree.o closure.o debug.o extents.o\ io.o journal.o movinggc.o request.o stats.o super.o sysfs.o trace.o\ - util.o writeback.o + util.o writeback.o features.o diff --git a/drivers/md/bcache/features.c b/drivers/md/bcache/features.c index ba53944bb390..4442df48d28c 100644 --- a/drivers/md/bcache/features.c +++ b/drivers/md/bcache/features.c @@ -8,6 +8,7 @@ */ #include #include "bcache.h" +#include "features.h" struct feature { int compat; @@ -20,3 +21,55 @@ static struct feature feature_list[] = { "large_bucket"}, {0, 0, 0 }, }; + +#define compose_feature_string(type) \ +({ \ + struct feature *f; \ + bool first = true; \ + \ + for (f = &feature_list[0]; f->compat != 0; f++) { \ + if (f->compat != BCH_FEATURE_ ## type) \ + continue; \ + if (BCH_HAS_ ## type ## _FEATURE(&c->sb, f->mask)) { \ + if (first) { \ + out += snprintf(out, buf + size - out, \ + "["); \ + } else { \ + out += snprintf(out, buf + size - out, \ + " ["); \ + } \ + } else if (!first) { \ + out += snprintf(out, buf + size - out, " "); \ + } \ + \ + out += snprintf(out, buf + size - out, "%s", f->string);\ + \ + if (BCH_HAS_ ## type ## _FEATURE(&c->sb, f->mask)) \ + out += snprintf(out, buf + size - out, "]"); \ + \ + first = false; \ + } \ + if (!first) \ + out += snprintf(out, buf + size - out, "\n"); \ +}) + +int bch_print_cache_set_feature_compat(struct cache_set *c, char *buf, int size) +{ + char *out = buf; + compose_feature_string(COMPAT); + return out - buf; +} + +int bch_print_cache_set_feature_ro_compat(struct cache_set *c, char *buf, int size) +{ + char *out = buf; + compose_feature_string(RO_COMPAT); + return out - buf; +} + +int bch_print_cache_set_feature_incompat(struct cache_set *c, char *buf, int size) +{ + char *out = buf; + compose_feature_string(INCOMPAT); + return out - buf; +} diff --git a/drivers/md/bcache/features.h b/drivers/md/bcache/features.h index dca052cf5203..a1653c478041 100644 --- a/drivers/md/bcache/features.h +++ b/drivers/md/bcache/features.h @@ -78,4 +78,9 @@ static inline void bch_clear_feature_##name(struct cache_sb *sb) \ } BCH_FEATURE_INCOMPAT_FUNCS(large_bucket, LARGE_BUCKET); + +int bch_print_cache_set_feature_compat(struct cache_set *c, char *buf, int size); +int bch_print_cache_set_feature_ro_compat(struct cache_set *c, char *buf, int size); +int bch_print_cache_set_feature_incompat(struct cache_set *c, char *buf, int size); + #endif diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c index 0dadec5a78f6..ac06c0bc3c0a 100644 --- a/drivers/md/bcache/sysfs.c +++ b/drivers/md/bcache/sysfs.c @@ -11,6 +11,7 @@ #include "btree.h" #include "request.h" #include "writeback.h" +#include "features.h" #include #include @@ -88,6 +89,9 @@ read_attribute(btree_used_percent); read_attribute(average_key_size); read_attribute(dirty_data); read_attribute(bset_tree_stats); +read_attribute(feature_compat); +read_attribute(feature_ro_compat); +read_attribute(feature_incompat); read_attribute(state); read_attribute(cache_read_races); @@ -779,6 +783,13 @@ SHOW(__bch_cache_set) if (attr == &sysfs_bset_tree_stats) return bch_bset_print_stats(c, buf); + if (attr == &sysfs_feature_compat) + return bch_print_cache_set_feature_compat(c, buf, PAGE_SIZE); + if (attr == &sysfs_feature_ro_compat) + return bch_print_cache_set_feature_ro_compat(c, buf, PAGE_SIZE); + if (attr == &sysfs_feature_incompat) + return bch_print_cache_set_feature_incompat(c, buf, PAGE_SIZE); + return 0; } SHOW_LOCKED(bch_cache_set) @@ -987,6 +998,9 @@ static struct attribute *bch_cache_set_internal_files[] = { &sysfs_io_disable, &sysfs_cutoff_writeback, &sysfs_cutoff_writeback_sync, + &sysfs_feature_compat, + &sysfs_feature_ro_compat, + &sysfs_feature_incompat, NULL }; KTYPE(bch_cache_set_internal); From patchwork Sat Jul 25 12:00:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11685007 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 31978913 for ; Sat, 25 Jul 2020 12:03:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2368F2070C for ; Sat, 25 Jul 2020 12:03:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727091AbgGYMDi (ORCPT ); Sat, 25 Jul 2020 08:03:38 -0400 Received: from mx2.suse.de ([195.135.220.15]:52650 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727086AbgGYMDh (ORCPT ); Sat, 25 Jul 2020 08:03:37 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 88B2AAB55; Sat, 25 Jul 2020 12:03:45 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li , Hannes Reinecke Subject: [PATCH 23/25] bcache: avoid extra memory allocation from mempool c->fill_iter Date: Sat, 25 Jul 2020 20:00:37 +0800 Message-Id: <20200725120039.91071-24-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Mempool c->fill_iter is used to allocate memory for struct btree_iter in bch_btree_node_read_done() to iterate all keys of a read-in btree node. The allocation size is defined in bch_cache_set_alloc() by, mempool_init_kmalloc_pool(&c->fill_iter, 1, iter_size)) where iter_size is defined by a calculation, (sb->bucket_size / sb->block_size + 1) * sizeof(struct btree_iter_set) For 16bit width bucket_size the calculation is OK, but now the bucket size is extended to 32bit, the bucket size can be 2GB. By the above calculation, iter_size can be 2048 pages (order 11 is still accepted by buddy allocator). But the actual size holds the bkeys in meta data bucket is limited to meta_bucket_pages() already, which is 16MB. By the above calculation, if replace sb->bucket_size by meta_bucket_pages() * PAGE_SECTORS, the result is 16 pages. This is the size large enough for the mempool allocation to struct btree_iter. Therefore in worst case every time mempool c->fill_iter allocates, at most 4080 pages are wasted and won't be used. Therefore this patch uses meta_bucket_pages() * PAGE_SECTORS to calculate the iter size in bch_cache_set_alloc(), to avoid extra memory allocation from mempool c->fill_iter. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke --- drivers/md/bcache/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index b7ef11da026c..d86b31722b41 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1908,7 +1908,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb) INIT_LIST_HEAD(&c->btree_cache_freed); INIT_LIST_HEAD(&c->data_buckets); - iter_size = (sb->bucket_size / sb->block_size + 1) * + iter_size = ((meta_bucket_pages(sb) * PAGE_SECTORS) / sb->block_size + 1) * sizeof(struct btree_iter_set); c->devices = kcalloc(c->nr_uuids, sizeof(void *), GFP_KERNEL); From patchwork Sat Jul 25 12:00:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11685009 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C28C5913 for ; Sat, 25 Jul 2020 12:03:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B4D4F206EB for ; Sat, 25 Jul 2020 12:03:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727095AbgGYMDm (ORCPT ); Sat, 25 Jul 2020 08:03:42 -0400 Received: from mx2.suse.de ([195.135.220.15]:52674 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727086AbgGYMDl (ORCPT ); Sat, 25 Jul 2020 08:03:41 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 2E05CAB55; Sat, 25 Jul 2020 12:03:48 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li , Hannes Reinecke Subject: [PATCH 24/25] bcache: avoid extra memory consumption in struct bbio for large bucket size Date: Sat, 25 Jul 2020 20:00:38 +0800 Message-Id: <20200725120039.91071-25-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Bcache uses struct bbio to do I/Os for meta data pages like uuids, disk_buckets, prio_buckets, and btree nodes. Example writing a btree node onto cache device, the process is, - Allocate a struct bbio from mempool c->bio_meta. - Inside struct bbio embedded a struct bio, initialize bi_inline_vecs for this embedded bio. - Call bch_bio_map() to map each meta data page to each bv from the inlined bi_io_vec table. - Call bch_submit_bbio() to submit the bio into underlying block layer. - When the I/O completed, only release the struct bbio, don't touch the reference counter of the meta data pages. The struct bbio is defined as, 738 struct bbio { 739 unsigned int submit_time_us; [snipped] 748 struct bio bio; 749 }; Because struct bio is embedded at the end of struct bbio, therefore the actual size of struct bbio is sizeof(struct bio) + size of the embedded bio->bi_inline_vecs. Now all the meta data bucket size are limited to meta_bucket_pages(), if the bucket size is large than meta_bucket_pages()*PAGE_SECTORS, rested space in the bucket is unused. Therefore the most used space in meta bucket is (1<bio_meta as, mempool_init_kmalloc_pool(&c->bio_meta, 2, sizeof(struct bbio) + sizeof(struct bio_vec) * bucket_pages(c)) It is too large, neither the Linux buddy allocator cannot allocate so much continuous pages, nor the extra allocated pages are wasted. This patch replace bucket_pages() to meta_bucket_pages() in two places, - In bch_cache_set_alloc(), when initialize mempool c->bio_meta, uses sizeof(struct bbio) + sizeof(struct bio_vec) * bucket_pages(c) to set the allocating object size. - In bch_bbio_alloc(), when calling bio_init() to set inline bvec talbe bi_inline_bvecs, uses meta_bucket_pages() to indicate number of the inline bio vencs number. Now the maximum size of embedded bio inside struct bbio exactly matches the limit of meta_bucket_pages(), no extra page wasted. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke --- drivers/md/bcache/io.c | 2 +- drivers/md/bcache/super.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c index b25ee33b0d0b..a14a445618b4 100644 --- a/drivers/md/bcache/io.c +++ b/drivers/md/bcache/io.c @@ -26,7 +26,7 @@ struct bio *bch_bbio_alloc(struct cache_set *c) struct bbio *b = mempool_alloc(&c->bio_meta, GFP_NOIO); struct bio *bio = &b->bio; - bio_init(bio, bio->bi_inline_vecs, bucket_pages(c)); + bio_init(bio, bio->bi_inline_vecs, meta_bucket_pages(&c->sb)); return bio; } diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index d86b31722b41..5eba0b930e27 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1920,7 +1920,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb) if (mempool_init_kmalloc_pool(&c->bio_meta, 2, sizeof(struct bbio) + - sizeof(struct bio_vec) * bucket_pages(c))) + sizeof(struct bio_vec) * meta_bucket_pages(&c->sb))) goto err; if (mempool_init_kmalloc_pool(&c->fill_iter, 1, iter_size)) From patchwork Sat Jul 25 12:00:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11685011 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8B4B0138C for ; Sat, 25 Jul 2020 12:03:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 79046206EB for ; Sat, 25 Jul 2020 12:03:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727097AbgGYMDq (ORCPT ); Sat, 25 Jul 2020 08:03:46 -0400 Received: from mx2.suse.de ([195.135.220.15]:52864 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727085AbgGYMDq (ORCPT ); Sat, 25 Jul 2020 08:03:46 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 96484AB55; Sat, 25 Jul 2020 12:03:53 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Coly Li , Christoph Hellwig , stable@vger.kernel.org Subject: [PATCH 25/25] bcache: fix bio_{start,end}_io_acct with proper device Date: Sat, 25 Jul 2020 20:00:39 +0800 Message-Id: <20200725120039.91071-26-colyli@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200725120039.91071-1-colyli@suse.de> References: <20200725120039.91071-1-colyli@suse.de> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Commit 85750aeb748f ("bcache: use bio_{start,end}_io_acct") moves the io account code to the location after bio_set_dev(bio, dc->bdev) in cached_dev_make_request(). Then the account is performed incorrectly on backing device, indeed the I/O should be counted to bcache device like /dev/bcache0. With the mistaken I/O account, iostat does not display I/O counts for bcache device and all the numbers go to backing device. In writeback mode, the hard drive may have 340K+ IOPS which is impossible and wrong for spinning disk. This patch introduces bch_bio_start_io_acct() and bch_bio_end_io_acct(), which switches bio->bi_disk to bcache device before calling bio_start_io_acct() or bio_end_io_acct(). Now the I/Os are counted to bcache device, and bcache device, cache device and backing device have their correct I/O count information back. Fixes: 85750aeb748f ("bcache: use bio_{start,end}_io_acct") Signed-off-by: Coly Li Cc: Christoph Hellwig Cc: stable@vger.kernel.org --- drivers/md/bcache/request.c | 31 +++++++++++++++++++++++++++---- 1 file changed, 27 insertions(+), 4 deletions(-) diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index 7acf024e99f3..8ea0f079c1d0 100644 --- a/drivers/md/bcache/request.c +++ b/drivers/md/bcache/request.c @@ -617,6 +617,28 @@ static void cache_lookup(struct closure *cl) /* Common code for the make_request functions */ +static inline void bch_bio_start_io_acct(struct gendisk *acct_bi_disk, + struct bio *bio, + unsigned long *start_time) +{ + struct gendisk *saved_bi_disk = bio->bi_disk; + + bio->bi_disk = acct_bi_disk; + *start_time = bio_start_io_acct(bio); + bio->bi_disk = saved_bi_disk; +} + +static inline void bch_bio_end_io_acct(struct gendisk *acct_bi_disk, + struct bio *bio, + unsigned long start_time) +{ + struct gendisk *saved_bi_disk = bio->bi_disk; + + bio->bi_disk = acct_bi_disk; + bio_end_io_acct(bio, start_time); + bio->bi_disk = saved_bi_disk; +} + static void request_endio(struct bio *bio) { struct closure *cl = bio->bi_private; @@ -668,7 +690,7 @@ static void backing_request_endio(struct bio *bio) static void bio_complete(struct search *s) { if (s->orig_bio) { - bio_end_io_acct(s->orig_bio, s->start_time); + bch_bio_end_io_acct(s->d->disk, s->orig_bio, s->start_time); trace_bcache_request_end(s->d, s->orig_bio); s->orig_bio->bi_status = s->iop.status; bio_endio(s->orig_bio); @@ -728,7 +750,7 @@ static inline struct search *search_alloc(struct bio *bio, s->recoverable = 1; s->write = op_is_write(bio_op(bio)); s->read_dirty_data = 0; - s->start_time = bio_start_io_acct(bio); + bch_bio_start_io_acct(d->disk, bio, &s->start_time); s->iop.c = d->c; s->iop.bio = NULL; @@ -1080,7 +1102,7 @@ static void detached_dev_end_io(struct bio *bio) bio->bi_end_io = ddip->bi_end_io; bio->bi_private = ddip->bi_private; - bio_end_io_acct(bio, ddip->start_time); + bch_bio_end_io_acct(ddip->d->disk, bio, ddip->start_time); if (bio->bi_status) { struct cached_dev *dc = container_of(ddip->d, @@ -1105,7 +1127,8 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio) */ ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO); ddip->d = d; - ddip->start_time = bio_start_io_acct(bio); + bch_bio_start_io_acct(d->disk, bio, &ddip->start_time); + ddip->bi_end_io = bio->bi_end_io; ddip->bi_private = bio->bi_private; bio->bi_end_io = detached_dev_end_io;