From patchwork Thu Jan 23 17:01:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11348749 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1D8F492A for ; Thu, 23 Jan 2020 17:02:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 05F912071E for ; Thu, 23 Jan 2020 17:02:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729907AbgAWRCD (ORCPT ); Thu, 23 Jan 2020 12:02:03 -0500 Received: from mx2.suse.de ([195.135.220.15]:51212 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728899AbgAWRCD (ORCPT ); Thu, 23 Jan 2020 12:02:03 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id A1556B1D5; Thu, 23 Jan 2020 17:02:01 +0000 (UTC) From: colyli@suse.de To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Liang Chen , Christoph Hellwig , Coly Li Subject: [PATCH 01/17] bcache: cached_dev_free needs to put the sb page Date: Fri, 24 Jan 2020 01:01:26 +0800 Message-Id: <20200123170142.98974-2-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200123170142.98974-1-colyli@suse.de> References: <20200123170142.98974-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Liang Chen Same as cache device, the buffer page needs to be put while freeing cached_dev. Otherwise a page would be leaked every time a cached_dev is stopped. Signed-off-by: Liang Chen Signed-off-by: Christoph Hellwig Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 77e9869345e7..a573ce1d85aa 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1275,6 +1275,9 @@ static void cached_dev_free(struct closure *cl) mutex_unlock(&bch_register_lock); + if (dc->sb_bio.bi_inline_vecs[0].bv_page) + put_page(bio_first_page_all(&dc->sb_bio)); + if (!IS_ERR_OR_NULL(dc->bdev)) blkdev_put(dc->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL); From patchwork Thu Jan 23 17:01:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11348751 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 222A292A for ; Thu, 23 Jan 2020 17:02:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0A48921569 for ; Thu, 23 Jan 2020 17:02:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729253AbgAWRCG (ORCPT ); Thu, 23 Jan 2020 12:02:06 -0500 Received: from mx2.suse.de ([195.135.220.15]:51306 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728899AbgAWRCG (ORCPT ); Thu, 23 Jan 2020 12:02:06 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 7FE84AD73; Thu, 23 Jan 2020 17:02:04 +0000 (UTC) From: colyli@suse.de To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Christoph Hellwig , Coly Li Subject: [PATCH 02/17] bcache: use a separate data structure for the on-disk super block Date: Fri, 24 Jan 2020 01:01:27 +0800 Message-Id: <20200123170142.98974-3-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200123170142.98974-1-colyli@suse.de> References: <20200123170142.98974-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Christoph Hellwig Split out an on-disk version struct cache_sb with the proper endianness annotations. This fixes a fair chunk of sparse warnings, but there are some left due to the way the checksum is defined. Signed-off-by: Christoph Hellwig Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 6 +++--- include/uapi/linux/bcache.h | 51 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 54 insertions(+), 3 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index a573ce1d85aa..3045f27e0d67 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -63,14 +63,14 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev, struct page **res) { const char *err; - struct cache_sb *s; + struct cache_sb_disk *s; struct buffer_head *bh = __bread(bdev, 1, SB_SIZE); unsigned int i; if (!bh) return "IO error"; - s = (struct cache_sb *) bh->b_data; + s = (struct cache_sb_disk *)bh->b_data; sb->offset = le64_to_cpu(s->offset); sb->version = le64_to_cpu(s->version); @@ -209,7 +209,7 @@ static void write_bdev_super_endio(struct bio *bio) static void __write_super(struct cache_sb *sb, struct bio *bio) { - struct cache_sb *out = page_address(bio_first_page_all(bio)); + struct cache_sb_disk *out = page_address(bio_first_page_all(bio)); unsigned int i; bio->bi_iter.bi_sector = SB_SECTOR; diff --git a/include/uapi/linux/bcache.h b/include/uapi/linux/bcache.h index 5d4f58e059fd..1d8b3a9fc080 100644 --- a/include/uapi/linux/bcache.h +++ b/include/uapi/linux/bcache.h @@ -156,6 +156,57 @@ static inline struct bkey *bkey_idx(const struct bkey *k, unsigned int nr_keys) #define BDEV_DATA_START_DEFAULT 16 /* sectors */ +struct cache_sb_disk { + __le64 csum; + __le64 offset; /* sector where this sb was written */ + __le64 version; + + __u8 magic[16]; + + __u8 uuid[16]; + union { + __u8 set_uuid[16]; + __le64 set_magic; + }; + __u8 label[SB_LABEL_SIZE]; + + __le64 flags; + __le64 seq; + __le64 pad[8]; + + union { + struct { + /* Cache devices */ + __le64 nbuckets; /* device size */ + + __le16 block_size; /* sectors */ + __le16 bucket_size; /* sectors */ + + __le16 nr_in_set; + __le16 nr_this_dev; + }; + struct { + /* Backing devices */ + __le64 data_offset; + + /* + * block_size from the cache device section is still used by + * backing devices, so don't add anything here until we fix + * things to not need it for backing devices anymore + */ + }; + }; + + __le32 last_mount; /* time overflow in y2106 */ + + __le16 first_bucket; + union { + __le16 njournal_buckets; + __le16 keys; + }; + __le64 d[SB_JOURNAL_BUCKETS]; /* journal buckets */ +}; + struct cache_sb { __u64 csum; __u64 offset; /* sector where this sb was written */ From patchwork Thu Jan 23 17:01:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11348753 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AEDE192A for ; Thu, 23 Jan 2020 17:02:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 975F821569 for ; Thu, 23 Jan 2020 17:02:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729999AbgAWRCJ (ORCPT ); Thu, 23 Jan 2020 12:02:09 -0500 Received: from mx2.suse.de ([195.135.220.15]:51344 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730014AbgAWRCJ (ORCPT ); Thu, 23 Jan 2020 12:02:09 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 032D0AC50; Thu, 23 Jan 2020 17:02:07 +0000 (UTC) From: colyli@suse.de To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Christoph Hellwig , Coly Li Subject: [PATCH 03/17] bcache: rework error unwinding in register_bcache Date: Fri, 24 Jan 2020 01:01:28 +0800 Message-Id: <20200123170142.98974-4-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200123170142.98974-1-colyli@suse.de> References: <20200123170142.98974-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Christoph Hellwig Split the successful and error return path, and use one goto label for each resource to unwind. This also fixes some small errors like leaking the module reference count in the reboot case (which seems entirely harmless) or printing the wrong warning messages for early failures. Signed-off-by: Christoph Hellwig Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++------------------- 1 file changed, 45 insertions(+), 30 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 3045f27e0d67..e8013e1b0a14 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -2375,29 +2375,33 @@ static bool bch_is_open(struct block_device *bdev) static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, const char *buffer, size_t size) { - ssize_t ret = -EINVAL; - const char *err = "cannot allocate memory"; - char *path = NULL; - struct cache_sb *sb = NULL; + const char *err; + char *path; + struct cache_sb *sb; struct block_device *bdev = NULL; - struct page *sb_page = NULL; + struct page *sb_page; + ssize_t ret; + ret = -EBUSY; if (!try_module_get(THIS_MODULE)) - return -EBUSY; + goto out; /* For latest state of bcache_is_reboot */ smp_mb(); if (bcache_is_reboot) - return -EBUSY; + goto out_module_put; + ret = -ENOMEM; + err = "cannot allocate memory"; path = kstrndup(buffer, size, GFP_KERNEL); if (!path) - goto err; + goto out_module_put; sb = kmalloc(sizeof(struct cache_sb), GFP_KERNEL); if (!sb) - goto err; + goto out_free_path; + ret = -EINVAL; err = "failed to open device"; bdev = blkdev_get_by_path(strim(path), FMODE_READ|FMODE_WRITE|FMODE_EXCL, @@ -2414,57 +2418,68 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, if (!IS_ERR(bdev)) bdput(bdev); if (attr == &ksysfs_register_quiet) - goto quiet_out; + goto done; } - goto err; + goto out_free_sb; } err = "failed to set blocksize"; if (set_blocksize(bdev, 4096)) - goto err_close; + goto out_blkdev_put; err = read_super(sb, bdev, &sb_page); if (err) - goto err_close; + goto out_blkdev_put; err = "failed to register device"; if (SB_IS_BDEV(sb)) { struct cached_dev *dc = kzalloc(sizeof(*dc), GFP_KERNEL); if (!dc) - goto err_close; + goto out_put_sb_page; mutex_lock(&bch_register_lock); ret = register_bdev(sb, sb_page, bdev, dc); mutex_unlock(&bch_register_lock); /* blkdev_put() will be called in cached_dev_free() */ - if (ret < 0) - goto err; + if (ret < 0) { + bdev = NULL; + goto out_put_sb_page; + } } else { struct cache *ca = kzalloc(sizeof(*ca), GFP_KERNEL); if (!ca) - goto err_close; + goto out_put_sb_page; /* blkdev_put() will be called in bch_cache_release() */ - if (register_cache(sb, sb_page, bdev, ca) != 0) - goto err; + if (register_cache(sb, sb_page, bdev, ca) != 0) { + bdev = NULL; + goto out_put_sb_page; + } } -quiet_out: - ret = size; -out: - if (sb_page) - put_page(sb_page); + + put_page(sb_page); +done: kfree(sb); kfree(path); module_put(THIS_MODULE); - return ret; - -err_close: - blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL); -err: + return size; + +out_put_sb_page: + put_page(sb_page); +out_blkdev_put: + if (bdev) + blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL); +out_free_sb: + kfree(sb); +out_free_path: + kfree(path); +out_module_put: + module_put(THIS_MODULE); +out: pr_info("error %s: %s", path, err); - goto out; + return ret; } From patchwork Thu Jan 23 17:01:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11348755 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 54A191398 for ; Thu, 23 Jan 2020 17:02:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 308DC21569 for ; Thu, 23 Jan 2020 17:02:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730031AbgAWRCM (ORCPT ); Thu, 23 Jan 2020 12:02:12 -0500 Received: from mx2.suse.de ([195.135.220.15]:51400 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730014AbgAWRCM (ORCPT ); Thu, 23 Jan 2020 12:02:12 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id CABB5AC50; Thu, 23 Jan 2020 17:02:10 +0000 (UTC) From: colyli@suse.de To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 04/17] bcache: properly initialize 'path' and 'err' in register_bcache() Date: Fri, 24 Jan 2020 01:01:29 +0800 Message-Id: <20200123170142.98974-5-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200123170142.98974-1-colyli@suse.de> References: <20200123170142.98974-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Coly Li Patch "bcache: rework error unwinding in register_bcache" from Christoph Hellwig changes the local variables 'path' and 'err' in undefined initial state. If the code in register_bcache() jumps to label 'out:' or 'out_module_put:' by goto, these two variables might be reference with undefined value by the following line, out_module_put: module_put(THIS_MODULE); out: pr_info("error %s: %s", path, err); return ret; Therefore this patch initializes these two local variables properly in register_bcache() to avoid such issue. Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index e8013e1b0a14..ee7c87f38d0d 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -2376,18 +2376,20 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, const char *buffer, size_t size) { const char *err; - char *path; + char *path = NULL; struct cache_sb *sb; struct block_device *bdev = NULL; struct page *sb_page; ssize_t ret; ret = -EBUSY; + err = "failed to reference bcache module"; if (!try_module_get(THIS_MODULE)) goto out; /* For latest state of bcache_is_reboot */ smp_mb(); + err = "bcache is in reboot"; if (bcache_is_reboot) goto out_module_put; From patchwork Thu Jan 23 17:01:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11348757 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 761081398 for ; Thu, 23 Jan 2020 17:02:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5471121569 for ; Thu, 23 Jan 2020 17:02:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730014AbgAWRCO (ORCPT ); Thu, 23 Jan 2020 12:02:14 -0500 Received: from mx2.suse.de ([195.135.220.15]:51432 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729641AbgAWRCO (ORCPT ); Thu, 23 Jan 2020 12:02:14 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id DB5D2AC50; Thu, 23 Jan 2020 17:02:12 +0000 (UTC) From: colyli@suse.de To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li , Christoph Hellwig Subject: [PATCH 05/17] bcache: fix use-after-free in register_bcache() Date: Fri, 24 Jan 2020 01:01:30 +0800 Message-Id: <20200123170142.98974-6-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200123170142.98974-1-colyli@suse.de> References: <20200123170142.98974-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Coly Li The patch "bcache: rework error unwinding in register_bcache" introduces a use-after-free regression in register_bcache(). Here are current code, 2510 out_free_path: 2511 kfree(path); 2512 out_module_put: 2513 module_put(THIS_MODULE); 2514 out: 2515 pr_info("error %s: %s", path, err); 2516 return ret; If some error happens and the above code path is executed, at line 2511 path is released, but referenced at line 2515. Then KASAN reports a use- after-free error message. This patch changes line 2515 in the following way to fix the problem, 2515 pr_info("error %s: %s", path?path:"", err); Signed-off-by: Coly Li Cc: Christoph Hellwig --- drivers/md/bcache/super.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index ee7c87f38d0d..fad9c6cbee5e 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -2477,10 +2477,11 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, kfree(sb); out_free_path: kfree(path); + path = NULL; out_module_put: module_put(THIS_MODULE); out: - pr_info("error %s: %s", path, err); + pr_info("error %s: %s", path?path:"", err); return ret; } From patchwork Thu Jan 23 17:01:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11348761 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 09C2992A for ; Thu, 23 Jan 2020 17:02:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id DC0B42071E for ; Thu, 23 Jan 2020 17:02:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729639AbgAWRCm (ORCPT ); Thu, 23 Jan 2020 12:02:42 -0500 Received: from mx2.suse.de ([195.135.220.15]:51634 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729335AbgAWRCm (ORCPT ); Thu, 23 Jan 2020 12:02:42 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 7B867AE61; Thu, 23 Jan 2020 17:02:40 +0000 (UTC) From: colyli@suse.de To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Christoph Hellwig , Coly Li Subject: [PATCH 06/17] bcache: transfer the sb_page reference to register_{bdev,cache} Date: Fri, 24 Jan 2020 01:01:31 +0800 Message-Id: <20200123170142.98974-7-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200123170142.98974-1-colyli@suse.de> References: <20200123170142.98974-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Christoph Hellwig Avoid an extra reference count roundtrip by transferring the sb_page ownership to the lower level register helpers. Signed-off-by: Christoph Hellwig Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 21 ++++++--------------- 1 file changed, 6 insertions(+), 15 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index fad9c6cbee5e..cf2cafef381f 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1368,8 +1368,6 @@ static int register_bdev(struct cache_sb *sb, struct page *sb_page, bio_init(&dc->sb_bio, dc->sb_bio.bi_inline_vecs, 1); bio_first_bvec_all(&dc->sb_bio)->bv_page = sb_page; - get_page(sb_page); - if (cached_dev_init(dc, sb->block_size << 9)) goto err; @@ -2275,7 +2273,6 @@ static int register_cache(struct cache_sb *sb, struct page *sb_page, bio_init(&ca->sb_bio, ca->sb_bio.bi_inline_vecs, 1); bio_first_bvec_all(&ca->sb_bio)->bv_page = sb_page; - get_page(sb_page); if (blk_queue_discard(bdev_get_queue(bdev))) ca->discard = CACHE_DISCARD(&ca->sb); @@ -2378,7 +2375,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, const char *err; char *path = NULL; struct cache_sb *sb; - struct block_device *bdev = NULL; + struct block_device *bdev; struct page *sb_page; ssize_t ret; @@ -2444,10 +2441,8 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, ret = register_bdev(sb, sb_page, bdev, dc); mutex_unlock(&bch_register_lock); /* blkdev_put() will be called in cached_dev_free() */ - if (ret < 0) { - bdev = NULL; - goto out_put_sb_page; - } + if (ret < 0) + goto out_free_sb; } else { struct cache *ca = kzalloc(sizeof(*ca), GFP_KERNEL); @@ -2455,13 +2450,10 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, goto out_put_sb_page; /* blkdev_put() will be called in bch_cache_release() */ - if (register_cache(sb, sb_page, bdev, ca) != 0) { - bdev = NULL; - goto out_put_sb_page; - } + if (register_cache(sb, sb_page, bdev, ca) != 0) + goto out_free_sb; } - put_page(sb_page); done: kfree(sb); kfree(path); @@ -2471,8 +2463,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, out_put_sb_page: put_page(sb_page); out_blkdev_put: - if (bdev) - blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL); + blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL); out_free_sb: kfree(sb); out_free_path: From patchwork Thu Jan 23 17:01:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11348763 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 33F5A1398 for ; Thu, 23 Jan 2020 17:02:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 1B41D21569 for ; Thu, 23 Jan 2020 17:02:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729641AbgAWRCp (ORCPT ); Thu, 23 Jan 2020 12:02:45 -0500 Received: from mx2.suse.de ([195.135.220.15]:51668 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729335AbgAWRCp (ORCPT ); Thu, 23 Jan 2020 12:02:45 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id E8DB8AE2A; Thu, 23 Jan 2020 17:02:42 +0000 (UTC) From: colyli@suse.de To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Christoph Hellwig , Coly Li Subject: [PATCH 07/17] bcache: return a pointer to the on-disk sb from read_super Date: Fri, 24 Jan 2020 01:01:32 +0800 Message-Id: <20200123170142.98974-8-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200123170142.98974-1-colyli@suse.de> References: <20200123170142.98974-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Christoph Hellwig Returning the properly typed actual data structure insteaf of the containing struct page will save the callers some work going forward. Signed-off-by: Christoph Hellwig Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index cf2cafef381f..b30e4c6011d2 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -60,7 +60,7 @@ struct workqueue_struct *bch_journal_wq; /* Superblock */ static const char *read_super(struct cache_sb *sb, struct block_device *bdev, - struct page **res) + struct cache_sb_disk **res) { const char *err; struct cache_sb_disk *s; @@ -191,7 +191,7 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev, err = NULL; get_page(bh->b_page); - *res = bh->b_page; + *res = s; err: put_bh(bh); return err; @@ -1353,7 +1353,7 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size) /* Cached device - bcache superblock */ -static int register_bdev(struct cache_sb *sb, struct page *sb_page, +static int register_bdev(struct cache_sb *sb, struct cache_sb_disk *sb_disk, struct block_device *bdev, struct cached_dev *dc) { @@ -1367,7 +1367,7 @@ static int register_bdev(struct cache_sb *sb, struct page *sb_page, dc->bdev->bd_holder = dc; bio_init(&dc->sb_bio, dc->sb_bio.bi_inline_vecs, 1); - bio_first_bvec_all(&dc->sb_bio)->bv_page = sb_page; + bio_first_bvec_all(&dc->sb_bio)->bv_page = virt_to_page(sb_disk); if (cached_dev_init(dc, sb->block_size << 9)) goto err; @@ -2260,7 +2260,7 @@ static int cache_alloc(struct cache *ca) return ret; } -static int register_cache(struct cache_sb *sb, struct page *sb_page, +static int register_cache(struct cache_sb *sb, struct cache_sb_disk *sb_disk, struct block_device *bdev, struct cache *ca) { const char *err = NULL; /* must be set for any error case */ @@ -2272,7 +2272,7 @@ static int register_cache(struct cache_sb *sb, struct page *sb_page, ca->bdev->bd_holder = ca; bio_init(&ca->sb_bio, ca->sb_bio.bi_inline_vecs, 1); - bio_first_bvec_all(&ca->sb_bio)->bv_page = sb_page; + bio_first_bvec_all(&ca->sb_bio)->bv_page = virt_to_page(sb_disk); if (blk_queue_discard(bdev_get_queue(bdev))) ca->discard = CACHE_DISCARD(&ca->sb); @@ -2375,8 +2375,8 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, const char *err; char *path = NULL; struct cache_sb *sb; + struct cache_sb_disk *sb_disk; struct block_device *bdev; - struct page *sb_page; ssize_t ret; ret = -EBUSY; @@ -2426,7 +2426,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, if (set_blocksize(bdev, 4096)) goto out_blkdev_put; - err = read_super(sb, bdev, &sb_page); + err = read_super(sb, bdev, &sb_disk); if (err) goto out_blkdev_put; @@ -2438,7 +2438,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, goto out_put_sb_page; mutex_lock(&bch_register_lock); - ret = register_bdev(sb, sb_page, bdev, dc); + ret = register_bdev(sb, sb_disk, bdev, dc); mutex_unlock(&bch_register_lock); /* blkdev_put() will be called in cached_dev_free() */ if (ret < 0) @@ -2450,7 +2450,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, goto out_put_sb_page; /* blkdev_put() will be called in bch_cache_release() */ - if (register_cache(sb, sb_page, bdev, ca) != 0) + if (register_cache(sb, sb_disk, bdev, ca) != 0) goto out_free_sb; } @@ -2461,7 +2461,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, return size; out_put_sb_page: - put_page(sb_page); + put_page(virt_to_page(sb_disk)); out_blkdev_put: blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL); out_free_sb: From patchwork Thu Jan 23 17:01:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11348765 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3BF2F1398 for ; Thu, 23 Jan 2020 17:02:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 244C721569 for ; Thu, 23 Jan 2020 17:02:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729335AbgAWRCr (ORCPT ); Thu, 23 Jan 2020 12:02:47 -0500 Received: from mx2.suse.de ([195.135.220.15]:51702 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729715AbgAWRCr (ORCPT ); Thu, 23 Jan 2020 12:02:47 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 0C452AC50; Thu, 23 Jan 2020 17:02:45 +0000 (UTC) From: colyli@suse.de To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Christoph Hellwig , Coly Li Subject: [PATCH 08/17] bcache: store a pointer to the on-disk sb in the cache and cached_dev structures Date: Fri, 24 Jan 2020 01:01:33 +0800 Message-Id: <20200123170142.98974-9-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200123170142.98974-1-colyli@suse.de> References: <20200123170142.98974-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Christoph Hellwig This allows to properly build the superblock bio including the offset in the page using the normal bio helpers. This fixes writing the superblock for page sizes larger than 4k where the sb write bio would need an offset in the bio_vec. Signed-off-by: Christoph Hellwig Signed-off-by: Coly Li --- drivers/md/bcache/bcache.h | 2 ++ drivers/md/bcache/super.c | 34 +++++++++++++++------------------- 2 files changed, 17 insertions(+), 19 deletions(-) diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h index 9198c1b480d9..adf26a21fcd1 100644 --- a/drivers/md/bcache/bcache.h +++ b/drivers/md/bcache/bcache.h @@ -301,6 +301,7 @@ struct cached_dev { struct block_device *bdev; struct cache_sb sb; + struct cache_sb_disk *sb_disk; struct bio sb_bio; struct bio_vec sb_bv[1]; struct closure sb_write; @@ -403,6 +404,7 @@ enum alloc_reserve { struct cache { struct cache_set *set; struct cache_sb sb; + struct cache_sb_disk *sb_disk; struct bio sb_bio; struct bio_vec sb_bv[1]; diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index b30e4c6011d2..5797c03f993e 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -207,15 +207,15 @@ static void write_bdev_super_endio(struct bio *bio) closure_put(&dc->sb_write); } -static void __write_super(struct cache_sb *sb, struct bio *bio) +static void __write_super(struct cache_sb *sb, struct cache_sb_disk *out, + struct bio *bio) { - struct cache_sb_disk *out = page_address(bio_first_page_all(bio)); unsigned int i; + bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_META; bio->bi_iter.bi_sector = SB_SECTOR; - bio->bi_iter.bi_size = SB_SIZE; - bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC|REQ_META); - bch_bio_map(bio, NULL); + __bio_add_page(bio, virt_to_page(out), SB_SIZE, + offset_in_page(out)); out->offset = cpu_to_le64(sb->offset); out->version = cpu_to_le64(sb->version); @@ -257,14 +257,14 @@ void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent) down(&dc->sb_write_mutex); closure_init(cl, parent); - bio_reset(bio); + bio_init(bio, dc->sb_bv, 1); bio_set_dev(bio, dc->bdev); bio->bi_end_io = write_bdev_super_endio; bio->bi_private = dc; closure_get(cl); /* I/O request sent to backing device */ - __write_super(&dc->sb, bio); + __write_super(&dc->sb, dc->sb_disk, bio); closure_return_with_destructor(cl, bch_write_bdev_super_unlock); } @@ -306,13 +306,13 @@ void bcache_write_super(struct cache_set *c) SET_CACHE_SYNC(&ca->sb, CACHE_SYNC(&c->sb)); - bio_reset(bio); + bio_init(bio, ca->sb_bv, 1); bio_set_dev(bio, ca->bdev); bio->bi_end_io = write_super_endio; bio->bi_private = ca; closure_get(cl); - __write_super(&ca->sb, bio); + __write_super(&ca->sb, ca->sb_disk, bio); } closure_return_with_destructor(cl, bcache_write_super_unlock); @@ -1275,8 +1275,8 @@ static void cached_dev_free(struct closure *cl) mutex_unlock(&bch_register_lock); - if (dc->sb_bio.bi_inline_vecs[0].bv_page) - put_page(bio_first_page_all(&dc->sb_bio)); + if (dc->sb_disk) + put_page(virt_to_page(dc->sb_disk)); if (!IS_ERR_OR_NULL(dc->bdev)) blkdev_put(dc->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL); @@ -1365,9 +1365,7 @@ static int register_bdev(struct cache_sb *sb, struct cache_sb_disk *sb_disk, memcpy(&dc->sb, sb, sizeof(struct cache_sb)); dc->bdev = bdev; dc->bdev->bd_holder = dc; - - bio_init(&dc->sb_bio, dc->sb_bio.bi_inline_vecs, 1); - bio_first_bvec_all(&dc->sb_bio)->bv_page = virt_to_page(sb_disk); + dc->sb_disk = sb_disk; if (cached_dev_init(dc, sb->block_size << 9)) goto err; @@ -2137,8 +2135,8 @@ void bch_cache_release(struct kobject *kobj) for (i = 0; i < RESERVE_NR; i++) free_fifo(&ca->free[i]); - if (ca->sb_bio.bi_inline_vecs[0].bv_page) - put_page(bio_first_page_all(&ca->sb_bio)); + if (ca->sb_disk) + put_page(virt_to_page(ca->sb_disk)); if (!IS_ERR_OR_NULL(ca->bdev)) blkdev_put(ca->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL); @@ -2270,9 +2268,7 @@ static int register_cache(struct cache_sb *sb, struct cache_sb_disk *sb_disk, memcpy(&ca->sb, sb, sizeof(struct cache_sb)); ca->bdev = bdev; ca->bdev->bd_holder = ca; - - bio_init(&ca->sb_bio, ca->sb_bio.bi_inline_vecs, 1); - bio_first_bvec_all(&ca->sb_bio)->bv_page = virt_to_page(sb_disk); + ca->sb_disk = sb_disk; if (blk_queue_discard(bdev_get_queue(bdev))) ca->discard = CACHE_DISCARD(&ca->sb); From patchwork Thu Jan 23 17:01:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11348767 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E325592A for ; Thu, 23 Jan 2020 17:02:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C275821734 for ; Thu, 23 Jan 2020 17:02:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729715AbgAWRCv (ORCPT ); Thu, 23 Jan 2020 12:02:51 -0500 Received: from mx2.suse.de ([195.135.220.15]:51742 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729396AbgAWRCv (ORCPT ); Thu, 23 Jan 2020 12:02:51 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 6023EAD78; Thu, 23 Jan 2020 17:02:49 +0000 (UTC) From: colyli@suse.de To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Christoph Hellwig , Coly Li Subject: [PATCH 09/17] bcache: use read_cache_page_gfp to read the superblock Date: Fri, 24 Jan 2020 01:01:34 +0800 Message-Id: <20200123170142.98974-10-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200123170142.98974-1-colyli@suse.de> References: <20200123170142.98974-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Christoph Hellwig Avoid a pointless dependency on buffer heads in bcache by simply open coding reading a single page. Also add a SB_OFFSET define for the byte offset of the superblock instead of using magic numbers. Signed-off-by: Christoph Hellwig Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 16 +++++++--------- include/uapi/linux/bcache.h | 1 + 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 5797c03f993e..3dea1d5acd5c 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -15,7 +15,6 @@ #include "writeback.h" #include -#include #include #include #include @@ -64,13 +63,14 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev, { const char *err; struct cache_sb_disk *s; - struct buffer_head *bh = __bread(bdev, 1, SB_SIZE); + struct page *page; unsigned int i; - if (!bh) + page = read_cache_page_gfp(bdev->bd_inode->i_mapping, + SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL); + if (IS_ERR(page)) return "IO error"; - - s = (struct cache_sb_disk *)bh->b_data; + s = page_address(page) + offset_in_page(SB_OFFSET); sb->offset = le64_to_cpu(s->offset); sb->version = le64_to_cpu(s->version); @@ -188,12 +188,10 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev, } sb->last_mount = (u32)ktime_get_real_seconds(); - err = NULL; - - get_page(bh->b_page); *res = s; + return NULL; err: - put_bh(bh); + put_page(page); return err; } diff --git a/include/uapi/linux/bcache.h b/include/uapi/linux/bcache.h index 1d8b3a9fc080..9a1965c6c3d0 100644 --- a/include/uapi/linux/bcache.h +++ b/include/uapi/linux/bcache.h @@ -148,6 +148,7 @@ static inline struct bkey *bkey_idx(const struct bkey *k, unsigned int nr_keys) #define BCACHE_SB_MAX_VERSION 4 #define SB_SECTOR 8 +#define SB_OFFSET (SB_SECTOR << SECTOR_SHIFT) #define SB_SIZE 4096 #define SB_LABEL_SIZE 32 #define SB_JOURNAL_BUCKETS 256U From patchwork Thu Jan 23 17:01:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11348769 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E01881398 for ; Thu, 23 Jan 2020 17:02:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C827B21734 for ; Thu, 23 Jan 2020 17:02:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729829AbgAWRCy (ORCPT ); Thu, 23 Jan 2020 12:02:54 -0500 Received: from mx2.suse.de ([195.135.220.15]:51762 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729396AbgAWRCy (ORCPT ); Thu, 23 Jan 2020 12:02:54 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 7E02AAD73; Thu, 23 Jan 2020 17:02:52 +0000 (UTC) From: colyli@suse.de To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, "Ben Dooks (Codethink)" , Coly Li Subject: [PATCH 10/17] lib: crc64: include for 'crc64_be' Date: Fri, 24 Jan 2020 01:01:35 +0800 Message-Id: <20200123170142.98974-11-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200123170142.98974-1-colyli@suse.de> References: <20200123170142.98974-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: "Ben Dooks (Codethink)" The crc64_be() is declared in so include this where the symbol is defined to avoid the following warning: lib/crc64.c:43:12: warning: symbol 'crc64_be' was not declared. Should it be static? Signed-off-by: Ben Dooks (Codethink) Reviewed-by: Andy Shevchenko Signed-off-by: Coly Li --- lib/crc64.c | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/crc64.c b/lib/crc64.c index 0ef8ae6ac047..f8928ce28280 100644 --- a/lib/crc64.c +++ b/lib/crc64.c @@ -28,6 +28,7 @@ #include #include +#include #include "crc64table.h" MODULE_DESCRIPTION("CRC64 calculations"); From patchwork Thu Jan 23 17:01:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11348771 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1EC3E92A for ; Thu, 23 Jan 2020 17:02:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 073C621569 for ; Thu, 23 Jan 2020 17:02:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730025AbgAWRC5 (ORCPT ); Thu, 23 Jan 2020 12:02:57 -0500 Received: from mx2.suse.de ([195.135.220.15]:51782 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729396AbgAWRC5 (ORCPT ); Thu, 23 Jan 2020 12:02:57 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id D2F27AC50; Thu, 23 Jan 2020 17:02:55 +0000 (UTC) From: colyli@suse.de To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 11/17] bcache: add code comments for state->pool in __btree_sort() Date: Fri, 24 Jan 2020 01:01:36 +0800 Message-Id: <20200123170142.98974-12-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200123170142.98974-1-colyli@suse.de> References: <20200123170142.98974-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Coly Li To explain the pages allocated from mempool state->pool can be swapped in __btree_sort(), because state->pool is a page pool, which allocates pages by alloc_pages() indeed. Signed-off-by: Coly Li --- drivers/md/bcache/bset.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/md/bcache/bset.c b/drivers/md/bcache/bset.c index cffcdc9feefb..4385303836d8 100644 --- a/drivers/md/bcache/bset.c +++ b/drivers/md/bcache/bset.c @@ -1257,6 +1257,11 @@ static void __btree_sort(struct btree_keys *b, struct btree_iter *iter, * Our temporary buffer is the same size as the btree node's * buffer, we can just swap buffers instead of doing a big * memcpy() + * + * Don't worry event 'out' is allocated from mempool, it can + * still be swapped here. Because state->pool is a page mempool + * creaated by by mempool_init_page_pool(), which allocates + * pages by alloc_pages() indeed. */ out->magic = b->set->data->magic; From patchwork Thu Jan 23 17:01:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11348775 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 68EFE92A for ; Thu, 23 Jan 2020 17:03:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 4754721734 for ; Thu, 23 Jan 2020 17:03:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730035AbgAWRDC (ORCPT ); Thu, 23 Jan 2020 12:03:02 -0500 Received: from mx2.suse.de ([195.135.220.15]:51806 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729396AbgAWRDC (ORCPT ); Thu, 23 Jan 2020 12:03:02 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id CFF4AAC50; Thu, 23 Jan 2020 17:02:59 +0000 (UTC) From: colyli@suse.de To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 12/17] bcache: avoid unnecessary btree nodes flushing in btree_flush_write() Date: Fri, 24 Jan 2020 01:01:37 +0800 Message-Id: <20200123170142.98974-13-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200123170142.98974-1-colyli@suse.de> References: <20200123170142.98974-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Coly Li the commit 91be66e1318f ("bcache: performance improvement for btree_flush_write()") was an effort to flushing btree node with oldest btree node faster in following methods, - Only iterate dirty btree nodes in c->btree_cache, avoid scanning a lot of clean btree nodes. - Take c->btree_cache as a LRU-like list, aggressively flushing all dirty nodes from tail of c->btree_cache util the btree node with oldest journal entry is flushed. This is to reduce the time of holding c->bucket_lock. Guoju Fang and Shuang Li reported that they observe unexptected extra write I/Os on cache device after applying the above patch. Guoju Fang provideed more detailed diagnose information that the aggressive btree nodes flushing may cause 10x more btree nodes to flush in his workload. He points out when system memory is large enough to hold all btree nodes in memory, c->btree_cache is not a LRU-like list any more. Then the btree node with oldest journal entry is very probably not- close to the tail of c->btree_cache list. In such situation much more dirty btree nodes will be aggressively flushed before the target node is flushed. When slow SATA SSD is used as cache device, such over- aggressive flushing behavior will cause performance regression. After spending a lot of time on debug and diagnose, I find the real condition is more complicated, aggressive flushing dirty btree nodes from tail of c->btree_cache list is not a good solution. - When all btree nodes are cached in memory, c->btree_cache is not a LRU-like list, the btree nodes with oldest journal entry won't be close to the tail of the list. - There can be hundreds dirty btree nodes reference the oldest journal entry, before flushing all the nodes the oldest journal entry cannot be reclaimed. When the above two conditions mixed together, a simply flushing from tail of c->btree_cache list is really NOT a good idea. Fortunately there is still chance to make btree_flush_write() work better. Here is how this patch avoids unnecessary btree nodes flushing, - Only acquire c->journal.lock when getting oldest journal entry of fifo c->journal.pin. In rested locations check the journal entries locklessly, so their values can be changed on other cores in parallel. - In loop list_for_each_entry_safe_reverse(), checking latest front point of fifo c->journal.pin. If it is different from the original point which we get with locking c->journal.lock, it means the oldest journal entry is reclaim on other cores. At this moment, all selected dirty nodes recorded in array btree_nodes[] are all flushed and clean on other CPU cores, it is unncessary to iterate c->btree_cache any longer. Just quit the list_for_each_entry_safe_reverse() loop and the following for-loop will skip all the selected clean nodes. - Find a proper time to quit the list_for_each_entry_safe_reverse() loop. Check the refcount value of orignial fifo front point, if the value is larger than selected node number of btree_nodes[], it means more matching btree nodes should be scanned. Otherwise it means no more matching btee nodes in rest of c->btree_cache list, the loop can be quit. If the original oldest journal entry is reclaimed and fifo front point is updated, the refcount of original fifo front point will be 0, then the loop will be quit too. - Not hold c->bucket_lock too long time. c->bucket_lock is also required for space allocation for cached data, hold it for too long time will block regular I/O requests. When iterating list c->btree_cache, even there are a lot of maching btree nodes, in order to not holding c->bucket_lock for too long time, only BTREE_FLUSH_NR nodes are selected and to flush in following for-loop. With this patch, only btree nodes referencing oldest journal entry are flushed to cache device, no aggressive flushing for unnecessary btree node any more. And in order to avoid blocking regluar I/O requests, each time when btree_flush_write() called, at most only BTREE_FLUSH_NR btree nodes are selected to flush, even there are more maching btree nodes in list c->btree_cache. At last, one more thing to explain: Why it is safe to read front point of c->journal.pin without holding c->journal.lock inside the list_for_each_entry_safe_reverse() loop ? Here is my answer: When reading the front point of fifo c->journal.pin, we don't need to know the exact value of front point, we just want to check whether the value is different from the original front point (which is accurate value because we get it while c->jouranl.lock is held). For such purpose, it works as expected without holding c->journal.lock. Even the front point is changed on other CPU core and not updated to local core, and current iterating btree node has identical journal entry local as original fetched fifo front point, it is still safe. Because after holding mutex b->write_lock (with memory barrier) this btree node can be found as clean and skipped, the loop will quite latter when iterate on next node of list c->btree_cache. Fixes: 91be66e1318f ("bcache: performance improvement for btree_flush_write()") Reported-by: Guoju Fang Reported-by: Shuang Li Signed-off-by: Coly Li --- drivers/md/bcache/journal.c | 80 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 75 insertions(+), 5 deletions(-) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index be2a2a201603..33ddc5269e8d 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -417,10 +417,14 @@ int bch_journal_replay(struct cache_set *s, struct list_head *list) /* Journalling */ +#define nr_to_fifo_front(p, front_p, mask) (((p) - (front_p)) & (mask)) + static void btree_flush_write(struct cache_set *c) { struct btree *b, *t, *btree_nodes[BTREE_FLUSH_NR]; - unsigned int i, n; + unsigned int i, nr, ref_nr; + atomic_t *fifo_front_p, *now_fifo_front_p; + size_t mask; if (c->journal.btree_flushing) return; @@ -433,12 +437,50 @@ static void btree_flush_write(struct cache_set *c) c->journal.btree_flushing = true; spin_unlock(&c->journal.flush_write_lock); + /* get the oldest journal entry and check its refcount */ + spin_lock(&c->journal.lock); + fifo_front_p = &fifo_front(&c->journal.pin); + ref_nr = atomic_read(fifo_front_p); + if (ref_nr <= 0) { + /* + * do nothing if no btree node references + * the oldest journal entry + */ + spin_unlock(&c->journal.lock); + goto out; + } + spin_unlock(&c->journal.lock); + + mask = c->journal.pin.mask; + nr = 0; atomic_long_inc(&c->flush_write); memset(btree_nodes, 0, sizeof(btree_nodes)); - n = 0; mutex_lock(&c->bucket_lock); list_for_each_entry_safe_reverse(b, t, &c->btree_cache, list) { + /* + * It is safe to get now_fifo_front_p without holding + * c->journal.lock here, because we don't need to know + * the exactly accurate value, just check whether the + * front pointer of c->journal.pin is changed. + */ + now_fifo_front_p = &fifo_front(&c->journal.pin); + /* + * If the oldest journal entry is reclaimed and front + * pointer of c->journal.pin changes, it is unnecessary + * to scan c->btree_cache anymore, just quit the loop and + * flush out what we have already. + */ + if (now_fifo_front_p != fifo_front_p) + break; + /* + * quit this loop if all matching btree nodes are + * scanned and record in btree_nodes[] already. + */ + ref_nr = atomic_read(fifo_front_p); + if (nr >= ref_nr) + break; + if (btree_node_journal_flush(b)) pr_err("BUG: flush_write bit should not be set here!"); @@ -454,17 +496,44 @@ static void btree_flush_write(struct cache_set *c) continue; } + /* + * Only select the btree node which exactly references + * the oldest journal entry. + * + * If the journal entry pointed by fifo_front_p is + * reclaimed in parallel, don't worry: + * - the list_for_each_xxx loop will quit when checking + * next now_fifo_front_p. + * - If there are matched nodes recorded in btree_nodes[], + * they are clean now (this is why and how the oldest + * journal entry can be reclaimed). These selected nodes + * will be ignored and skipped in the folowing for-loop. + */ + if (nr_to_fifo_front(btree_current_write(b)->journal, + fifo_front_p, + mask) != 0) { + mutex_unlock(&b->write_lock); + continue; + } + set_btree_node_journal_flush(b); mutex_unlock(&b->write_lock); - btree_nodes[n++] = b; - if (n == BTREE_FLUSH_NR) + btree_nodes[nr++] = b; + /* + * To avoid holding c->bucket_lock too long time, + * only scan for BTREE_FLUSH_NR matched btree nodes + * at most. If there are more btree nodes reference + * the oldest journal entry, try to flush them next + * time when btree_flush_write() is called. + */ + if (nr == BTREE_FLUSH_NR) break; } mutex_unlock(&c->bucket_lock); - for (i = 0; i < n; i++) { + for (i = 0; i < nr; i++) { b = btree_nodes[i]; if (!b) { pr_err("BUG: btree_nodes[%d] is NULL", i); @@ -497,6 +566,7 @@ static void btree_flush_write(struct cache_set *c) mutex_unlock(&b->write_lock); } +out: spin_lock(&c->journal.flush_write_lock); c->journal.btree_flushing = false; spin_unlock(&c->journal.flush_write_lock); From patchwork Thu Jan 23 17:01:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11348777 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 007AB1398 for ; Thu, 23 Jan 2020 17:03:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id DD53921734 for ; Thu, 23 Jan 2020 17:03:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729396AbgAWRDE (ORCPT ); Thu, 23 Jan 2020 12:03:04 -0500 Received: from mx2.suse.de ([195.135.220.15]:51832 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730037AbgAWRDE (ORCPT ); Thu, 23 Jan 2020 12:03:04 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id A4FB5AD73; Thu, 23 Jan 2020 17:03:02 +0000 (UTC) From: colyli@suse.de To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Guoju Fang , Coly Li Subject: [PATCH 13/17] bcache: print written and keys in trace_bcache_btree_write Date: Fri, 24 Jan 2020 01:01:38 +0800 Message-Id: <20200123170142.98974-14-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200123170142.98974-1-colyli@suse.de> References: <20200123170142.98974-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Guoju Fang It's useful to dump written block and keys on btree write, this patch add them into trace_bcache_btree_write. Signed-off-by: Guoju Fang Signed-off-by: Coly Li --- include/trace/events/bcache.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/trace/events/bcache.h b/include/trace/events/bcache.h index e4526f85c19d..0bddea663b3b 100644 --- a/include/trace/events/bcache.h +++ b/include/trace/events/bcache.h @@ -275,7 +275,8 @@ TRACE_EVENT(bcache_btree_write, __entry->keys = b->keys.set[b->keys.nsets].data->keys; ), - TP_printk("bucket %zu", __entry->bucket) + TP_printk("bucket %zu written block %u + %u", + __entry->bucket, __entry->block, __entry->keys) ); DEFINE_EVENT(btree_node, bcache_btree_node_alloc, From patchwork Thu Jan 23 17:01:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11348779 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B17C51398 for ; Thu, 23 Jan 2020 17:03:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 9A0C721569 for ; Thu, 23 Jan 2020 17:03:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730040AbgAWRDG (ORCPT ); Thu, 23 Jan 2020 12:03:06 -0500 Received: from mx2.suse.de ([195.135.220.15]:51858 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730037AbgAWRDG (ORCPT ); Thu, 23 Jan 2020 12:03:06 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 87F6CAC50; Thu, 23 Jan 2020 17:03:04 +0000 (UTC) From: colyli@suse.de To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li , stable@vger.kernel.org Subject: [PATCH 14/17] bcache: back to cache all readahead I/Os Date: Fri, 24 Jan 2020 01:01:39 +0800 Message-Id: <20200123170142.98974-15-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200123170142.98974-1-colyli@suse.de> References: <20200123170142.98974-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Coly Li In year 2007 high performance SSD was still expensive, in order to save more space for real workload or meta data, the readahead I/Os for non-meta data was bypassed and not cached on SSD. In now days, SSD price drops a lot and people can find larger size SSD with more comfortable price. It is unncessary to bypass normal readahead I/Os to save SSD space for now. This patch removes the code which checks REQ_RAHEAD tag of bio in check_should_bypass(), then all readahead I/Os will be cached on SSD. NOTE: this patch still keeps the checking of "REQ_META|REQ_PRIO" in should_writeback(), because we still want to cache meta data I/Os even they are asynchronized. Cc: stable@vger.kernel.org Signed-off-by: Coly Li Acked-by: Eric Wheeler --- drivers/md/bcache/request.c | 9 --------- 1 file changed, 9 deletions(-) diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index 73478a91a342..acc07c4f27ae 100644 --- a/drivers/md/bcache/request.c +++ b/drivers/md/bcache/request.c @@ -378,15 +378,6 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio) op_is_write(bio_op(bio)))) goto skip; - /* - * Flag for bypass if the IO is for read-ahead or background, - * unless the read-ahead request is for metadata - * (eg, for gfs2 or xfs). - */ - if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) && - !(bio->bi_opf & (REQ_META|REQ_PRIO))) - goto skip; - if (bio->bi_iter.bi_sector & (c->sb.block_size - 1) || bio_sectors(bio) & (c->sb.block_size - 1)) { pr_debug("skipping unaligned io"); From patchwork Thu Jan 23 17:01:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11348781 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D64B492A for ; Thu, 23 Jan 2020 17:03:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B506421D7D for ; Thu, 23 Jan 2020 17:03:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729827AbgAWRDI (ORCPT ); Thu, 23 Jan 2020 12:03:08 -0500 Received: from mx2.suse.de ([195.135.220.15]:51882 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730037AbgAWRDH (ORCPT ); Thu, 23 Jan 2020 12:03:07 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 06026AD73; Thu, 23 Jan 2020 17:03:06 +0000 (UTC) From: colyli@suse.de To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 15/17] bcache: remove member accessed from struct btree Date: Fri, 24 Jan 2020 01:01:40 +0800 Message-Id: <20200123170142.98974-16-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200123170142.98974-1-colyli@suse.de> References: <20200123170142.98974-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Coly Li The member 'accessed' of struct btree is used in bch_mca_scan() when shrinking btree node caches. The original idea is, if b->accessed is set, clean it and look at next btree node cache from c->btree_cache list, and only shrink the caches whose b->accessed is cleaned. Then only cold btree node cache will be shrunk. But when I/O pressure is high, it is very probably that b->accessed of a btree node cache will be set again in bch_btree_node_get() before bch_mca_scan() selects it again. Then there is no chance for bch_mca_scan() to shrink enough memory back to slub or slab system. This patch removes member accessed from struct btree, then once a btree node ache is selected, it will be immediately shunk. By this change, bch_mca_scan() may release btree node cahce more efficiently. Signed-off-by: Coly Li --- drivers/md/bcache/btree.c | 8 ++------ drivers/md/bcache/btree.h | 2 -- 2 files changed, 2 insertions(+), 8 deletions(-) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 14d6c33b0957..357535a5c89c 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -754,14 +754,12 @@ static unsigned long bch_mca_scan(struct shrinker *shrink, b = list_first_entry(&c->btree_cache, struct btree, list); list_rotate_left(&c->btree_cache); - if (!b->accessed && - !mca_reap(b, 0, false)) { + if (!mca_reap(b, 0, false)) { mca_bucket_free(b); mca_data_free(b); rw_unlock(true, b); freed++; - } else - b->accessed = 0; + } } out: mutex_unlock(&c->bucket_lock); @@ -1069,7 +1067,6 @@ struct btree *bch_btree_node_get(struct cache_set *c, struct btree_op *op, BUG_ON(!b->written); b->parent = parent; - b->accessed = 1; for (; i <= b->keys.nsets && b->keys.set[i].size; i++) { prefetch(b->keys.set[i].tree); @@ -1160,7 +1157,6 @@ struct btree *__bch_btree_node_alloc(struct cache_set *c, struct btree_op *op, goto retry; } - b->accessed = 1; b->parent = parent; bch_bset_init_next(&b->keys, b->keys.set->data, bset_magic(&b->c->sb)); diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h index 76cfd121a486..f4dcca449391 100644 --- a/drivers/md/bcache/btree.h +++ b/drivers/md/bcache/btree.h @@ -121,8 +121,6 @@ struct btree { /* Key/pointer for this btree node */ BKEY_PADDED(key); - /* Single bit - set when accessed, cleared by shrinker */ - unsigned long accessed; unsigned long seq; struct rw_semaphore lock; struct cache_set *c; From patchwork Thu Jan 23 17:01:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11348783 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6093892A for ; Thu, 23 Jan 2020 17:03:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3F1AB21734 for ; Thu, 23 Jan 2020 17:03:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730051AbgAWRDL (ORCPT ); Thu, 23 Jan 2020 12:03:11 -0500 Received: from mx2.suse.de ([195.135.220.15]:51950 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730037AbgAWRDL (ORCPT ); Thu, 23 Jan 2020 12:03:11 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 08BC5AD73; Thu, 23 Jan 2020 17:03:10 +0000 (UTC) From: colyli@suse.de To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 16/17] bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan() Date: Fri, 24 Jan 2020 01:01:41 +0800 Message-Id: <20200123170142.98974-17-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200123170142.98974-1-colyli@suse.de> References: <20200123170142.98974-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Coly Li In order to skip the most recently freed btree node cahce, currently in bch_mca_scan() the first 3 caches in c->btree_cache_freeable list are skipped when shrinking bcache node caches in bch_mca_scan(). The related code in bch_mca_scan() is, 737 list_for_each_entry_safe(b, t, &c->btree_cache_freeable, list) { 738 if (nr <= 0) 739 goto out; 740 741 if (++i > 3 && 742 !mca_reap(b, 0, false)) { lines free cache memory 746 } 747 nr--; 748 } The problem is, if virtual memory code calls bch_mca_scan() and the calculated 'nr' is 1 or 2, then in the above loop, nothing will be shunk. In such case, if slub/slab manager calls bch_mca_scan() for many times with small scan number, it does not help to shrink cache memory and just wasts CPU cycles. This patch just selects btree node caches from tail of the c->btree_cache_freeable list, then the newly freed host cache can still be allocated by mca_alloc(), and at least 1 node can be shunk. Signed-off-by: Coly Li --- drivers/md/bcache/btree.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 357535a5c89c..c3a314deb09d 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -734,17 +734,17 @@ static unsigned long bch_mca_scan(struct shrinker *shrink, i = 0; btree_cache_used = c->btree_cache_used; - list_for_each_entry_safe(b, t, &c->btree_cache_freeable, list) { + list_for_each_entry_safe_reverse(b, t, &c->btree_cache_freeable, list) { if (nr <= 0) goto out; - if (++i > 3 && - !mca_reap(b, 0, false)) { + if (!mca_reap(b, 0, false)) { mca_data_free(b); rw_unlock(true, b); freed++; } nr--; + i++; } for (; (nr--) && i < btree_cache_used; i++) { From patchwork Thu Jan 23 17:01:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11348785 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5245514B4 for ; Thu, 23 Jan 2020 17:03:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3112121D7E for ; Thu, 23 Jan 2020 17:03:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730054AbgAWRDO (ORCPT ); Thu, 23 Jan 2020 12:03:14 -0500 Received: from mx2.suse.de ([195.135.220.15]:51984 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730037AbgAWRDO (ORCPT ); Thu, 23 Jan 2020 12:03:14 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id F08BFAD73; Thu, 23 Jan 2020 17:03:12 +0000 (UTC) From: colyli@suse.de To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 17/17] bcache: reap from tail of c->btree_cache in bch_mca_scan() Date: Fri, 24 Jan 2020 01:01:42 +0800 Message-Id: <20200123170142.98974-18-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20200123170142.98974-1-colyli@suse.de> References: <20200123170142.98974-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Coly Li When shrink btree node cache from c->btree_cache in bch_mca_scan(), no matter the selected node is reaped or not, it will be rotated from the head to the tail of c->btree_cache list. But in bcache journal code, when flushing the btree nodes with oldest journal entry, btree nodes are iterated and slected from the tail of c->btree_cache list in btree_flush_write(). The list_rotate_left() in bch_mca_scan() will make btree_flush_write() iterate more nodes in c->btree_list in reverse order. This patch just reaps the selected btree node cache, and not move it from the head to the tail of c->btree_cache list. Then bch_mca_scan() will not mess up c->btree_cache list to btree_flush_write(). Signed-off-by: Coly Li --- drivers/md/bcache/btree.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index c3a314deb09d..fa872df4e770 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -747,19 +747,19 @@ static unsigned long bch_mca_scan(struct shrinker *shrink, i++; } - for (; (nr--) && i < btree_cache_used; i++) { - if (list_empty(&c->btree_cache)) + list_for_each_entry_safe_reverse(b, t, &c->btree_cache, list) { + if (nr <= 0 || i >= btree_cache_used) goto out; - b = list_first_entry(&c->btree_cache, struct btree, list); - list_rotate_left(&c->btree_cache); - if (!mca_reap(b, 0, false)) { mca_bucket_free(b); mca_data_free(b); rw_unlock(true, b); freed++; } + + nr--; + i++; } out: mutex_unlock(&c->bucket_lock);