From patchwork Fri Jun 28 11:59:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022023 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6452414C0 for ; Fri, 28 Jun 2019 12:00:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 548D026AE3 for ; Fri, 28 Jun 2019 12:00:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 489C6287E5; Fri, 28 Jun 2019 12:00:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EAEB826AE3 for ; Fri, 28 Jun 2019 12:00:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726883AbfF1MAP (ORCPT ); Fri, 28 Jun 2019 08:00:15 -0400 Received: from mx2.suse.de ([195.135.220.15]:54064 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726646AbfF1MAO (ORCPT ); Fri, 28 Jun 2019 08:00:14 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 49841B631; Fri, 28 Jun 2019 12:00:13 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 01/37] bcache: don't set max writeback rate if gc is running Date: Fri, 28 Jun 2019 19:59:24 +0800 Message-Id: <20190628120000.40753-2-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When gc is running, user space I/O processes may wait inside bcache code, so no new I/O coming. Indeed this is not a real idle time, maximum writeback rate should not be set in such situation. Otherwise a faster writeback thread may compete locks with gc thread and makes garbage collection slower, which results a longer I/O freeze period. This patch checks c->gc_mark_valid in set_at_max_writeback_rate(). If c->gc_mark_valid is 0 (gc running), set_at_max_writeback_rate() returns false, then update_writeback_rate() will not set writeback rate to maximum value even c->idle_counter reaches an idle threshold. Now writeback thread won't interfere gc thread performance. Signed-off-by: Coly Li --- drivers/md/bcache/writeback.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c index 73f0efac2b9f..262f7ef20992 100644 --- a/drivers/md/bcache/writeback.c +++ b/drivers/md/bcache/writeback.c @@ -122,6 +122,9 @@ static void __update_writeback_rate(struct cached_dev *dc) static bool set_at_max_writeback_rate(struct cache_set *c, struct cached_dev *dc) { + /* Don't set max writeback rate if gc is running */ + if (!c->gc_mark_valid) + return false; /* * Idle_counter is increased everytime when update_writeback_rate() is * called. If all backing devices attached to the same cache set have From patchwork Fri Jun 28 11:59:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022025 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D065C14C0 for ; Fri, 28 Jun 2019 12:00:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C15EF26AE3 for ; Fri, 28 Jun 2019 12:00:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B5C93287E5; Fri, 28 Jun 2019 12:00:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 128B426AE3 for ; Fri, 28 Jun 2019 12:00:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726646AbfF1MAU (ORCPT ); Fri, 28 Jun 2019 08:00:20 -0400 Received: from mx2.suse.de ([195.135.220.15]:54106 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726640AbfF1MAU (ORCPT ); Fri, 28 Jun 2019 08:00:20 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id B63ECB631; Fri, 28 Jun 2019 12:00:18 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 02/37] bcache: check c->gc_thread by IS_ERR_OR_NULL in cache_set_flush() Date: Fri, 28 Jun 2019 19:59:25 +0800 Message-Id: <20190628120000.40753-3-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When system memory is in heavy pressure, bch_gc_thread_start() from run_cache_set() may fail due to out of memory. In such condition, c->gc_thread is assigned to -ENOMEM, not NULL pointer. Then in following failure code path bch_cache_set_error(), when cache_set_flush() gets called, the code piece to stop c->gc_thread is broken, if (!IS_ERR_OR_NULL(c->gc_thread)) kthread_stop(c->gc_thread); And KASAN catches such NULL pointer deference problem, with the warning information: [ 561.207881] ================================================================== [ 561.207900] BUG: KASAN: null-ptr-deref in kthread_stop+0x3b/0x440 [ 561.207904] Write of size 4 at addr 000000000000001c by task kworker/15:1/313 [ 561.207913] CPU: 15 PID: 313 Comm: kworker/15:1 Tainted: G W 5.0.0-vanilla+ #3 [ 561.207916] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE136T-2.10]- 03/22/2019 [ 561.207935] Workqueue: events cache_set_flush [bcache] [ 561.207940] Call Trace: [ 561.207948] dump_stack+0x9a/0xeb [ 561.207955] ? kthread_stop+0x3b/0x440 [ 561.207960] ? kthread_stop+0x3b/0x440 [ 561.207965] kasan_report+0x176/0x192 [ 561.207973] ? kthread_stop+0x3b/0x440 [ 561.207981] kthread_stop+0x3b/0x440 [ 561.207995] cache_set_flush+0xd4/0x6d0 [bcache] [ 561.208008] process_one_work+0x856/0x1620 [ 561.208015] ? find_held_lock+0x39/0x1d0 [ 561.208028] ? drain_workqueue+0x380/0x380 [ 561.208048] worker_thread+0x87/0xb80 [ 561.208058] ? __kthread_parkme+0xb6/0x180 [ 561.208067] ? process_one_work+0x1620/0x1620 [ 561.208072] kthread+0x326/0x3e0 [ 561.208079] ? kthread_create_worker_on_cpu+0xc0/0xc0 [ 561.208090] ret_from_fork+0x3a/0x50 [ 561.208110] ================================================================== [ 561.208113] Disabling lock debugging due to kernel taint [ 561.208115] irq event stamp: 11800231 [ 561.208126] hardirqs last enabled at (11800231): [] do_syscall_64+0x18/0x410 [ 561.208127] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c [ 561.208129] #PF error: [WRITE] [ 561.312253] hardirqs last disabled at (11800230): [] trace_hardirqs_off_thunk+0x1a/0x1c [ 561.312259] softirqs last enabled at (11799832): [] __do_softirq+0x5c7/0x8c3 [ 561.405975] PGD 0 P4D 0 [ 561.442494] softirqs last disabled at (11799821): [] irq_exit+0x1ac/0x1e0 [ 561.791359] Oops: 0002 [#1] SMP KASAN NOPTI [ 561.791362] CPU: 15 PID: 313 Comm: kworker/15:1 Tainted: G B W 5.0.0-vanilla+ #3 [ 561.791363] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE136T-2.10]- 03/22/2019 [ 561.791371] Workqueue: events cache_set_flush [bcache] [ 561.791374] RIP: 0010:kthread_stop+0x3b/0x440 [ 561.791376] Code: 00 00 65 8b 05 26 d5 e0 7c 89 c0 48 0f a3 05 ec aa df 02 0f 82 dc 02 00 00 4c 8d 63 20 be 04 00 00 00 4c 89 e7 e8 65 c5 53 00 ff 43 20 48 8d 7b 24 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 [ 561.791377] RSP: 0018:ffff88872fc8fd10 EFLAGS: 00010286 [ 561.838895] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 561.838916] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 561.838934] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 561.838948] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 561.838966] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 561.838979] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 561.838996] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 563.067028] RAX: 0000000000000000 RBX: fffffffffffffffc RCX: ffffffff832dd314 [ 563.067030] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000297 [ 563.067032] RBP: ffff88872fc8fe88 R08: fffffbfff0b8213d R09: fffffbfff0b8213d [ 563.067034] R10: 0000000000000001 R11: fffffbfff0b8213c R12: 000000000000001c [ 563.408618] R13: ffff88dc61cc0f68 R14: ffff888102b94900 R15: ffff88dc61cc0f68 [ 563.408620] FS: 0000000000000000(0000) GS:ffff888f7dc00000(0000) knlGS:0000000000000000 [ 563.408622] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 563.408623] CR2: 000000000000001c CR3: 0000000f48a1a004 CR4: 00000000007606e0 [ 563.408625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 563.408627] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 563.904795] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 563.915796] PKRU: 55555554 [ 563.915797] Call Trace: [ 563.915807] cache_set_flush+0xd4/0x6d0 [bcache] [ 563.915812] process_one_work+0x856/0x1620 [ 564.001226] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 564.033563] ? find_held_lock+0x39/0x1d0 [ 564.033567] ? drain_workqueue+0x380/0x380 [ 564.033574] worker_thread+0x87/0xb80 [ 564.062823] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 564.118042] ? __kthread_parkme+0xb6/0x180 [ 564.118046] ? process_one_work+0x1620/0x1620 [ 564.118048] kthread+0x326/0x3e0 [ 564.118050] ? kthread_create_worker_on_cpu+0xc0/0xc0 [ 564.167066] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 564.252441] ret_from_fork+0x3a/0x50 [ 564.252447] Modules linked in: msr rpcrdma sunrpc rdma_ucm ib_iser ib_umad rdma_cm ib_ipoib i40iw configfs iw_cm ib_cm libiscsi scsi_transport_iscsi mlx4_ib ib_uverbs mlx4_en ib_core nls_iso8859_1 nls_cp437 vfat fat intel_rapl skx_edac x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ses raid0 aesni_intel cdc_ether enclosure usbnet ipmi_ssif joydev aes_x86_64 i40e scsi_transport_sas mii bcache md_mod crypto_simd mei_me ioatdma crc64 ptp cryptd pcspkr i2c_i801 mlx4_core glue_helper pps_core mei lpc_ich dca wmi ipmi_si ipmi_devintf nd_pmem dax_pmem nd_btt ipmi_msghandler device_dax pcc_cpufreq button hid_generic usbhid mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect xhci_pci sysimgblt fb_sys_fops xhci_hcd ttm megaraid_sas drm usbcore nfit libnvdimm sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs [ 564.299390] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 564.348360] CR2: 000000000000001c [ 564.348362] ---[ end trace b7f0e5cc7b2103b0 ]--- Therefore, it is not enough to only check whether c->gc_thread is NULL, we should use IS_ERR_OR_NULL() to check both NULL pointer and error value. This patch changes the above buggy code piece in this way, if (!IS_ERR_OR_NULL(c->gc_thread)) kthread_stop(c->gc_thread); Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 1b63ac876169..64d9de89a63f 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1564,7 +1564,7 @@ static void cache_set_flush(struct closure *cl) kobject_put(&c->internal); kobject_del(&c->kobj); - if (c->gc_thread) + if (!IS_ERR_OR_NULL(c->gc_thread)) kthread_stop(c->gc_thread); if (!IS_ERR_OR_NULL(c->root)) From patchwork Fri Jun 28 11:59:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022027 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7125D924 for ; Fri, 28 Jun 2019 12:00:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6452826AE3 for ; Fri, 28 Jun 2019 12:00:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 54B2D287E5; Fri, 28 Jun 2019 12:00:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EBBD226AE3 for ; Fri, 28 Jun 2019 12:00:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726709AbfF1MAX (ORCPT ); Fri, 28 Jun 2019 08:00:23 -0400 Received: from mx2.suse.de ([195.135.220.15]:54120 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726640AbfF1MAX (ORCPT ); Fri, 28 Jun 2019 08:00:23 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 2FEFDB632; Fri, 28 Jun 2019 12:00:22 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 03/37] bcache: fix return value error in bch_journal_read() Date: Fri, 28 Jun 2019 19:59:26 +0800 Message-Id: <20190628120000.40753-4-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When everything is OK in bch_journal_read(), finally the return value is returned by, return ret; which assumes ret will be 0 here. This assumption is wrong when all journal buckets as are full and filled with valid journal entries. In such cache the last location referencess read_bucket() sets 'ret' to 1, which means new jset added into jset list. The jset list is list 'journal' in caller run_cache_set(). Return 1 to run_cache_set() means something wrong and the cache set won't start, but indeed everything is OK. This patch changes the line at end of bch_journal_read() to directly return 0 since everything if verything is good. Then a bogus error is fixed. Signed-off-by: Coly Li --- drivers/md/bcache/journal.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index 12dae9348147..4e5fc05720fc 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -268,7 +268,7 @@ int bch_journal_read(struct cache_set *c, struct list_head *list) struct journal_replay, list)->j.seq; - return ret; + return 0; #undef read_bucket } From patchwork Fri Jun 28 11:59:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022029 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 731C514C0 for ; Fri, 28 Jun 2019 12:00:31 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 632E12793A for ; Fri, 28 Jun 2019 12:00:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 53DFF285CC; Fri, 28 Jun 2019 12:00:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7FF7E2793A for ; Fri, 28 Jun 2019 12:00:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726828AbfF1MA2 (ORCPT ); Fri, 28 Jun 2019 08:00:28 -0400 Received: from mx2.suse.de ([195.135.220.15]:54138 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726837AbfF1MA2 (ORCPT ); Fri, 28 Jun 2019 08:00:28 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 23730B631; Fri, 28 Jun 2019 12:00:27 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li , stable@vger.kernel.org Subject: [PATCH 04/37] Revert "bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()" Date: Fri, 28 Jun 2019 19:59:27 +0800 Message-Id: <20190628120000.40753-5-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This reverts commit 6147305c73e4511ca1a975b766b97a779d442567. Although this patch helps the failed bcache device to stop faster when too many I/O errors detected on corresponding cached device, setting CACHE_SET_IO_DISABLE bit to cache set c->flags was not a good idea. This operation will disable all I/Os on cache set, which means other attached bcache devices won't work neither. Without this patch, the failed bcache device can also be stopped eventually if internal I/O accomplished (e.g. writeback). Therefore here I revert it. Fixes: 6147305c73e4 ("bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()") Reported-by: Yong Li Signed-off-by: Coly Li Cc: stable@vger.kernel.org --- drivers/md/bcache/super.c | 17 ----------------- 1 file changed, 17 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 64d9de89a63f..ba2ad093bc80 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1437,8 +1437,6 @@ int bch_flash_dev_create(struct cache_set *c, uint64_t size) bool bch_cached_dev_error(struct cached_dev *dc) { - struct cache_set *c; - if (!dc || test_bit(BCACHE_DEV_CLOSING, &dc->disk.flags)) return false; @@ -1449,21 +1447,6 @@ bool bch_cached_dev_error(struct cached_dev *dc) pr_err("stop %s: too many IO errors on backing device %s\n", dc->disk.disk->disk_name, dc->backing_dev_name); - /* - * If the cached device is still attached to a cache set, - * even dc->io_disable is true and no more I/O requests - * accepted, cache device internal I/O (writeback scan or - * garbage collection) may still prevent bcache device from - * being stopped. So here CACHE_SET_IO_DISABLE should be - * set to c->flags too, to make the internal I/O to cache - * device rejected and stopped immediately. - * If c is NULL, that means the bcache device is not attached - * to any cache set, then no CACHE_SET_IO_DISABLE bit to set. - */ - c = dc->disk.c; - if (c && test_and_set_bit(CACHE_SET_IO_DISABLE, &c->flags)) - pr_info("CACHE_SET_IO_DISABLE already set"); - bcache_device_stop(&dc->disk); return true; } From patchwork Fri Jun 28 11:59:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022031 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 817CA14C0 for ; Fri, 28 Jun 2019 12:00:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 734452793A for ; Fri, 28 Jun 2019 12:00:32 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 674D6285CC; Fri, 28 Jun 2019 12:00:32 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0C2FC2793A for ; Fri, 28 Jun 2019 12:00:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726894AbfF1MAb (ORCPT ); Fri, 28 Jun 2019 08:00:31 -0400 Received: from mx2.suse.de ([195.135.220.15]:54150 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726837AbfF1MAb (ORCPT ); Fri, 28 Jun 2019 08:00:31 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 2941BB631; Fri, 28 Jun 2019 12:00:30 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 05/37] bcache: avoid flushing btree node in cache_set_flush() if io disabled Date: Fri, 28 Jun 2019 19:59:28 +0800 Message-Id: <20190628120000.40753-6-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When cache_set_flush() is called for too many I/O errors detected on cache device and the cache set is retiring, inside the function it doesn't make sense to flushing cached btree nodes from c->btree_cache because CACHE_SET_IO_DISABLE is set on c->flags already and all I/Os onto cache device will be rejected. This patch checks in cache_set_flush() that whether CACHE_SET_IO_DISABLE is set. If yes, then avoids to flush the cached btree nodes to reduce more time and make cache set retiring more faster. Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index ba2ad093bc80..dc6702c2c4b6 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1553,13 +1553,17 @@ static void cache_set_flush(struct closure *cl) if (!IS_ERR_OR_NULL(c->root)) list_add(&c->root->list, &c->btree_cache); - /* Should skip this if we're unregistering because of an error */ - list_for_each_entry(b, &c->btree_cache, list) { - mutex_lock(&b->write_lock); - if (btree_node_dirty(b)) - __bch_btree_node_write(b, NULL); - mutex_unlock(&b->write_lock); - } + /* + * Avoid flushing cached nodes if cache set is retiring + * due to too many I/O errors detected. + */ + if (!test_bit(CACHE_SET_IO_DISABLE, &c->flags)) + list_for_each_entry(b, &c->btree_cache, list) { + mutex_lock(&b->write_lock); + if (btree_node_dirty(b)) + __bch_btree_node_write(b, NULL); + mutex_unlock(&b->write_lock); + } for_each_cache(ca, c, i) if (ca->alloc_thread) From patchwork Fri Jun 28 11:59:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022033 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CFCCB924 for ; Fri, 28 Jun 2019 12:00:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BE5EE2793A for ; Fri, 28 Jun 2019 12:00:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B224D285CC; Fri, 28 Jun 2019 12:00:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 011312793A for ; Fri, 28 Jun 2019 12:00:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726901AbfF1MAf (ORCPT ); Fri, 28 Jun 2019 08:00:35 -0400 Received: from mx2.suse.de ([195.135.220.15]:54192 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726837AbfF1MAf (ORCPT ); Fri, 28 Jun 2019 08:00:35 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 0ECE1B631; Fri, 28 Jun 2019 12:00:34 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li , stable@vger.kernel.org Subject: [PATCH 06/37] bcache: ignore read-ahead request failure on backing device Date: Fri, 28 Jun 2019 19:59:29 +0800 Message-Id: <20190628120000.40753-7-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When md raid device (e.g. raid456) is used as backing device, read-ahead requests on a degrading and recovering md raid device might be failured immediately by md raid code, but indeed this md raid array can still be read or write for normal I/O requests. Therefore such failed read-ahead request are not real hardware failure. Further more, after degrading and recovering accomplished, read-ahead requests will be handled by md raid array again. For such condition, I/O failures of read-ahead requests don't indicate real health status (because normal I/O still be served), they should not be counted into I/O error counter dc->io_errors. Since there is no simple way to detect whether the backing divice is a md raid device, this patch simply ignores I/O failures for read-ahead bios on backing device, to avoid bogus backing device failure on a degrading md raid array. Suggested-and-tested-by: Thorsten Knabe Signed-off-by: Coly Li Cc: stable@vger.kernel.org --- drivers/md/bcache/io.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c index c25097968319..4d93f07f63e5 100644 --- a/drivers/md/bcache/io.c +++ b/drivers/md/bcache/io.c @@ -58,6 +58,18 @@ void bch_count_backing_io_errors(struct cached_dev *dc, struct bio *bio) WARN_ONCE(!dc, "NULL pointer of struct cached_dev"); + /* + * Read-ahead requests on a degrading and recovering md raid + * (e.g. raid6) device might be failured immediately by md + * raid code, which is not a real hardware media failure. So + * we shouldn't count failed REQ_RAHEAD bio to dc->io_errors. + */ + if (bio->bi_opf & REQ_RAHEAD) { + pr_warn_ratelimited("%s: Read-ahead I/O failed on backing device, ignore", + dc->backing_dev_name); + return; + } + errors = atomic_add_return(1, &dc->io_errors); if (errors < dc->error_limit) pr_err("%s: IO error on backing device, unrecoverable", From patchwork Fri Jun 28 11:59:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022035 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 93729924 for ; Fri, 28 Jun 2019 12:00:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 789C7285CC for ; Fri, 28 Jun 2019 12:00:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6C84228437; Fri, 28 Jun 2019 12:00:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EFD0428437 for ; Fri, 28 Jun 2019 12:00:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726909AbfF1MAj (ORCPT ); Fri, 28 Jun 2019 08:00:39 -0400 Received: from mx2.suse.de ([195.135.220.15]:54244 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726837AbfF1MAj (ORCPT ); Fri, 28 Jun 2019 08:00:39 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 2C4A9B631; Fri, 28 Jun 2019 12:00:38 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 07/37] bcache: add io error counting in write_bdev_super_endio() Date: Fri, 28 Jun 2019 19:59:30 +0800 Message-Id: <20190628120000.40753-8-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When backing device super block is written by bch_write_bdev_super(), the bio complete callback write_bdev_super_endio() simply ignores I/O status. Indeed such write request also contribute to backing device health status if the request failed. This patch checkes bio->bi_status in write_bdev_super_endio(), if there is error, bch_count_backing_io_errors() will be called to count an I/O error to dc->io_errors. Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index dc6702c2c4b6..73466bda12a7 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -197,7 +197,9 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev, static void write_bdev_super_endio(struct bio *bio) { struct cached_dev *dc = bio->bi_private; - /* XXX: error checking */ + + if (bio->bi_status) + bch_count_backing_io_errors(dc, bio); closure_put(&dc->sb_write); } From patchwork Fri Jun 28 11:59:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022037 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BA41214C0 for ; Fri, 28 Jun 2019 12:00:44 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AC1812793A for ; Fri, 28 Jun 2019 12:00:44 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A08D928437; Fri, 28 Jun 2019 12:00:44 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9B62E285CC for ; Fri, 28 Jun 2019 12:00:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726916AbfF1MAn (ORCPT ); Fri, 28 Jun 2019 08:00:43 -0400 Received: from mx2.suse.de ([195.135.220.15]:54272 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726837AbfF1MAn (ORCPT ); Fri, 28 Jun 2019 08:00:43 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id E6AD9B627; Fri, 28 Jun 2019 12:00:41 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 08/37] bcache: remove unnecessary prefetch() in bset_search_tree() Date: Fri, 28 Jun 2019 19:59:31 +0800 Message-Id: <20190628120000.40753-9-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In function bset_search_tree(), when p >= t->size, t->tree[0] will be prefetched by the following code piece, 974 unsigned int p = n << 4; 975 976 p &= ((int) (p - t->size)) >> 31; 977 978 prefetch(&t->tree[p]); The purpose of the above code is to avoid a branch instruction, but when p >= t->size, prefetch(&t->tree[0]) has no positive performance contribution at all. This patch avoids the unncessary prefetch by only calling prefetch() when p < t->size. Signed-off-by: Coly Li --- drivers/md/bcache/bset.c | 16 ++-------------- 1 file changed, 2 insertions(+), 14 deletions(-) diff --git a/drivers/md/bcache/bset.c b/drivers/md/bcache/bset.c index 268f1b685084..e36a108d3648 100644 --- a/drivers/md/bcache/bset.c +++ b/drivers/md/bcache/bset.c @@ -970,22 +970,10 @@ static struct bset_search_iter bset_search_tree(struct bset_tree *t, unsigned int inorder, j, n = 1; do { - /* - * A bit trick here. - * If p < t->size, (int)(p - t->size) is a minus value and - * the most significant bit is set, right shifting 31 bits - * gets 1. If p >= t->size, the most significant bit is - * not set, right shifting 31 bits gets 0. - * So the following 2 lines equals to - * if (p >= t->size) - * p = 0; - * but a branch instruction is avoided. - */ unsigned int p = n << 4; - p &= ((int) (p - t->size)) >> 31; - - prefetch(&t->tree[p]); + if (p < t->size) + prefetch(&t->tree[p]); j = n; f = &t->tree[j]; From patchwork Fri Jun 28 11:59:32 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022039 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E4C9E14C0 for ; Fri, 28 Jun 2019 12:00:48 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D562E2793A for ; Fri, 28 Jun 2019 12:00:48 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C96AD28609; Fri, 28 Jun 2019 12:00:48 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 73BCB2793A for ; Fri, 28 Jun 2019 12:00:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726920AbfF1MAr (ORCPT ); Fri, 28 Jun 2019 08:00:47 -0400 Received: from mx2.suse.de ([195.135.220.15]:54296 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726837AbfF1MAr (ORCPT ); Fri, 28 Jun 2019 08:00:47 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 966C7B627; Fri, 28 Jun 2019 12:00:46 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Alexandru Ardelean , Coly Li Subject: [PATCH 09/37] bcache: use sysfs_match_string() instead of __sysfs_match_string() Date: Fri, 28 Jun 2019 19:59:32 +0800 Message-Id: <20190628120000.40753-10-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Alexandru Ardelean The arrays (of strings) that are passed to __sysfs_match_string() are static, so use sysfs_match_string() which does an implicit ARRAY_SIZE() over these arrays. Functionally, this doesn't change anything. The change is more cosmetic. It only shrinks the static arrays by 1 byte each. Signed-off-by: Alexandru Ardelean Signed-off-by: Coly Li --- drivers/md/bcache/sysfs.c | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c index bfb437ffb13c..760cf8951338 100644 --- a/drivers/md/bcache/sysfs.c +++ b/drivers/md/bcache/sysfs.c @@ -21,28 +21,24 @@ static const char * const bch_cache_modes[] = { "writethrough", "writeback", "writearound", - "none", - NULL + "none" }; /* Default is 0 ("auto") */ static const char * const bch_stop_on_failure_modes[] = { "auto", - "always", - NULL + "always" }; static const char * const cache_replacement_policies[] = { "lru", "fifo", - "random", - NULL + "random" }; static const char * const error_actions[] = { "unregister", - "panic", - NULL + "panic" }; write_attribute(attach); @@ -333,7 +329,7 @@ STORE(__cached_dev) bch_cached_dev_run(dc); if (attr == &sysfs_cache_mode) { - v = __sysfs_match_string(bch_cache_modes, -1, buf); + v = sysfs_match_string(bch_cache_modes, buf); if (v < 0) return v; @@ -344,7 +340,7 @@ STORE(__cached_dev) } if (attr == &sysfs_stop_when_cache_set_failed) { - v = __sysfs_match_string(bch_stop_on_failure_modes, -1, buf); + v = sysfs_match_string(bch_stop_on_failure_modes, buf); if (v < 0) return v; @@ -799,7 +795,7 @@ STORE(__bch_cache_set) 0, UINT_MAX); if (attr == &sysfs_errors) { - v = __sysfs_match_string(error_actions, -1, buf); + v = sysfs_match_string(error_actions, buf); if (v < 0) return v; @@ -1063,7 +1059,7 @@ STORE(__bch_cache) } if (attr == &sysfs_cache_replacement_policy) { - v = __sysfs_match_string(cache_replacement_policies, -1, buf); + v = sysfs_match_string(cache_replacement_policies, buf); if (v < 0) return v; From patchwork Fri Jun 28 11:59:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022041 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3E478924 for ; Fri, 28 Jun 2019 12:00:55 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2DA522793A for ; Fri, 28 Jun 2019 12:00:55 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1B48A285CC; Fri, 28 Jun 2019 12:00:55 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 05A002793A for ; Fri, 28 Jun 2019 12:00:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726942AbfF1MAx (ORCPT ); Fri, 28 Jun 2019 08:00:53 -0400 Received: from mx2.suse.de ([195.135.220.15]:54318 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726837AbfF1MAw (ORCPT ); Fri, 28 Jun 2019 08:00:52 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id EB082B627; Fri, 28 Jun 2019 12:00:50 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 10/37] bcache: add return value check to bch_cached_dev_run() Date: Fri, 28 Jun 2019 19:59:33 +0800 Message-Id: <20190628120000.40753-11-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch adds return value check to bch_cached_dev_run(), now if there is error happens inside bch_cached_dev_run(), it can be catched. Signed-off-by: Coly Li --- drivers/md/bcache/bcache.h | 2 +- drivers/md/bcache/super.c | 33 ++++++++++++++++++++++++++------- drivers/md/bcache/sysfs.c | 7 +++++-- 3 files changed, 32 insertions(+), 10 deletions(-) diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h index fdf75352e16a..73a97586a2ef 100644 --- a/drivers/md/bcache/bcache.h +++ b/drivers/md/bcache/bcache.h @@ -1006,7 +1006,7 @@ int bch_flash_dev_create(struct cache_set *c, uint64_t size); int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c, uint8_t *set_uuid); void bch_cached_dev_detach(struct cached_dev *dc); -void bch_cached_dev_run(struct cached_dev *dc); +int bch_cached_dev_run(struct cached_dev *dc); void bcache_device_stop(struct bcache_device *d); void bch_cache_set_unregister(struct cache_set *c); diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 73466bda12a7..0abee44092bf 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -910,7 +910,7 @@ static int cached_dev_status_update(void *arg) } -void bch_cached_dev_run(struct cached_dev *dc) +int bch_cached_dev_run(struct cached_dev *dc) { struct bcache_device *d = &dc->disk; char *buf = kmemdup_nul(dc->sb.label, SB_LABEL_SIZE, GFP_KERNEL); @@ -921,11 +921,14 @@ void bch_cached_dev_run(struct cached_dev *dc) NULL, }; + if (dc->io_disable) + return -EIO; + if (atomic_xchg(&dc->running, 1)) { kfree(env[1]); kfree(env[2]); kfree(buf); - return; + return -EBUSY; } if (!d->c && @@ -951,8 +954,11 @@ void bch_cached_dev_run(struct cached_dev *dc) kfree(buf); if (sysfs_create_link(&d->kobj, &disk_to_dev(d->disk)->kobj, "dev") || - sysfs_create_link(&disk_to_dev(d->disk)->kobj, &d->kobj, "bcache")) + sysfs_create_link(&disk_to_dev(d->disk)->kobj, + &d->kobj, "bcache")) { pr_debug("error creating sysfs link"); + return -ENOMEM; + } dc->status_update_thread = kthread_run(cached_dev_status_update, dc, "bcache_status_update"); @@ -961,6 +967,8 @@ void bch_cached_dev_run(struct cached_dev *dc) "continue to run without monitoring backing " "device status"); } + + return 0; } /* @@ -1056,6 +1064,7 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c, uint32_t rtime = cpu_to_le32((u32)ktime_get_real_seconds()); struct uuid_entry *u; struct cached_dev *exist_dc, *t; + int ret = 0; if ((set_uuid && memcmp(set_uuid, c->sb.set_uuid, 16)) || (!set_uuid && memcmp(dc->sb.set_uuid, c->sb.set_uuid, 16))) @@ -1165,7 +1174,12 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c, bch_sectors_dirty_init(&dc->disk); - bch_cached_dev_run(dc); + ret = bch_cached_dev_run(dc); + if (ret && (ret != -EBUSY)) { + up_write(&dc->writeback_lock); + return ret; + } + bcache_device_link(&dc->disk, c, "bdev"); atomic_inc(&c->attached_dev_nr); @@ -1292,6 +1306,7 @@ static int register_bdev(struct cache_sb *sb, struct page *sb_page, { const char *err = "cannot allocate memory"; struct cache_set *c; + int ret = -ENOMEM; bdevname(bdev, dc->backing_dev_name); memcpy(&dc->sb, sb, sizeof(struct cache_sb)); @@ -1321,14 +1336,18 @@ static int register_bdev(struct cache_sb *sb, struct page *sb_page, bch_cached_dev_attach(dc, c, NULL); if (BDEV_STATE(&dc->sb) == BDEV_STATE_NONE || - BDEV_STATE(&dc->sb) == BDEV_STATE_STALE) - bch_cached_dev_run(dc); + BDEV_STATE(&dc->sb) == BDEV_STATE_STALE) { + err = "failed to run cached device"; + ret = bch_cached_dev_run(dc); + if (ret) + goto err; + } return 0; err: pr_notice("error %s: %s", dc->backing_dev_name, err); bcache_device_stop(&dc->disk); - return -EIO; + return ret; } /* Flash only volumes */ diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c index 760cf8951338..eb678e43ac00 100644 --- a/drivers/md/bcache/sysfs.c +++ b/drivers/md/bcache/sysfs.c @@ -325,8 +325,11 @@ STORE(__cached_dev) bch_cache_accounting_clear(&dc->accounting); if (attr == &sysfs_running && - strtoul_or_return(buf)) - bch_cached_dev_run(dc); + strtoul_or_return(buf)) { + v = bch_cached_dev_run(dc); + if (v) + return v; + } if (attr == &sysfs_cache_mode) { v = sysfs_match_string(bch_cache_modes, buf); From patchwork Fri Jun 28 11:59:34 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022043 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 234F814C0 for ; Fri, 28 Jun 2019 12:00:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 15C012793A for ; Fri, 28 Jun 2019 12:00:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 09ECC285CC; Fri, 28 Jun 2019 12:00:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A34DD2793A for ; Fri, 28 Jun 2019 12:00:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726946AbfF1MA4 (ORCPT ); Fri, 28 Jun 2019 08:00:56 -0400 Received: from mx2.suse.de ([195.135.220.15]:54364 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726514AbfF1MA4 (ORCPT ); Fri, 28 Jun 2019 08:00:56 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 87C7AB627; Fri, 28 Jun 2019 12:00:55 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 11/37] bcache: remove unncessary code in bch_btree_keys_init() Date: Fri, 28 Jun 2019 19:59:34 +0800 Message-Id: <20190628120000.40753-12-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Function bch_btree_keys_init() initializes b->set[].size and b->set[].data to zero. As the code comments indicates, these code indeed is unncessary, because both struct btree_keys and struct bset_tree are nested embedded into struct btree, when struct btree is filled with 0 bits by kzalloc() in mca_bucket_alloc(), b->set[].size and b->set[].data are initialized to 0 (a.k.a NULL) already. This patch removes the redundant code, and add comments in bch_btree_keys_init() and mca_bucket_alloc() to explain why it's safe. Signed-off-by: Coly Li --- drivers/md/bcache/bset.c | 15 ++++++--------- drivers/md/bcache/btree.c | 4 ++++ 2 files changed, 10 insertions(+), 9 deletions(-) diff --git a/drivers/md/bcache/bset.c b/drivers/md/bcache/bset.c index e36a108d3648..8af9509e78bd 100644 --- a/drivers/md/bcache/bset.c +++ b/drivers/md/bcache/bset.c @@ -347,22 +347,19 @@ EXPORT_SYMBOL(bch_btree_keys_alloc); void bch_btree_keys_init(struct btree_keys *b, const struct btree_keys_ops *ops, bool *expensive_debug_checks) { - unsigned int i; - b->ops = ops; b->expensive_debug_checks = expensive_debug_checks; b->nsets = 0; b->last_set_unwritten = 0; - /* XXX: shouldn't be needed */ - for (i = 0; i < MAX_BSETS; i++) - b->set[i].size = 0; /* - * Second loop starts at 1 because b->keys[0]->data is the memory we - * allocated + * struct btree_keys in embedded in struct btree, and struct + * bset_tree is embedded into struct btree_keys. They are all + * initialized as 0 by kzalloc() in mca_bucket_alloc(), and + * b->set[0].data is allocated in bch_btree_keys_alloc(), so we + * don't have to initiate b->set[].size and b->set[].data here + * any more. */ - for (i = 1; i < MAX_BSETS; i++) - b->set[i].data = NULL; } EXPORT_SYMBOL(bch_btree_keys_init); diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 773f5fdad25f..cf38a1b031fa 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -613,6 +613,10 @@ static void mca_data_alloc(struct btree *b, struct bkey *k, gfp_t gfp) static struct btree *mca_bucket_alloc(struct cache_set *c, struct bkey *k, gfp_t gfp) { + /* + * kzalloc() is necessary here for initialization, + * see code comments in bch_btree_keys_init(). + */ struct btree *b = kzalloc(sizeof(struct btree), gfp); if (!b) From patchwork Fri Jun 28 11:59:35 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022061 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6E9BC14C0 for ; Fri, 28 Jun 2019 12:01:03 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6134328437 for ; Fri, 28 Jun 2019 12:01:03 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 55C3F287E5; Fri, 28 Jun 2019 12:01:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0231D287E3 for ; Fri, 28 Jun 2019 12:01:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726749AbfF1MBC (ORCPT ); Fri, 28 Jun 2019 08:01:02 -0400 Received: from mx2.suse.de ([195.135.220.15]:54388 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726524AbfF1MBA (ORCPT ); Fri, 28 Jun 2019 08:01:00 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 3A2A5B62B; Fri, 28 Jun 2019 12:00:59 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 12/37] bcache: check CACHE_SET_IO_DISABLE in allocator code Date: Fri, 28 Jun 2019 19:59:35 +0800 Message-Id: <20190628120000.40753-13-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP If CACHE_SET_IO_DISABLE of a cache set flag is set by too many I/O errors, currently allocator routines can still continue allocate space which may introduce inconsistent metadata state. This patch checkes CACHE_SET_IO_DISABLE bit in following allocator routines, - bch_bucket_alloc() - __bch_bucket_alloc_set() Once CACHE_SET_IO_DISABLE is set on cache set, the allocator routines may reject allocation request earlier to avoid potential inconsistent metadata. Signed-off-by: Coly Li --- drivers/md/bcache/alloc.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c index f8986effcb50..6f776823b9ba 100644 --- a/drivers/md/bcache/alloc.c +++ b/drivers/md/bcache/alloc.c @@ -393,6 +393,11 @@ long bch_bucket_alloc(struct cache *ca, unsigned int reserve, bool wait) struct bucket *b; long r; + + /* No allocation if CACHE_SET_IO_DISABLE bit is set */ + if (unlikely(test_bit(CACHE_SET_IO_DISABLE, &ca->set->flags))) + return -1; + /* fastpath */ if (fifo_pop(&ca->free[RESERVE_NONE], r) || fifo_pop(&ca->free[reserve], r)) @@ -484,6 +489,10 @@ int __bch_bucket_alloc_set(struct cache_set *c, unsigned int reserve, { int i; + /* No allocation if CACHE_SET_IO_DISABLE bit is set */ + if (unlikely(test_bit(CACHE_SET_IO_DISABLE, &c->flags))) + return -1; + lockdep_assert_held(&c->bucket_lock); BUG_ON(!n || n > c->caches_loaded || n > MAX_CACHES_PER_SET); From patchwork Fri Jun 28 11:59:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022065 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D2AA8924 for ; Fri, 28 Jun 2019 12:01:05 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C48AE28437 for ; Fri, 28 Jun 2019 12:01:05 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B8D8F287E5; Fri, 28 Jun 2019 12:01:05 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6348D28437 for ; Fri, 28 Jun 2019 12:01:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726970AbfF1MBE (ORCPT ); Fri, 28 Jun 2019 08:01:04 -0400 Received: from mx2.suse.de ([195.135.220.15]:54408 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726957AbfF1MBD (ORCPT ); Fri, 28 Jun 2019 08:01:03 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id C30C7B627; Fri, 28 Jun 2019 12:01:02 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 13/37] bcache: check CACHE_SET_IO_DISABLE bit in bch_journal() Date: Fri, 28 Jun 2019 19:59:36 +0800 Message-Id: <20190628120000.40753-14-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When too many I/O errors happen on cache set and CACHE_SET_IO_DISABLE bit is set, bch_journal() may continue to work because the journaling bkey might be still in write set yet. The caller of bch_journal() may believe the journal still work but the truth is in-memory journal write set won't be written into cache device any more. This behavior may introduce potential inconsistent metadata status. This patch checks CACHE_SET_IO_DISABLE bit at the head of bch_journal(), if the bit is set, bch_journal() returns NULL immediately to notice caller to know journal does not work. Signed-off-by: Coly Li --- drivers/md/bcache/journal.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index 4e5fc05720fc..54f8886b6177 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -811,6 +811,10 @@ atomic_t *bch_journal(struct cache_set *c, struct journal_write *w; atomic_t *ret; + /* No journaling if CACHE_SET_IO_DISABLE set already */ + if (unlikely(test_bit(CACHE_SET_IO_DISABLE, &c->flags))) + return NULL; + if (!CACHE_SYNC(&c->sb)) return NULL; From patchwork Fri Jun 28 11:59:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022067 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4955A14C0 for ; Fri, 28 Jun 2019 12:01:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3BD8828437 for ; Fri, 28 Jun 2019 12:01:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 302C5287E5; Fri, 28 Jun 2019 12:01:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D8BA128437 for ; Fri, 28 Jun 2019 12:01:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726974AbfF1MBI (ORCPT ); Fri, 28 Jun 2019 08:01:08 -0400 Received: from mx2.suse.de ([195.135.220.15]:54428 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726957AbfF1MBI (ORCPT ); Fri, 28 Jun 2019 08:01:08 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id EEA91B62B; Fri, 28 Jun 2019 12:01:06 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 14/37] bcache: more detailed error message to bcache_device_link() Date: Fri, 28 Jun 2019 19:59:37 +0800 Message-Id: <20190628120000.40753-15-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch adds more accurate error message for specific ssyfs_create_link() call, to help debugging failure during bcache device start tup. Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 0abee44092bf..d4d8d1300faf 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -693,6 +693,7 @@ static void bcache_device_link(struct bcache_device *d, struct cache_set *c, { unsigned int i; struct cache *ca; + int ret; for_each_cache(ca, d->c, i) bd_link_disk_holder(ca->bdev, d->disk); @@ -700,9 +701,13 @@ static void bcache_device_link(struct bcache_device *d, struct cache_set *c, snprintf(d->name, BCACHEDEVNAME_SIZE, "%s%u", name, d->id); - WARN(sysfs_create_link(&d->kobj, &c->kobj, "cache") || - sysfs_create_link(&c->kobj, &d->kobj, d->name), - "Couldn't create device <-> cache set symlinks"); + ret = sysfs_create_link(&d->kobj, &c->kobj, "cache"); + if (ret < 0) + pr_err("Couldn't create device -> cache set symlink"); + + ret = sysfs_create_link(&c->kobj, &d->kobj, d->name); + if (ret < 0) + pr_err("Couldn't create cache set -> device symlink"); clear_bit(BCACHE_DEV_UNLINK_DONE, &d->flags); } From patchwork Fri Jun 28 11:59:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022069 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AA60D14C0 for ; Fri, 28 Jun 2019 12:01:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9B8C028437 for ; Fri, 28 Jun 2019 12:01:12 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8FE04287E5; Fri, 28 Jun 2019 12:01:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4850528437 for ; Fri, 28 Jun 2019 12:01:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726977AbfF1MBL (ORCPT ); Fri, 28 Jun 2019 08:01:11 -0400 Received: from mx2.suse.de ([195.135.220.15]:54486 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726957AbfF1MBL (ORCPT ); Fri, 28 Jun 2019 08:01:11 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 5E0E3B627; Fri, 28 Jun 2019 12:01:10 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 15/37] bcache: add more error message in bch_cached_dev_attach() Date: Fri, 28 Jun 2019 19:59:38 +0800 Message-Id: <20190628120000.40753-16-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch adds more error message for attaching cached device, this is helpful to debug code failure during bache device start up. Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index d4d8d1300faf..a836910ef368 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1169,6 +1169,8 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c, down_write(&dc->writeback_lock); if (bch_cached_dev_writeback_start(dc)) { up_write(&dc->writeback_lock); + pr_err("Couldn't start writeback facilities for %s", + dc->disk.disk->disk_name); return -ENOMEM; } @@ -1182,6 +1184,8 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c, ret = bch_cached_dev_run(dc); if (ret && (ret != -EBUSY)) { up_write(&dc->writeback_lock); + pr_err("Couldn't run cached device %s", + dc->backing_dev_name); return ret; } From patchwork Fri Jun 28 11:59:39 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022071 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9F38E924 for ; Fri, 28 Jun 2019 12:01:15 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 90CDA28437 for ; Fri, 28 Jun 2019 12:01:15 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 85209287E5; Fri, 28 Jun 2019 12:01:15 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 383C728437 for ; Fri, 28 Jun 2019 12:01:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726979AbfF1MBO (ORCPT ); Fri, 28 Jun 2019 08:01:14 -0400 Received: from mx2.suse.de ([195.135.220.15]:54510 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726957AbfF1MBO (ORCPT ); Fri, 28 Jun 2019 08:01:14 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 74702B62C; Fri, 28 Jun 2019 12:01:13 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 16/37] bcache: improve error message in bch_cached_dev_run() Date: Fri, 28 Jun 2019 19:59:39 +0800 Message-Id: <20190628120000.40753-17-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch adds more error message in bch_cached_dev_run() to indicate the exact reason why an error value is returned. Please notice when printing out the "is running already" message, pr_info() is used here, because in this case also -EBUSY is returned, the bcache device can continue to attach to the cache devince and run, so it won't be an error level message in kernel message. Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index a836910ef368..e9e6d653bf70 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -926,13 +926,18 @@ int bch_cached_dev_run(struct cached_dev *dc) NULL, }; - if (dc->io_disable) + if (dc->io_disable) { + pr_err("I/O disabled on cached dev %s", + dc->backing_dev_name); return -EIO; + } if (atomic_xchg(&dc->running, 1)) { kfree(env[1]); kfree(env[2]); kfree(buf); + pr_info("cached dev %s is running already", + dc->backing_dev_name); return -EBUSY; } @@ -961,7 +966,7 @@ int bch_cached_dev_run(struct cached_dev *dc) if (sysfs_create_link(&d->kobj, &disk_to_dev(d->disk)->kobj, "dev") || sysfs_create_link(&disk_to_dev(d->disk)->kobj, &d->kobj, "bcache")) { - pr_debug("error creating sysfs link"); + pr_err("Couldn't create bcache dev <-> disk sysfs symlinks"); return -ENOMEM; } From patchwork Fri Jun 28 11:59:40 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022073 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BA2E6924 for ; Fri, 28 Jun 2019 12:01:19 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AD22B28437 for ; Fri, 28 Jun 2019 12:01:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A1AB5287E5; Fri, 28 Jun 2019 12:01:19 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5EF2828437 for ; Fri, 28 Jun 2019 12:01:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726980AbfF1MBS (ORCPT ); Fri, 28 Jun 2019 08:01:18 -0400 Received: from mx2.suse.de ([195.135.220.15]:54552 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726957AbfF1MBS (ORCPT ); Fri, 28 Jun 2019 08:01:18 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 8462BB627; Fri, 28 Jun 2019 12:01:17 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 17/37] bcache: remove "XXX:" comment line from run_cache_set() Date: Fri, 28 Jun 2019 19:59:40 +0800 Message-Id: <20190628120000.40753-18-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In previous bcache patches for Linux v5.2, the failure code path of run_cache_set() is tested and fixed. So now the following comment line can be removed from run_cache_set(), /* XXX: test this, it's broken */ Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index e9e6d653bf70..c53fe0f1629f 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1979,7 +1979,7 @@ static int run_cache_set(struct cache_set *c) } closure_sync(&cl); - /* XXX: test this, it's broken */ + bch_cache_set_error(c, "%s", err); return -EIO; From patchwork Fri Jun 28 11:59:41 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022075 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 97CD714C0 for ; Fri, 28 Jun 2019 12:01:23 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8AED828437 for ; Fri, 28 Jun 2019 12:01:23 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7F3C8287E5; Fri, 28 Jun 2019 12:01:23 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2C7EC28437 for ; Fri, 28 Jun 2019 12:01:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726983AbfF1MBW (ORCPT ); Fri, 28 Jun 2019 08:01:22 -0400 Received: from mx2.suse.de ([195.135.220.15]:54576 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726957AbfF1MBW (ORCPT ); Fri, 28 Jun 2019 08:01:22 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 5AD35B627; Fri, 28 Jun 2019 12:01:21 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 18/37] bcache: make bset_search_tree() be more understandable Date: Fri, 28 Jun 2019 19:59:41 +0800 Message-Id: <20190628120000.40753-19-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The purpose of following code in bset_search_tree() is to avoid a branch instruction, 994 if (likely(f->exponent != 127)) 995 n = j * 2 + (((unsigned int) 996 (f->mantissa - 997 bfloat_mantissa(search, f))) >> 31); 998 else 999 n = (bkey_cmp(tree_to_bkey(t, j), search) > 0) 1000 ? j * 2 1001 : j * 2 + 1; This piece of code is not very clear to understand, even when I tried to add code comment for it, I made mistake. This patch removes the implict bit operation and uses explicit branch to calculate next location in binary tree search. Signed-off-by: Coly Li --- drivers/md/bcache/bset.c | 30 +++++++++++------------------- 1 file changed, 11 insertions(+), 19 deletions(-) diff --git a/drivers/md/bcache/bset.c b/drivers/md/bcache/bset.c index 8af9509e78bd..08768796b543 100644 --- a/drivers/md/bcache/bset.c +++ b/drivers/md/bcache/bset.c @@ -975,25 +975,17 @@ static struct bset_search_iter bset_search_tree(struct bset_tree *t, j = n; f = &t->tree[j]; - /* - * Similar bit trick, use subtract operation to avoid a branch - * instruction. - * - * n = (f->mantissa > bfloat_mantissa()) - * ? j * 2 - * : j * 2 + 1; - * - * We need to subtract 1 from f->mantissa for the sign bit trick - * to work - that's done in make_bfloat() - */ - if (likely(f->exponent != 127)) - n = j * 2 + (((unsigned int) - (f->mantissa - - bfloat_mantissa(search, f))) >> 31); - else - n = (bkey_cmp(tree_to_bkey(t, j), search) > 0) - ? j * 2 - : j * 2 + 1; + if (likely(f->exponent != 127)) { + if (f->mantissa >= bfloat_mantissa(search, f)) + n = j * 2; + else + n = j * 2 + 1; + } else { + if (bkey_cmp(tree_to_bkey(t, j), search) > 0) + n = j * 2; + else + n = j * 2 + 1; + } } while (n < t->size); inorder = to_inorder(j, t); From patchwork Fri Jun 28 11:59:42 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022077 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B894F14C0 for ; Fri, 28 Jun 2019 12:01:27 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A85E528437 for ; Fri, 28 Jun 2019 12:01:27 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 97BB6287E5; Fri, 28 Jun 2019 12:01:27 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 26EED28437 for ; Fri, 28 Jun 2019 12:01:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726987AbfF1MB0 (ORCPT ); Fri, 28 Jun 2019 08:01:26 -0400 Received: from mx2.suse.de ([195.135.220.15]:54600 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726957AbfF1MB0 (ORCPT ); Fri, 28 Jun 2019 08:01:26 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 718FAB627; Fri, 28 Jun 2019 12:01:25 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 19/37] bcache: add pendings_cleanup to stop pending bcache device Date: Fri, 28 Jun 2019 19:59:42 +0800 Message-Id: <20190628120000.40753-20-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP If a bcache device is in dirty state and its cache set is not registered, this bcache device will not appear in /dev/bcache, and there is no way to stop it or remove the bcache kernel module. This is an as-designed behavior, but sometimes people has to reboot whole system to release or stop the pending backing device. This sysfs interface may remove such pending bcache devices when write anything into the sysfs file manually. Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index c53fe0f1629f..c4c4b2d99dc2 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -2273,9 +2273,13 @@ static int register_cache(struct cache_sb *sb, struct page *sb_page, static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, const char *buffer, size_t size); +static ssize_t bch_pending_bdevs_cleanup(struct kobject *k, + struct kobj_attribute *attr, + const char *buffer, size_t size); kobj_attribute_write(register, register_bcache); kobj_attribute_write(register_quiet, register_bcache); +kobj_attribute_write(pendings_cleanup, bch_pending_bdevs_cleanup); static bool bch_is_open_backing(struct block_device *bdev) { @@ -2400,6 +2404,56 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, goto out; } + +struct pdev { + struct list_head list; + struct cached_dev *dc; +}; + +static ssize_t bch_pending_bdevs_cleanup(struct kobject *k, + struct kobj_attribute *attr, + const char *buffer, + size_t size) +{ + LIST_HEAD(pending_devs); + ssize_t ret = size; + struct cached_dev *dc, *tdc; + struct pdev *pdev, *tpdev; + struct cache_set *c, *tc; + + mutex_lock(&bch_register_lock); + list_for_each_entry_safe(dc, tdc, &uncached_devices, list) { + pdev = kmalloc(sizeof(struct pdev), GFP_KERNEL); + if (!pdev) + break; + pdev->dc = dc; + list_add(&pdev->list, &pending_devs); + } + + list_for_each_entry_safe(pdev, tpdev, &pending_devs, list) { + list_for_each_entry_safe(c, tc, &bch_cache_sets, list) { + char *pdev_set_uuid = pdev->dc->sb.set_uuid; + char *set_uuid = c->sb.uuid; + + if (!memcmp(pdev_set_uuid, set_uuid, 16)) { + list_del(&pdev->list); + kfree(pdev); + break; + } + } + } + mutex_unlock(&bch_register_lock); + + list_for_each_entry_safe(pdev, tpdev, &pending_devs, list) { + pr_info("delete pdev %p", pdev); + list_del(&pdev->list); + bcache_device_stop(&pdev->dc->disk); + kfree(pdev); + } + + return ret; +} + static int bcache_reboot(struct notifier_block *n, unsigned long code, void *x) { if (code == SYS_DOWN || @@ -2518,6 +2572,7 @@ static int __init bcache_init(void) static const struct attribute *files[] = { &ksysfs_register.attr, &ksysfs_register_quiet.attr, + &ksysfs_pendings_cleanup.attr, NULL }; From patchwork Fri Jun 28 11:59:43 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022079 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CAD30924 for ; Fri, 28 Jun 2019 12:01:31 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BB82828437 for ; Fri, 28 Jun 2019 12:01:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id AF7A7287E5; Fri, 28 Jun 2019 12:01:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 640C228437 for ; Fri, 28 Jun 2019 12:01:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726990AbfF1MBa (ORCPT ); Fri, 28 Jun 2019 08:01:30 -0400 Received: from mx2.suse.de ([195.135.220.15]:54644 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726957AbfF1MBa (ORCPT ); Fri, 28 Jun 2019 08:01:30 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 3F0DFB627; Fri, 28 Jun 2019 12:01:29 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li , stable@vger.kernel.org Subject: [PATCH 20/37] bcache: fix mistaken sysfs entry for io_error counter Date: Fri, 28 Jun 2019 19:59:43 +0800 Message-Id: <20190628120000.40753-21-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In bch_cached_dev_files[] from driver/md/bcache/sysfs.c, sysfs_errors is incorrectly inserted in. The correct entry should be sysfs_io_errors. This patch fixes the problem and now I/O errors of cached device can be read from /sys/block/bcache/bcache/io_errors. Fixes: c7b7bd07404c5 ("bcache: add io_disable to struct cached_dev") Signed-off-by: Coly Li Cc: stable@vger.kernel.org --- drivers/md/bcache/sysfs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c index eb678e43ac00..dddb8d4048ce 100644 --- a/drivers/md/bcache/sysfs.c +++ b/drivers/md/bcache/sysfs.c @@ -176,7 +176,7 @@ SHOW(__bch_cached_dev) var_print(writeback_percent); sysfs_hprint(writeback_rate, wb ? atomic_long_read(&dc->writeback_rate.rate) << 9 : 0); - sysfs_hprint(io_errors, atomic_read(&dc->io_errors)); + sysfs_printf(io_errors, "%i", atomic_read(&dc->io_errors)); sysfs_printf(io_error_limit, "%i", dc->error_limit); sysfs_printf(io_disable, "%i", dc->io_disable); var_print(writeback_rate_update_seconds); @@ -463,7 +463,7 @@ static struct attribute *bch_cached_dev_files[] = { &sysfs_writeback_rate_p_term_inverse, &sysfs_writeback_rate_minimum, &sysfs_writeback_rate_debug, - &sysfs_errors, + &sysfs_io_errors, &sysfs_io_error_limit, &sysfs_io_disable, &sysfs_dirty_data, From patchwork Fri Jun 28 11:59:44 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022081 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 39F4A14C0 for ; Fri, 28 Jun 2019 12:01:35 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2A80C28437 for ; Fri, 28 Jun 2019 12:01:35 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1ECFB287E5; Fri, 28 Jun 2019 12:01:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CA5F728437 for ; Fri, 28 Jun 2019 12:01:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726957AbfF1MBe (ORCPT ); Fri, 28 Jun 2019 08:01:34 -0400 Received: from mx2.suse.de ([195.135.220.15]:54690 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726991AbfF1MBe (ORCPT ); Fri, 28 Jun 2019 08:01:34 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id E10E0B62B; Fri, 28 Jun 2019 12:01:32 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li , stable@vger.kernel.org Subject: [PATCH 21/37] bcache: destroy dc->writeback_write_wq if failed to create dc->writeback_thread Date: Fri, 28 Jun 2019 19:59:44 +0800 Message-Id: <20190628120000.40753-22-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Commit 9baf30972b55 ("bcache: fix for gc and write-back race") added a new work queue dc->writeback_write_wq, but forgot to destroy it in the error condition when creating dc->writeback_thread failed. This patch destroys dc->writeback_write_wq if kthread_create() returns error pointer to dc->writeback_thread, then a memory leak is avoided. Fixes: 9baf30972b55 ("bcache: fix for gc and write-back race") Signed-off-by: Coly Li Cc: stable@vger.kernel.org --- drivers/md/bcache/writeback.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c index 262f7ef20992..21081febcb59 100644 --- a/drivers/md/bcache/writeback.c +++ b/drivers/md/bcache/writeback.c @@ -833,6 +833,7 @@ int bch_cached_dev_writeback_start(struct cached_dev *dc) "bcache_writeback"); if (IS_ERR(dc->writeback_thread)) { cached_dev_put(dc); + destroy_workqueue(dc->writeback_write_wq); return PTR_ERR(dc->writeback_thread); } dc->writeback_running = true; From patchwork Fri Jun 28 11:59:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022083 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5A79814C0 for ; Fri, 28 Jun 2019 12:01:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4D2C628437 for ; Fri, 28 Jun 2019 12:01:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 41436287E5; Fri, 28 Jun 2019 12:01:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CFA9328437 for ; Fri, 28 Jun 2019 12:01:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726706AbfF1MBj (ORCPT ); Fri, 28 Jun 2019 08:01:39 -0400 Received: from mx2.suse.de ([195.135.220.15]:54736 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726810AbfF1MBj (ORCPT ); Fri, 28 Jun 2019 08:01:39 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 09DB2B62B; Fri, 28 Jun 2019 12:01:38 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 22/37] bcache: stop writeback kthread and kworker when bch_cached_dev_run() failed Date: Fri, 28 Jun 2019 19:59:45 +0800 Message-Id: <20190628120000.40753-23-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In bch_cached_dev_attach() after bch_cached_dev_writeback_start() called, the wrireback kthread and writeback rate update kworker of the cached device are created, if the following bch_cached_dev_run() failed, bch_cached_dev_attach() will return with -ENOMEM without stopping the writeback related kthread and kworker. This patch stops writeback kthread and writeback rate update kworker before returning -ENOMEM if bch_cached_dev_run() returns error. Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index c4c4b2d99dc2..791cb930b353 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1189,6 +1189,14 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c, ret = bch_cached_dev_run(dc); if (ret && (ret != -EBUSY)) { up_write(&dc->writeback_lock); + /* + * bch_register_lock is held, bcache_device_stop() is not + * able to be directly called. The kthread and kworker + * created previously in bch_cached_dev_writeback_start() + * have to be stopped manually here. + */ + kthread_stop(dc->writeback_thread); + cancel_writeback_rate_update_dwork(dc); pr_err("Couldn't run cached device %s", dc->backing_dev_name); return ret; From patchwork Fri Jun 28 11:59:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022085 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E52BA924 for ; Fri, 28 Jun 2019 12:01:45 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D566D28437 for ; Fri, 28 Jun 2019 12:01:45 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C97B6287E5; Fri, 28 Jun 2019 12:01:45 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4538A28437 for ; Fri, 28 Jun 2019 12:01:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726810AbfF1MBo (ORCPT ); Fri, 28 Jun 2019 08:01:44 -0400 Received: from mx2.suse.de ([195.135.220.15]:54764 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726574AbfF1MBo (ORCPT ); Fri, 28 Jun 2019 08:01:44 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id D5F49B62B; Fri, 28 Jun 2019 12:01:42 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 23/37] bcache: avoid a deadlock in bcache_reboot() Date: Fri, 28 Jun 2019 19:59:46 +0800 Message-Id: <20190628120000.40753-24-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP It is quite frequently to observe deadlock in bcache_reboot() happens and hang the system reboot process. The reason is, in bcache_reboot() when calling bch_cache_set_stop() and bcache_device_stop() the mutex bch_register_lock is held. But in the process to stop cache set and bcache device, bch_register_lock will be acquired again. If this mutex is held here, deadlock will happen inside the stopping process. The aftermath of the deadlock is, whole system reboot gets hung. The fix is to avoid holding bch_register_lock for the following loops in bcache_reboot(), list_for_each_entry_safe(c, tc, &bch_cache_sets, list) bch_cache_set_stop(c); list_for_each_entry_safe(dc, tdc, &uncached_devices, list) bcache_device_stop(&dc->disk); A module range variable 'bcache_is_reboot' is added, it sets to true in bcache_reboot(). In register_bcache(), if bcache_is_reboot is checked to be true, reject the registration by returning -EBUSY immediately. Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 40 +++++++++++++++++++++++++++++++++++++++- drivers/md/bcache/sysfs.c | 26 ++++++++++++++++++++++++++ 2 files changed, 65 insertions(+), 1 deletion(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 791cb930b353..a88238ad5da1 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -40,6 +40,7 @@ static const char invalid_uuid[] = { static struct kobject *bcache_kobj; struct mutex bch_register_lock; +bool bcache_is_reboot; LIST_HEAD(bch_cache_sets); static LIST_HEAD(uncached_devices); @@ -49,6 +50,7 @@ static wait_queue_head_t unregister_wait; struct workqueue_struct *bcache_wq; struct workqueue_struct *bch_journal_wq; + #define BTREE_MAX_PAGES (256 * 1024 / PAGE_SIZE) /* limitation of partitions number on single bcache device */ #define BCACHE_MINORS 128 @@ -2335,6 +2337,11 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr, if (!try_module_get(THIS_MODULE)) return -EBUSY; + /* For latest state of bcache_is_reboot */ + smp_mb(); + if (bcache_is_reboot) + return -EBUSY; + path = kstrndup(buffer, size, GFP_KERNEL); if (!path) goto err; @@ -2464,6 +2471,9 @@ static ssize_t bch_pending_bdevs_cleanup(struct kobject *k, static int bcache_reboot(struct notifier_block *n, unsigned long code, void *x) { + if (bcache_is_reboot) + return NOTIFY_DONE; + if (code == SYS_DOWN || code == SYS_HALT || code == SYS_POWER_OFF) { @@ -2476,19 +2486,45 @@ static int bcache_reboot(struct notifier_block *n, unsigned long code, void *x) mutex_lock(&bch_register_lock); + if (bcache_is_reboot) + goto out; + + /* New registration is rejected since now */ + bcache_is_reboot = true; + /* + * Make registering caller (if there is) on other CPU + * core know bcache_is_reboot set to true earlier + */ + smp_mb(); + if (list_empty(&bch_cache_sets) && list_empty(&uncached_devices)) goto out; + mutex_unlock(&bch_register_lock); + pr_info("Stopping all devices:"); + /* + * The reason bch_register_lock is not held to call + * bch_cache_set_stop() and bcache_device_stop() is to + * avoid potential deadlock during reboot, because cache + * set or bcache device stopping process will acqurie + * bch_register_lock too. + * + * We are safe here because bcache_is_reboot sets to + * true already, register_bcache() will reject new + * registration now. bcache_is_reboot also makes sure + * bcache_reboot() won't be re-entered on by other thread, + * so there is no race in following list iteration by + * list_for_each_entry_safe(). + */ list_for_each_entry_safe(c, tc, &bch_cache_sets, list) bch_cache_set_stop(c); list_for_each_entry_safe(dc, tdc, &uncached_devices, list) bcache_device_stop(&dc->disk); - mutex_unlock(&bch_register_lock); /* * Give an early chance for other kthreads and @@ -2616,6 +2652,8 @@ static int __init bcache_init(void) bch_debug_init(); closure_debug_init(); + bcache_is_reboot = false; + return 0; err: bcache_exit(); diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c index dddb8d4048ce..d62e28643109 100644 --- a/drivers/md/bcache/sysfs.c +++ b/drivers/md/bcache/sysfs.c @@ -16,6 +16,8 @@ #include #include +extern bool bcache_is_reboot; + /* Default is 0 ("writethrough") */ static const char * const bch_cache_modes[] = { "writethrough", @@ -267,6 +269,10 @@ STORE(__cached_dev) struct cache_set *c; struct kobj_uevent_env *env; + /* no user space access if system is rebooting */ + if (bcache_is_reboot) + return -EBUSY; + #define d_strtoul(var) sysfs_strtoul(var, dc->var) #define d_strtoul_nonzero(var) sysfs_strtoul_clamp(var, dc->var, 1, INT_MAX) #define d_strtoi_h(var) sysfs_hatoi(var, dc->var) @@ -407,6 +413,10 @@ STORE(bch_cached_dev) struct cached_dev *dc = container_of(kobj, struct cached_dev, disk.kobj); + /* no user space access if system is rebooting */ + if (bcache_is_reboot) + return -EBUSY; + mutex_lock(&bch_register_lock); size = __cached_dev_store(kobj, attr, buf, size); @@ -510,6 +520,10 @@ STORE(__bch_flash_dev) kobj); struct uuid_entry *u = &d->c->uuids[d->id]; + /* no user space access if system is rebooting */ + if (bcache_is_reboot) + return -EBUSY; + sysfs_strtoul(data_csum, d->data_csum); if (attr == &sysfs_size) { @@ -745,6 +759,10 @@ STORE(__bch_cache_set) struct cache_set *c = container_of(kobj, struct cache_set, kobj); ssize_t v; + /* no user space access if system is rebooting */ + if (bcache_is_reboot) + return -EBUSY; + if (attr == &sysfs_unregister) bch_cache_set_unregister(c); @@ -864,6 +882,10 @@ STORE(bch_cache_set_internal) { struct cache_set *c = container_of(kobj, struct cache_set, internal); + /* no user space access if system is rebooting */ + if (bcache_is_reboot) + return -EBUSY; + return bch_cache_set_store(&c->kobj, attr, buf, size); } @@ -1049,6 +1071,10 @@ STORE(__bch_cache) struct cache *ca = container_of(kobj, struct cache, kobj); ssize_t v; + /* no user space access if system is rebooting */ + if (bcache_is_reboot) + return -EBUSY; + if (attr == &sysfs_discard) { bool v = strtoul_or_return(buf); From patchwork Fri Jun 28 11:59:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022087 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0E5B614C0 for ; Fri, 28 Jun 2019 12:01:49 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0075928437 for ; Fri, 28 Jun 2019 12:01:49 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E9028287E5; Fri, 28 Jun 2019 12:01:48 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8DB5528437 for ; Fri, 28 Jun 2019 12:01:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726819AbfF1MBs (ORCPT ); Fri, 28 Jun 2019 08:01:48 -0400 Received: from mx2.suse.de ([195.135.220.15]:54788 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726574AbfF1MBr (ORCPT ); Fri, 28 Jun 2019 08:01:47 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id A60D7B62B; Fri, 28 Jun 2019 12:01:46 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 24/37] bcache: acquire bch_register_lock later in cached_dev_detach_finish() Date: Fri, 28 Jun 2019 19:59:47 +0800 Message-Id: <20190628120000.40753-25-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Now there is variable bcache_is_reboot to prevent device register or unregister during reboot, it is unncessary to still hold mutex lock bch_register_lock before stopping writeback_rate_update kworker and writeback kthread. And if the stopping kworker or kthread holding bch_register_lock inside their routine (we used to have such problem in writeback thread, thanks to Junhui Wang fixed it), it is very easy to introduce deadlock during reboot/shutdown procedure. Therefore in this patch, the location to acquire bch_register_lock is moved to the location before calling calc_cached_dev_sectors(). Which is later then original location in cached_dev_detach_finish(). Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index a88238ad5da1..40d857e690f9 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1018,7 +1018,6 @@ static void cached_dev_detach_finish(struct work_struct *w) BUG_ON(!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags)); BUG_ON(refcount_read(&dc->count)); - mutex_lock(&bch_register_lock); if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags)) cancel_writeback_rate_update_dwork(dc); @@ -1034,6 +1033,8 @@ static void cached_dev_detach_finish(struct work_struct *w) bch_write_bdev_super(dc, &cl); closure_sync(&cl); + mutex_lock(&bch_register_lock); + calc_cached_dev_sectors(dc->disk.c); bcache_device_detach(&dc->disk); list_move(&dc->list, &uncached_devices); From patchwork Fri Jun 28 11:59:48 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022089 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 74F30924 for ; Fri, 28 Jun 2019 12:01:52 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 649E028437 for ; Fri, 28 Jun 2019 12:01:52 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5877C287E5; Fri, 28 Jun 2019 12:01:52 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B3AFD28437 for ; Fri, 28 Jun 2019 12:01:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726846AbfF1MBv (ORCPT ); Fri, 28 Jun 2019 08:01:51 -0400 Received: from mx2.suse.de ([195.135.220.15]:54804 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726574AbfF1MBv (ORCPT ); Fri, 28 Jun 2019 08:01:51 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 8765AB62B; Fri, 28 Jun 2019 12:01:49 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 25/37] bcache: acquire bch_register_lock later in cached_dev_free() Date: Fri, 28 Jun 2019 19:59:48 +0800 Message-Id: <20190628120000.40753-26-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When enable lockdep engine, a lockdep warning can be observed when reboot or shutdown system, [ 3142.764557][ T1] bcache: bcache_reboot() Stopping all devices: [ 3142.776265][ T2649] [ 3142.777159][ T2649] ====================================================== [ 3142.780039][ T2649] WARNING: possible circular locking dependency detected [ 3142.782869][ T2649] 5.2.0-rc4-lp151.20-default+ #1 Tainted: G W [ 3142.785684][ T2649] ------------------------------------------------------ [ 3142.788479][ T2649] kworker/3:67/2649 is trying to acquire lock: [ 3142.790738][ T2649] 00000000aaf02291 ((wq_completion)bcache_writeback_wq){+.+.}, at: flush_workqueue+0x87/0x4c0 [ 3142.794678][ T2649] [ 3142.794678][ T2649] but task is already holding lock: [ 3142.797402][ T2649] 000000004fcf89c5 (&bch_register_lock){+.+.}, at: cached_dev_free+0x17/0x120 [bcache] [ 3142.801462][ T2649] [ 3142.801462][ T2649] which lock already depends on the new lock. [ 3142.801462][ T2649] [ 3142.805277][ T2649] [ 3142.805277][ T2649] the existing dependency chain (in reverse order) is: [ 3142.808902][ T2649] [ 3142.808902][ T2649] -> #2 (&bch_register_lock){+.+.}: [ 3142.812396][ T2649] __mutex_lock+0x7a/0x9d0 [ 3142.814184][ T2649] cached_dev_free+0x17/0x120 [bcache] [ 3142.816415][ T2649] process_one_work+0x2a4/0x640 [ 3142.818413][ T2649] worker_thread+0x39/0x3f0 [ 3142.820276][ T2649] kthread+0x125/0x140 [ 3142.822061][ T2649] ret_from_fork+0x3a/0x50 [ 3142.823965][ T2649] [ 3142.823965][ T2649] -> #1 ((work_completion)(&cl->work)#2){+.+.}: [ 3142.827244][ T2649] process_one_work+0x277/0x640 [ 3142.829160][ T2649] worker_thread+0x39/0x3f0 [ 3142.830958][ T2649] kthread+0x125/0x140 [ 3142.832674][ T2649] ret_from_fork+0x3a/0x50 [ 3142.834915][ T2649] [ 3142.834915][ T2649] -> #0 ((wq_completion)bcache_writeback_wq){+.+.}: [ 3142.838121][ T2649] lock_acquire+0xb4/0x1c0 [ 3142.840025][ T2649] flush_workqueue+0xae/0x4c0 [ 3142.842035][ T2649] drain_workqueue+0xa9/0x180 [ 3142.844042][ T2649] destroy_workqueue+0x17/0x250 [ 3142.846142][ T2649] cached_dev_free+0x52/0x120 [bcache] [ 3142.848530][ T2649] process_one_work+0x2a4/0x640 [ 3142.850663][ T2649] worker_thread+0x39/0x3f0 [ 3142.852464][ T2649] kthread+0x125/0x140 [ 3142.854106][ T2649] ret_from_fork+0x3a/0x50 [ 3142.855880][ T2649] [ 3142.855880][ T2649] other info that might help us debug this: [ 3142.855880][ T2649] [ 3142.859663][ T2649] Chain exists of: [ 3142.859663][ T2649] (wq_completion)bcache_writeback_wq --> (work_completion)(&cl->work)#2 --> &bch_register_lock [ 3142.859663][ T2649] [ 3142.865424][ T2649] Possible unsafe locking scenario: [ 3142.865424][ T2649] [ 3142.868022][ T2649] CPU0 CPU1 [ 3142.869885][ T2649] ---- ---- [ 3142.871751][ T2649] lock(&bch_register_lock); [ 3142.873379][ T2649] lock((work_completion)(&cl->work)#2); [ 3142.876399][ T2649] lock(&bch_register_lock); [ 3142.879727][ T2649] lock((wq_completion)bcache_writeback_wq); [ 3142.882064][ T2649] [ 3142.882064][ T2649] *** DEADLOCK *** [ 3142.882064][ T2649] [ 3142.885060][ T2649] 3 locks held by kworker/3:67/2649: [ 3142.887245][ T2649] #0: 00000000e774cdd0 ((wq_completion)events){+.+.}, at: process_one_work+0x21e/0x640 [ 3142.890815][ T2649] #1: 00000000f7df89da ((work_completion)(&cl->work)#2){+.+.}, at: process_one_work+0x21e/0x640 [ 3142.894884][ T2649] #2: 000000004fcf89c5 (&bch_register_lock){+.+.}, at: cached_dev_free+0x17/0x120 [bcache] [ 3142.898797][ T2649] [ 3142.898797][ T2649] stack backtrace: [ 3142.900961][ T2649] CPU: 3 PID: 2649 Comm: kworker/3:67 Tainted: G W 5.2.0-rc4-lp151.20-default+ #1 [ 3142.904789][ T2649] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 [ 3142.909168][ T2649] Workqueue: events cached_dev_free [bcache] [ 3142.911422][ T2649] Call Trace: [ 3142.912656][ T2649] dump_stack+0x85/0xcb [ 3142.914181][ T2649] print_circular_bug+0x19a/0x1f0 [ 3142.916193][ T2649] __lock_acquire+0x16cd/0x1850 [ 3142.917936][ T2649] ? __lock_acquire+0x6a8/0x1850 [ 3142.919704][ T2649] ? lock_acquire+0xb4/0x1c0 [ 3142.921335][ T2649] ? find_held_lock+0x34/0xa0 [ 3142.923052][ T2649] lock_acquire+0xb4/0x1c0 [ 3142.924635][ T2649] ? flush_workqueue+0x87/0x4c0 [ 3142.926375][ T2649] flush_workqueue+0xae/0x4c0 [ 3142.928047][ T2649] ? flush_workqueue+0x87/0x4c0 [ 3142.929824][ T2649] ? drain_workqueue+0xa9/0x180 [ 3142.931686][ T2649] drain_workqueue+0xa9/0x180 [ 3142.933534][ T2649] destroy_workqueue+0x17/0x250 [ 3142.935787][ T2649] cached_dev_free+0x52/0x120 [bcache] [ 3142.937795][ T2649] process_one_work+0x2a4/0x640 [ 3142.939803][ T2649] worker_thread+0x39/0x3f0 [ 3142.941487][ T2649] ? process_one_work+0x640/0x640 [ 3142.943389][ T2649] kthread+0x125/0x140 [ 3142.944894][ T2649] ? kthread_create_worker_on_cpu+0x70/0x70 [ 3142.947744][ T2649] ret_from_fork+0x3a/0x50 [ 3142.970358][ T2649] bcache: bcache_device_free() bcache0 stopped Here is how the deadlock happens. 1) bcache_reboot() calls bcache_device_stop(), then inside bcache_device_stop() BCACHE_DEV_CLOSING bit is set on d->flags. Then closure_queue(&d->cl) is called to invoke cached_dev_flush(). 2) In cached_dev_flush(), cached_dev_free() is called by continu_at(). 3) In cached_dev_free(), when stopping the writeback kthread of the cached device by kthread_stop(), dc->writeback_thread will be waken up to quite the kthread while-loop, then cached_dev_put() is called in bch_writeback_thread(). 4) Calling cached_dev_put() in writeback kthread may drop dc->count to 0, then dc->detach kworker is scheduled, which is initialized as cached_dev_detach_finish(). 5) Inside cached_dev_detach_finish(), the last line of code is to call closure_put(&dc->disk.cl), which drops the last reference counter of closrure dc->disk.cl, then the callback cached_dev_flush() gets called. Now cached_dev_flush() is called for second time in the code path, the first time is in step 2). And again bch_register_lock will be acquired again, and a A-A lock (lockdep terminology) is happening. The root cause of the above A-A lock is in cached_dev_free(), mutex bch_register_lock is held before stopping writeback kthread and other kworkers. Fortunately now we have variable 'bcache_is_reboot', which may prevent device registration or unregistration during reboot/shutdown time, so it is unncessary to hold bch_register_lock such early now. This is how this patch fixes the reboot/shutdown time A-A lock issue: After moving mutex_lock(&bch_register_lock) to a later location where before atomic_read(&dc->running) in cached_dev_free(), such A-A lock problem can be solved without any reboot time registration race. Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 40d857e690f9..8a12a8313367 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1231,8 +1231,6 @@ static void cached_dev_free(struct closure *cl) { struct cached_dev *dc = container_of(cl, struct cached_dev, disk.cl); - mutex_lock(&bch_register_lock); - if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags)) cancel_writeback_rate_update_dwork(dc); @@ -1243,6 +1241,8 @@ static void cached_dev_free(struct closure *cl) if (!IS_ERR_OR_NULL(dc->status_update_thread)) kthread_stop(dc->status_update_thread); + mutex_lock(&bch_register_lock); + if (atomic_read(&dc->running)) bd_unlink_disk_holder(dc->bdev, dc->disk.disk); bcache_device_free(&dc->disk); From patchwork Fri Jun 28 11:59:49 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022091 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8976D924 for ; Fri, 28 Jun 2019 12:01:59 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7AD0F28437 for ; Fri, 28 Jun 2019 12:01:59 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6F377287E5; Fri, 28 Jun 2019 12:01:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BFA6628437 for ; Fri, 28 Jun 2019 12:01:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726558AbfF1MB6 (ORCPT ); Fri, 28 Jun 2019 08:01:58 -0400 Received: from mx2.suse.de ([195.135.220.15]:54828 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726574AbfF1MB6 (ORCPT ); Fri, 28 Jun 2019 08:01:58 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 2EEDCB627; Fri, 28 Jun 2019 12:01:55 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 26/37] bcache: fix potential deadlock in cached_def_free() Date: Fri, 28 Jun 2019 19:59:49 +0800 Message-Id: <20190628120000.40753-27-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When enable lockdep and reboot system with a writeback mode bcache device, the following potential deadlock warning is reported by lockdep engine. [ 101.536569][ T401] kworker/2:2/401 is trying to acquire lock: [ 101.538575][ T401] 00000000bbf6e6c7 ((wq_completion)bcache_writeback_wq){+.+.}, at: flush_workqueue+0x87/0x4c0 [ 101.542054][ T401] [ 101.542054][ T401] but task is already holding lock: [ 101.544587][ T401] 00000000f5f305b3 ((work_completion)(&cl->work)#2){+.+.}, at: process_one_work+0x21e/0x640 [ 101.548386][ T401] [ 101.548386][ T401] which lock already depends on the new lock. [ 101.548386][ T401] [ 101.551874][ T401] [ 101.551874][ T401] the existing dependency chain (in reverse order) is: [ 101.555000][ T401] [ 101.555000][ T401] -> #1 ((work_completion)(&cl->work)#2){+.+.}: [ 101.557860][ T401] process_one_work+0x277/0x640 [ 101.559661][ T401] worker_thread+0x39/0x3f0 [ 101.561340][ T401] kthread+0x125/0x140 [ 101.562963][ T401] ret_from_fork+0x3a/0x50 [ 101.564718][ T401] [ 101.564718][ T401] -> #0 ((wq_completion)bcache_writeback_wq){+.+.}: [ 101.567701][ T401] lock_acquire+0xb4/0x1c0 [ 101.569651][ T401] flush_workqueue+0xae/0x4c0 [ 101.571494][ T401] drain_workqueue+0xa9/0x180 [ 101.573234][ T401] destroy_workqueue+0x17/0x250 [ 101.575109][ T401] cached_dev_free+0x44/0x120 [bcache] [ 101.577304][ T401] process_one_work+0x2a4/0x640 [ 101.579357][ T401] worker_thread+0x39/0x3f0 [ 101.581055][ T401] kthread+0x125/0x140 [ 101.582709][ T401] ret_from_fork+0x3a/0x50 [ 101.584592][ T401] [ 101.584592][ T401] other info that might help us debug this: [ 101.584592][ T401] [ 101.588355][ T401] Possible unsafe locking scenario: [ 101.588355][ T401] [ 101.590974][ T401] CPU0 CPU1 [ 101.592889][ T401] ---- ---- [ 101.594743][ T401] lock((work_completion)(&cl->work)#2); [ 101.596785][ T401] lock((wq_completion)bcache_writeback_wq); [ 101.600072][ T401] lock((work_completion)(&cl->work)#2); [ 101.602971][ T401] lock((wq_completion)bcache_writeback_wq); [ 101.605255][ T401] [ 101.605255][ T401] *** DEADLOCK *** [ 101.605255][ T401] [ 101.608310][ T401] 2 locks held by kworker/2:2/401: [ 101.610208][ T401] #0: 00000000cf2c7d17 ((wq_completion)events){+.+.}, at: process_one_work+0x21e/0x640 [ 101.613709][ T401] #1: 00000000f5f305b3 ((work_completion)(&cl->work)#2){+.+.}, at: process_one_work+0x21e/0x640 [ 101.617480][ T401] [ 101.617480][ T401] stack backtrace: [ 101.619539][ T401] CPU: 2 PID: 401 Comm: kworker/2:2 Tainted: G W 5.2.0-rc4-lp151.20-default+ #1 [ 101.623225][ T401] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 [ 101.627210][ T401] Workqueue: events cached_dev_free [bcache] [ 101.629239][ T401] Call Trace: [ 101.630360][ T401] dump_stack+0x85/0xcb [ 101.631777][ T401] print_circular_bug+0x19a/0x1f0 [ 101.633485][ T401] __lock_acquire+0x16cd/0x1850 [ 101.635184][ T401] ? __lock_acquire+0x6a8/0x1850 [ 101.636863][ T401] ? lock_acquire+0xb4/0x1c0 [ 101.638421][ T401] ? find_held_lock+0x34/0xa0 [ 101.640015][ T401] lock_acquire+0xb4/0x1c0 [ 101.641513][ T401] ? flush_workqueue+0x87/0x4c0 [ 101.643248][ T401] flush_workqueue+0xae/0x4c0 [ 101.644832][ T401] ? flush_workqueue+0x87/0x4c0 [ 101.646476][ T401] ? drain_workqueue+0xa9/0x180 [ 101.648303][ T401] drain_workqueue+0xa9/0x180 [ 101.649867][ T401] destroy_workqueue+0x17/0x250 [ 101.651503][ T401] cached_dev_free+0x44/0x120 [bcache] [ 101.653328][ T401] process_one_work+0x2a4/0x640 [ 101.655029][ T401] worker_thread+0x39/0x3f0 [ 101.656693][ T401] ? process_one_work+0x640/0x640 [ 101.658501][ T401] kthread+0x125/0x140 [ 101.660012][ T401] ? kthread_create_worker_on_cpu+0x70/0x70 [ 101.661985][ T401] ret_from_fork+0x3a/0x50 [ 101.691318][ T401] bcache: bcache_device_free() bcache0 stopped Here is how the above potential deadlock may happen in reboot/shutdown code path, 1) bcache_reboot() is called firstly in the reboot/shutdown code path, then in bcache_reboot(), bcache_device_stop() is called. 2) bcache_device_stop() sets BCACHE_DEV_CLOSING on d->falgs, then call closure_queue(&d->cl) to invoke cached_dev_flush(). And in turn cached_dev_flush() calls cached_dev_free() via closure_at() 3) In cached_dev_free(), after stopped writebach kthread dc->writeback_thread, the kwork dc->writeback_write_wq is stopping by destroy_workqueue(). 4) Inside destroy_workqueue(), drain_workqueue() is called. Inside drain_workqueue(), flush_workqueue() is called. Then wq->lockdep_map is acquired by lock_map_acquire() in flush_workqueue(). After the lock acquired the rest part of flush_workqueue() just wait for the workqueue to complete. 5) Now we look back at writeback thread routine bch_writeback_thread(), in the main while-loop, write_dirty() is called via continue_at() in read_dirty_submit(), which is called via continue_at() in while-loop level called function read_dirty(). Inside write_dirty() it may be re-called on workqueeu dc->writeback_write_wq via continue_at(). It means when the writeback kthread is stopped in cached_dev_free() there might be still one kworker queued on dc->writeback_write_wq to execute write_dirty() again. 6) Now this kworker is scheduled on dc->writeback_write_wq to run by process_one_work() (which is called by worker_thread()). Before calling the kwork routine, wq->lockdep_map is acquired. 7) But wq->lockdep_map is acquired already in step 4), so a A-A lock (lockdep terminology) scenario happens. Indeed on multiple cores syatem, the above deadlock is very rare to happen, just as the code comments in process_one_work() says, 2263 * AFAICT there is no possible deadlock scenario between the 2264 * flush_work() and complete() primitives (except for single-threaded 2265 * workqueues), so hiding them isn't a problem. But it is still good to fix such lockdep warning, even no one running bcache on single core system. The fix is simple. This patch solves the above potential deadlock by, - Do not destroy workqueue dc->writeback_write_wq in cached_dev_free(). - Flush and destroy dc->writeback_write_wq in writebach kthread routine bch_writeback_thread(), where after quit the thread main while-loop and before cached_dev_put() is called. By this fix, dc->writeback_write_wq will be stopped and destroy before the writeback kthread stopped, so the chance for a A-A locking on wq->lockdep_map is disappeared, such A-A deadlock won't happen any more. Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 2 -- drivers/md/bcache/writeback.c | 4 ++++ 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 8a12a8313367..a8ea4e2086a9 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1236,8 +1236,6 @@ static void cached_dev_free(struct closure *cl) if (!IS_ERR_OR_NULL(dc->writeback_thread)) kthread_stop(dc->writeback_thread); - if (dc->writeback_write_wq) - destroy_workqueue(dc->writeback_write_wq); if (!IS_ERR_OR_NULL(dc->status_update_thread)) kthread_stop(dc->status_update_thread); diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c index 21081febcb59..d60268fe49e1 100644 --- a/drivers/md/bcache/writeback.c +++ b/drivers/md/bcache/writeback.c @@ -738,6 +738,10 @@ static int bch_writeback_thread(void *arg) } } + if (dc->writeback_write_wq) { + flush_workqueue(dc->writeback_write_wq); + destroy_workqueue(dc->writeback_write_wq); + } cached_dev_put(dc); wait_for_kthread_stop(); From patchwork Fri Jun 28 11:59:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022093 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 26205924 for ; Fri, 28 Jun 2019 12:02:02 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1778328437 for ; Fri, 28 Jun 2019 12:02:02 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0B883287E5; Fri, 28 Jun 2019 12:02:02 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id ABB4A28437 for ; Fri, 28 Jun 2019 12:02:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727000AbfF1MCB (ORCPT ); Fri, 28 Jun 2019 08:02:01 -0400 Received: from mx2.suse.de ([195.135.220.15]:54848 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726574AbfF1MCB (ORCPT ); Fri, 28 Jun 2019 08:02:01 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 07889B62B; Fri, 28 Jun 2019 12:02:00 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 27/37] bcache: add code comments for journal_read_bucket() Date: Fri, 28 Jun 2019 19:59:50 +0800 Message-Id: <20190628120000.40753-28-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch adds more code comments in journal_read_bucket(), this is an effort to make the code to be more understandable. Signed-off-by: Coly Li --- drivers/md/bcache/journal.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index 54f8886b6177..98ee467ec3f7 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -100,6 +100,20 @@ reread: left = ca->sb.bucket_size - offset; blocks = set_blocks(j, block_bytes(ca->set)); + /* + * Nodes in 'list' are in linear increasing order of + * i->j.seq, the node on head has the smallest (oldest) + * journal seq, the node on tail has the biggest + * (latest) journal seq. + */ + + /* + * Check from the oldest jset for last_seq. If + * i->j.seq < j->last_seq, it means the oldest jset + * in list is expired and useless, remove it from + * this list. Otherwise, j is a condidate jset for + * further following checks. + */ while (!list_empty(list)) { i = list_first_entry(list, struct journal_replay, list); @@ -109,13 +123,22 @@ reread: left = ca->sb.bucket_size - offset; kfree(i); } + /* iterate list in reverse order (from latest jset) */ list_for_each_entry_reverse(i, list, list) { if (j->seq == i->j.seq) goto next_set; + /* + * if j->seq is less than any i->j.last_seq + * in list, j is an expired and useless jset. + */ if (j->seq < i->j.last_seq) goto next_set; + /* + * 'where' points to first jset in list which + * is elder then j. + */ if (j->seq > i->j.seq) { where = &i->list; goto add; @@ -129,6 +152,7 @@ reread: left = ca->sb.bucket_size - offset; if (!i) return -ENOMEM; memcpy(&i->j, j, bytes); + /* Add to the location after 'where' points to */ list_add(&i->list, where); ret = 1; From patchwork Fri Jun 28 11:59:51 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022095 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BDB05924 for ; Fri, 28 Jun 2019 12:02:05 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AE1AF28437 for ; Fri, 28 Jun 2019 12:02:05 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A2351287E5; Fri, 28 Jun 2019 12:02:05 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 597E028437 for ; Fri, 28 Jun 2019 12:02:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727007AbfF1MCE (ORCPT ); Fri, 28 Jun 2019 08:02:04 -0400 Received: from mx2.suse.de ([195.135.220.15]:54874 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727002AbfF1MCE (ORCPT ); Fri, 28 Jun 2019 08:02:04 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 8E1EDB627; Fri, 28 Jun 2019 12:02:03 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 28/37] bcache: set largest seq to ja->seq[bucket_index] in journal_read_bucket() Date: Fri, 28 Jun 2019 19:59:51 +0800 Message-Id: <20190628120000.40753-29-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In journal_read_bucket() when setting ja->seq[bucket_index], there might be potential case that a later non-maximum overwrites a better sequence number to ja->seq[bucket_index]. This patch adds a check to make sure that ja->seq[bucket_index] will be only set a new value if it is bigger then current value. Signed-off-by: Coly Li --- drivers/md/bcache/journal.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index 98ee467ec3f7..3d321bffddc9 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -156,7 +156,8 @@ reread: left = ca->sb.bucket_size - offset; list_add(&i->list, where); ret = 1; - ja->seq[bucket_index] = j->seq; + if (j->seq > ja->seq[bucket_index]) + ja->seq[bucket_index] = j->seq; next_set: offset += blocks * ca->sb.block_size; len -= blocks * ca->sb.block_size; From patchwork Fri Jun 28 11:59:52 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022097 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BEA8F14C0 for ; Fri, 28 Jun 2019 12:02:08 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B0C3528437 for ; Fri, 28 Jun 2019 12:02:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A52E1287E5; Fri, 28 Jun 2019 12:02:08 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4FF9128437 for ; Fri, 28 Jun 2019 12:02:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727002AbfF1MCH (ORCPT ); Fri, 28 Jun 2019 08:02:07 -0400 Received: from mx2.suse.de ([195.135.220.15]:54882 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726841AbfF1MCH (ORCPT ); Fri, 28 Jun 2019 08:02:07 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 80C5DB627; Fri, 28 Jun 2019 12:02:06 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 29/37] bcache: shrink btree node cache after bch_btree_check() Date: Fri, 28 Jun 2019 19:59:52 +0800 Message-Id: <20190628120000.40753-30-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When cache set starts, bch_btree_check() will check all bkeys on cache device by calculating the checksum. This operation will consume a huge number of system memory if there are a lot of data cached. Since bcache uses its own mca cache to maintain all its read-in btree nodes, and only releases the cache space when system memory manage code starts to shrink caches. Then before memory manager code to call the mca cache shrinker callback, bcache mca cache will compete memory resource with user space application, which may have nagive effect to performance of user space workloads (e.g. data base, or I/O service of distributed storage node). This patch tries to call bcache mca shrinker routine to proactively release mca cache memory, to decrease the memory pressure of system and avoid negative effort of the overall system I/O performance. Signed-off-by: Coly Li --- drivers/md/bcache/super.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index a8ea4e2086a9..26e374fbf57c 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1880,6 +1880,23 @@ static int run_cache_set(struct cache_set *c) if (bch_btree_check(c)) goto err; + /* + * bch_btree_check() may occupy too much system memory which + * has negative effects to user space application (e.g. data + * base) performance. Shrink the mca cache memory proactively + * here to avoid competing memory with user space workloads.. + */ + if (!c->shrinker_disabled) { + struct shrink_control sc; + + sc.gfp_mask = GFP_KERNEL; + sc.nr_to_scan = c->btree_cache_used * c->btree_pages; + /* first run to clear b->accessed tag */ + c->shrink.scan_objects(&c->shrink, &sc); + /* second run to reap non-accessed nodes */ + c->shrink.scan_objects(&c->shrink, &sc); + } + bch_journal_mark(c, &journal); bch_initial_gc_finish(c); pr_debug("btree_check() done"); From patchwork Fri Jun 28 11:59:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022099 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C37AB14C0 for ; Fri, 28 Jun 2019 12:02:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B282C28437 for ; Fri, 28 Jun 2019 12:02:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A34D7287E5; Fri, 28 Jun 2019 12:02:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 32B2228437 for ; Fri, 28 Jun 2019 12:02:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727010AbfF1MCN (ORCPT ); Fri, 28 Jun 2019 08:02:13 -0400 Received: from mx2.suse.de ([195.135.220.15]:54908 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726807AbfF1MCN (ORCPT ); Fri, 28 Jun 2019 08:02:13 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id B07E2B627; Fri, 28 Jun 2019 12:02:11 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li , stable@vger.kernel.org, Shenghui Wang Subject: [PATCH 30/37] bcache: Revert "bcache: free heap cache_set->flush_btree in bch_journal_free" Date: Fri, 28 Jun 2019 19:59:53 +0800 Message-Id: <20190628120000.40753-31-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This reverts commit 6268dc2c4703aabfb0b35681be709acf4c2826c6. This patch depends on commit c4dc2497d50d ("bcache: fix high CPU occupancy during journal") which is reverted in previous patch. So revert this one too. Fixes: 6268dc2c4703 ("bcache: free heap cache_set->flush_btree in bch_journal_free") Signed-off-by: Coly Li Cc: stable@vger.kernel.org Cc: Shenghui Wang --- drivers/md/bcache/journal.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index 3d321bffddc9..11d8c93b88bb 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -884,7 +884,6 @@ void bch_journal_free(struct cache_set *c) free_pages((unsigned long) c->journal.w[1].data, JSET_BITS); free_pages((unsigned long) c->journal.w[0].data, JSET_BITS); free_fifo(&c->journal.pin); - free_heap(&c->flush_btree); } int bch_journal_alloc(struct cache_set *c) From patchwork Fri Jun 28 11:59:54 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022101 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5B5FD924 for ; Fri, 28 Jun 2019 12:02:19 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4D29128437 for ; Fri, 28 Jun 2019 12:02:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4177E287E5; Fri, 28 Jun 2019 12:02:19 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CB18728437 for ; Fri, 28 Jun 2019 12:02:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726923AbfF1MCS (ORCPT ); Fri, 28 Jun 2019 08:02:18 -0400 Received: from mx2.suse.de ([195.135.220.15]:54954 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726807AbfF1MCS (ORCPT ); Fri, 28 Jun 2019 08:02:18 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 2BA0CB62B; Fri, 28 Jun 2019 12:02:16 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li , stable@vger.kernel.org, Tang Junhui Subject: [PATCH 31/37] bcache: Revert "bcache: fix high CPU occupancy during journal" Date: Fri, 28 Jun 2019 19:59:54 +0800 Message-Id: <20190628120000.40753-32-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This reverts commit c4dc2497d50d9c6fb16aa0d07b6a14f3b2adb1e0. This patch enlarges a race between normal btree flush code path and flush_btree_write(), which causes deadlock when journal space is exhausted. Reverts this patch makes the race window from 128 btree nodes to only 1 btree nodes. Fixes: c4dc2497d50d ("bcache: fix high CPU occupancy during journal") Signed-off-by: Coly Li Cc: stable@vger.kernel.org Cc: Tang Junhui --- drivers/md/bcache/bcache.h | 2 -- drivers/md/bcache/journal.c | 47 +++++++++++++++------------------------------ drivers/md/bcache/util.h | 2 -- 3 files changed, 15 insertions(+), 36 deletions(-) diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h index 73a97586a2ef..cb268d7c6cea 100644 --- a/drivers/md/bcache/bcache.h +++ b/drivers/md/bcache/bcache.h @@ -726,8 +726,6 @@ struct cache_set { #define BUCKET_HASH_BITS 12 struct hlist_head bucket_hash[1 << BUCKET_HASH_BITS]; - - DECLARE_HEAP(struct btree *, flush_btree); }; struct bbio { diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index 11d8c93b88bb..14a4e2c44de9 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -416,12 +416,6 @@ int bch_journal_replay(struct cache_set *s, struct list_head *list) } /* Journalling */ -#define journal_max_cmp(l, r) \ - (fifo_idx(&c->journal.pin, btree_current_write(l)->journal) < \ - fifo_idx(&(c)->journal.pin, btree_current_write(r)->journal)) -#define journal_min_cmp(l, r) \ - (fifo_idx(&c->journal.pin, btree_current_write(l)->journal) > \ - fifo_idx(&(c)->journal.pin, btree_current_write(r)->journal)) static void btree_flush_write(struct cache_set *c) { @@ -429,35 +423,25 @@ static void btree_flush_write(struct cache_set *c) * Try to find the btree node with that references the oldest journal * entry, best is our current candidate and is locked if non NULL: */ - struct btree *b; - int i; + struct btree *b, *best; + unsigned int i; atomic_long_inc(&c->flush_write); - retry: - spin_lock(&c->journal.lock); - if (heap_empty(&c->flush_btree)) { - for_each_cached_btree(b, c, i) - if (btree_current_write(b)->journal) { - if (!heap_full(&c->flush_btree)) - heap_add(&c->flush_btree, b, - journal_max_cmp); - else if (journal_max_cmp(b, - heap_peek(&c->flush_btree))) { - c->flush_btree.data[0] = b; - heap_sift(&c->flush_btree, 0, - journal_max_cmp); - } + best = NULL; + + for_each_cached_btree(b, c, i) + if (btree_current_write(b)->journal) { + if (!best) + best = b; + else if (journal_pin_cmp(c, + btree_current_write(best)->journal, + btree_current_write(b)->journal)) { + best = b; } + } - for (i = c->flush_btree.used / 2 - 1; i >= 0; --i) - heap_sift(&c->flush_btree, i, journal_min_cmp); - } - - b = NULL; - heap_pop(&c->flush_btree, b, journal_min_cmp); - spin_unlock(&c->journal.lock); - + b = best; if (b) { mutex_lock(&b->write_lock); if (!btree_current_write(b)->journal) { @@ -898,8 +882,7 @@ int bch_journal_alloc(struct cache_set *c) j->w[0].c = c; j->w[1].c = c; - if (!(init_heap(&c->flush_btree, 128, GFP_KERNEL)) || - !(init_fifo(&j->pin, JOURNAL_PIN, GFP_KERNEL)) || + if (!(init_fifo(&j->pin, JOURNAL_PIN, GFP_KERNEL)) || !(j->w[0].data = (void *) __get_free_pages(GFP_KERNEL, JSET_BITS)) || !(j->w[1].data = (void *) __get_free_pages(GFP_KERNEL, JSET_BITS))) return -ENOMEM; diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h index 1fbced94e4cc..c029f7443190 100644 --- a/drivers/md/bcache/util.h +++ b/drivers/md/bcache/util.h @@ -113,8 +113,6 @@ do { \ #define heap_full(h) ((h)->used == (h)->size) -#define heap_empty(h) ((h)->used == 0) - #define DECLARE_FIFO(type, name) \ struct { \ size_t front, back, size, mask; \ From patchwork Fri Jun 28 11:59:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022103 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B2B5D14C0 for ; Fri, 28 Jun 2019 12:02:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A490328437 for ; Fri, 28 Jun 2019 12:02:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 991F3287E5; Fri, 28 Jun 2019 12:02:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 45B1428437 for ; Fri, 28 Jun 2019 12:02:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726931AbfF1MCT (ORCPT ); Fri, 28 Jun 2019 08:02:19 -0400 Received: from mx2.suse.de ([195.135.220.15]:55058 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726807AbfF1MCT (ORCPT ); Fri, 28 Jun 2019 08:02:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 757C9B627; Fri, 28 Jun 2019 12:02:18 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 32/37] bcache: only clear BTREE_NODE_dirty bit when it is set Date: Fri, 28 Jun 2019 19:59:55 +0800 Message-Id: <20190628120000.40753-33-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In bch_btree_cache_free() and btree_node_free(), BTREE_NODE_dirty is always set no matter btree node is dirty or not. The code looks like this, if (btree_node_dirty(b)) btree_complete_write(b, btree_current_write(b)); clear_bit(BTREE_NODE_dirty, &b->flags); Indeed if btree_node_dirty(b) returns false, it means BTREE_NODE_dirty bit is cleared, then it is unnecessary to clear the bit again. This patch only clears BTREE_NODE_dirty when btree_node_dirty(b) is true (the bit is set), to save a few CPU cycles. Signed-off-by: Coly Li --- drivers/md/bcache/btree.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index cf38a1b031fa..88e5aa3fbb07 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -782,10 +782,10 @@ void bch_btree_cache_free(struct cache_set *c) while (!list_empty(&c->btree_cache)) { b = list_first_entry(&c->btree_cache, struct btree, list); - if (btree_node_dirty(b)) + if (btree_node_dirty(b)) { btree_complete_write(b, btree_current_write(b)); - clear_bit(BTREE_NODE_dirty, &b->flags); - + clear_bit(BTREE_NODE_dirty, &b->flags); + } mca_data_free(b); } @@ -1073,9 +1073,10 @@ static void btree_node_free(struct btree *b) mutex_lock(&b->write_lock); - if (btree_node_dirty(b)) + if (btree_node_dirty(b)) { btree_complete_write(b, btree_current_write(b)); - clear_bit(BTREE_NODE_dirty, &b->flags); + clear_bit(BTREE_NODE_dirty, &b->flags); + } mutex_unlock(&b->write_lock); From patchwork Fri Jun 28 11:59:56 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022105 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 266D414C0 for ; Fri, 28 Jun 2019 12:02:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 17F6928437 for ; Fri, 28 Jun 2019 12:02:25 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0C858287E5; Fri, 28 Jun 2019 12:02:25 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AED5128437 for ; Fri, 28 Jun 2019 12:02:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727013AbfF1MCY (ORCPT ); Fri, 28 Jun 2019 08:02:24 -0400 Received: from mx2.suse.de ([195.135.220.15]:55192 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726807AbfF1MCY (ORCPT ); Fri, 28 Jun 2019 08:02:24 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 0C863B62C; Fri, 28 Jun 2019 12:02:23 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 33/37] bcache: add comments for mutex_lock(&b->write_lock) Date: Fri, 28 Jun 2019 19:59:56 +0800 Message-Id: <20190628120000.40753-34-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When accessing or modifying BTREE_NODE_dirty bit, it is not always necessary to acquire b->write_lock. In bch_btree_cache_free() and mca_reap() acquiring b->write_lock is necessary, and this patch adds comments to explain why mutex_lock(&b->write_lock) is necessary for checking or clearing BTREE_NODE_dirty bit there. Signed-off-by: Coly Li --- drivers/md/bcache/btree.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 88e5aa3fbb07..846306c3a887 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -659,6 +659,11 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush) up(&b->io_mutex); } + /* + * BTREE_NODE_dirty might be cleared in btree_flush_btree() by + * __bch_btree_node_write(). To avoid an extra flush, acquire + * b->write_lock before checking BTREE_NODE_dirty bit. + */ mutex_lock(&b->write_lock); if (btree_node_dirty(b)) __bch_btree_node_write(b, &cl); @@ -782,6 +787,11 @@ void bch_btree_cache_free(struct cache_set *c) while (!list_empty(&c->btree_cache)) { b = list_first_entry(&c->btree_cache, struct btree, list); + /* + * This function is called by cache_set_free(), no I/O + * request on cache now, it is unnecessary to acquire + * b->write_lock before clearing BTREE_NODE_dirty anymore. + */ if (btree_node_dirty(b)) { btree_complete_write(b, btree_current_write(b)); clear_bit(BTREE_NODE_dirty, &b->flags); From patchwork Fri Jun 28 11:59:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022107 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E5CD9924 for ; Fri, 28 Jun 2019 12:02:28 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D7D0C28437 for ; Fri, 28 Jun 2019 12:02:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CC320287E5; Fri, 28 Jun 2019 12:02:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 78D2A28437 for ; Fri, 28 Jun 2019 12:02:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726860AbfF1MC1 (ORCPT ); Fri, 28 Jun 2019 08:02:27 -0400 Received: from mx2.suse.de ([195.135.220.15]:55284 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726807AbfF1MC1 (ORCPT ); Fri, 28 Jun 2019 08:02:27 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 92B87B636; Fri, 28 Jun 2019 12:02:26 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 34/37] bcache: remove retry_flush_write from struct cache_set Date: Fri, 28 Jun 2019 19:59:57 +0800 Message-Id: <20190628120000.40753-35-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In struct cache_set, retry_flush_write is added for commit c4dc2497d50d ("bcache: fix high CPU occupancy during journal") which is reverted in previous patch. Now it is useless anymore, and this patch removes it from bcache code. Signed-off-by: Coly Li --- drivers/md/bcache/bcache.h | 1 - drivers/md/bcache/journal.c | 1 - drivers/md/bcache/sysfs.c | 5 ----- 3 files changed, 7 deletions(-) diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h index cb268d7c6cea..35396248a7d5 100644 --- a/drivers/md/bcache/bcache.h +++ b/drivers/md/bcache/bcache.h @@ -706,7 +706,6 @@ struct cache_set { atomic_long_t reclaim; atomic_long_t flush_write; - atomic_long_t retry_flush_write; enum { ON_ERROR_UNREGISTER, diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index 14a4e2c44de9..1218e3cada3c 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -447,7 +447,6 @@ static void btree_flush_write(struct cache_set *c) if (!btree_current_write(b)->journal) { mutex_unlock(&b->write_lock); /* We raced */ - atomic_long_inc(&c->retry_flush_write); goto retry; } diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c index d62e28643109..701a386a954c 100644 --- a/drivers/md/bcache/sysfs.c +++ b/drivers/md/bcache/sysfs.c @@ -83,7 +83,6 @@ read_attribute(state); read_attribute(cache_read_races); read_attribute(reclaim); read_attribute(flush_write); -read_attribute(retry_flush_write); read_attribute(writeback_keys_done); read_attribute(writeback_keys_failed); read_attribute(io_errors); @@ -709,9 +708,6 @@ SHOW(__bch_cache_set) sysfs_print(flush_write, atomic_long_read(&c->flush_write)); - sysfs_print(retry_flush_write, - atomic_long_read(&c->retry_flush_write)); - sysfs_print(writeback_keys_done, atomic_long_read(&c->writeback_keys_done)); sysfs_print(writeback_keys_failed, @@ -936,7 +932,6 @@ static struct attribute *bch_cache_set_internal_files[] = { &sysfs_cache_read_races, &sysfs_reclaim, &sysfs_flush_write, - &sysfs_retry_flush_write, &sysfs_writeback_keys_done, &sysfs_writeback_keys_failed, From patchwork Fri Jun 28 11:59:58 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022109 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B5F56924 for ; Fri, 28 Jun 2019 12:02:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A641B28437 for ; Fri, 28 Jun 2019 12:02:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 99340287E5; Fri, 28 Jun 2019 12:02:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0EA8F28437 for ; Fri, 28 Jun 2019 12:02:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727025AbfF1MCd (ORCPT ); Fri, 28 Jun 2019 08:02:33 -0400 Received: from mx2.suse.de ([195.135.220.15]:55390 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726807AbfF1MCd (ORCPT ); Fri, 28 Jun 2019 08:02:33 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id BA8CCB627; Fri, 28 Jun 2019 12:02:31 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li , stable@vger.kernel.org Subject: [PATCH 35/37] bcache: fix race in btree_flush_write() Date: Fri, 28 Jun 2019 19:59:58 +0800 Message-Id: <20190628120000.40753-36-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP There is a race between mca_reap(), btree_node_free() and journal code btree_flush_write(), which results very rare and strange deadlock or panic and are very hard to reproduce. Let me explain how the race happens. In btree_flush_write() one btree node with oldest journal pin is selected, then it is flushed to cache device, the select-and-flush is a two steps operation. Between these two steps, there are something may happen inside the race window, - The selected btree node was reaped by mca_reap() and allocated to other requesters for other btree node. - The slected btree node was selected, flushed and released by mca shrink callback bch_mca_scan(). When btree_flush_write() tries to flush the selected btree node, firstly b->write_lock is held by mutex_lock(). If the race happens and the memory of selected btree node is allocated to other btree node, if that btree node's write_lock is held already, a deadlock very probably happens here. A worse case is the memory of the selected btree node is released, then all references to this btree node (e.g. b->write_lock) will trigger NULL pointer deference panic. This race was introduced in commit cafe56359144 ("bcache: A block layer cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU occupancy during journal"), which selected 128 btree nodes and flushed them one-by-one in a quite long time period. Such race is not easy to reproduce before. On a Lenovo SR650 server with 48 Xeon cores, and configure 1 NVMe SSD as cache device, a MD raid0 device assembled by 3 NVMe SSDs as backing device, this race can be observed around every 10,000 times btree_flush_write() gets called. Both deadlock and kernel panic all happened as aftermath of the race. The idea of the fix is to add a btree flag BTREE_NODE_journal_flush. It is set when selecting btree nodes, and cleared after btree nodes flushed. Then when mca_reap() selects a btree node with this bit set, this btree node will be skipped. Since mca_reap() only reaps btree node without BTREE_NODE_journal_flush flag, such race is avoided. Once corner case should be noticed, that is btree_node_free(). It might be called in some error handling code path. For example the following code piece from btree_split(), 2149 err_free2: 2150 bkey_put(b->c, &n2->key); 2151 btree_node_free(n2); 2152 rw_unlock(true, n2); 2153 err_free1: 2154 bkey_put(b->c, &n1->key); 2155 btree_node_free(n1); 2156 rw_unlock(true, n1); At line 2151 and 2155, the btree node n2 and n1 are released without mac_reap(), so BTREE_NODE_journal_flush also needs to be checked here. If btree_node_free() is called directly in such error handling path, and the selected btree node has BTREE_NODE_journal_flush bit set, just delay for 1 us and retry again. In this case this btree node won't be skipped, just retry until the BTREE_NODE_journal_flush bit cleared, and free the btree node memory. Fixes: cafe56359144 ("bcache: A block layer cache") Signed-off-by: Coly Li Reported-and-tested-by: kbuild test robot Cc: stable@vger.kernel.org --- drivers/md/bcache/btree.c | 28 +++++++++++++++++++++++++++- drivers/md/bcache/btree.h | 2 ++ drivers/md/bcache/journal.c | 7 +++++++ 3 files changed, 36 insertions(+), 1 deletion(-) diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 846306c3a887..ba434d9ac720 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -35,7 +35,7 @@ #include #include #include - +#include #include /* @@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush) up(&b->io_mutex); } +retry: /* * BTREE_NODE_dirty might be cleared in btree_flush_btree() by * __bch_btree_node_write(). To avoid an extra flush, acquire * b->write_lock before checking BTREE_NODE_dirty bit. */ mutex_lock(&b->write_lock); + /* + * If this btree node is selected in btree_flush_write() by journal + * code, delay and retry until the node is flushed by journal code + * and BTREE_NODE_journal_flush bit cleared by btree_flush_write(). + */ + if (btree_node_journal_flush(b)) { + pr_debug("bnode %p is flushing by journal, retry", b); + mutex_unlock(&b->write_lock); + udelay(1); + goto retry; + } + if (btree_node_dirty(b)) __bch_btree_node_write(b, &cl); mutex_unlock(&b->write_lock); @@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b) BUG_ON(b == b->c->root); +retry: mutex_lock(&b->write_lock); + /* + * If the btree node is selected and flushing in btree_flush_write(), + * delay and retry until the BTREE_NODE_journal_flush bit cleared, + * then it is safe to free the btree node here. Otherwise this btree + * node will be in race condition. + */ + if (btree_node_journal_flush(b)) { + mutex_unlock(&b->write_lock); + pr_debug("bnode %p journal_flush set, retry", b); + udelay(1); + goto retry; + } if (btree_node_dirty(b)) { btree_complete_write(b, btree_current_write(b)); diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h index d1c72ef64edf..76cfd121a486 100644 --- a/drivers/md/bcache/btree.h +++ b/drivers/md/bcache/btree.h @@ -158,11 +158,13 @@ enum btree_flags { BTREE_NODE_io_error, BTREE_NODE_dirty, BTREE_NODE_write_idx, + BTREE_NODE_journal_flush, }; BTREE_FLAG(io_error); BTREE_FLAG(dirty); BTREE_FLAG(write_idx); +BTREE_FLAG(journal_flush); static inline struct btree_write *btree_current_write(struct btree *b) { diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index 1218e3cada3c..a1e3e1fcea6e 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c) retry: best = NULL; + mutex_lock(&c->bucket_lock); for_each_cached_btree(b, c, i) if (btree_current_write(b)->journal) { if (!best) @@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c) } b = best; + if (b) + set_btree_node_journal_flush(b); + mutex_unlock(&c->bucket_lock); + if (b) { mutex_lock(&b->write_lock); if (!btree_current_write(b)->journal) { + clear_bit(BTREE_NODE_journal_flush, &b->flags); mutex_unlock(&b->write_lock); /* We raced */ goto retry; } __bch_btree_node_write(b, NULL); + clear_bit(BTREE_NODE_journal_flush, &b->flags); mutex_unlock(&b->write_lock); } } From patchwork Fri Jun 28 11:59:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022111 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3272C14C0 for ; Fri, 28 Jun 2019 12:02:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2233E287E3 for ; Fri, 28 Jun 2019 12:02:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 16945287EF; Fri, 28 Jun 2019 12:02:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 80F88287E3 for ; Fri, 28 Jun 2019 12:02:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726807AbfF1MCi (ORCPT ); Fri, 28 Jun 2019 08:02:38 -0400 Received: from mx2.suse.de ([195.135.220.15]:55506 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727015AbfF1MCi (ORCPT ); Fri, 28 Jun 2019 08:02:38 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id BBCCEB62D; Fri, 28 Jun 2019 12:02:36 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 36/37] bcache: performance improvement for btree_flush_write() Date: Fri, 28 Jun 2019 19:59:59 +0800 Message-Id: <20190628120000.40753-37-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch improves performance for btree_flush_write() in following ways, - Use another spinlock journal.flush_write_lock to replace the very hot journal.lock. We don't have to use journal.lock here, selecting candidate btree nodes takes a lot of time, hold journal.lock here will block other jouranling threads and drop the overall I/O performance. - Only select flushing btree node from c->btree_cache list. When the machine has a large system memory, mca cache may have a huge number of cached btree nodes. Iterating all the cached nodes will take a lot of CPU time, and most of the nodes on c->btree_cache_freeable and c->btree_cache_freed lists are cleared and have need to flush. So only travel mca list c->btree_cache to select flushing btree node should be enough for most of the cases. - Don't iterate whole c->btree_cache list, only reversely select first BTREE_FLUSH_NR btree nodes to flush. Iterate all btree nodes from c->btree_cache and select the oldest journal pin btree nodes consumes huge number of CPU cycles if the list is huge (push and pop a node into/out of a heap is expensive). The last several dirty btree nodes on the tail of c->btree_cache list are earlest allocated and cached btree nodes, they are relative to the oldest journal pin btree nodes. Therefore only flushing BTREE_FLUSH_NR btree nodes from tail of c->btree_cache probably includes the oldest journal pin btree nodes. In my testing, the above change decreases 50%+ CPU consumption when journal space is full. Some times IOPS drops to 0 for 5-8 seconds, comparing blocking I/O for 120+ seconds in previous code, this is much better. Maybe there is room to improve in future, but at this momment the fix looks fine and performs well in my testing. Signed-off-by: Coly Li --- drivers/md/bcache/journal.c | 85 +++++++++++++++++++++++++++++++++------------ drivers/md/bcache/journal.h | 4 +++ 2 files changed, 67 insertions(+), 22 deletions(-) diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index a1e3e1fcea6e..8bcd8f1bf8cb 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -419,47 +419,87 @@ int bch_journal_replay(struct cache_set *s, struct list_head *list) static void btree_flush_write(struct cache_set *c) { - /* - * Try to find the btree node with that references the oldest journal - * entry, best is our current candidate and is locked if non NULL: - */ - struct btree *b, *best; - unsigned int i; + struct btree *b, *t, *btree_nodes[BTREE_FLUSH_NR]; + unsigned int i, n; + + if (c->journal.btree_flushing) + return; + + spin_lock(&c->journal.flush_write_lock); + if (c->journal.btree_flushing) { + spin_unlock(&c->journal.flush_write_lock); + return; + } + c->journal.btree_flushing = true; + spin_unlock(&c->journal.flush_write_lock); atomic_long_inc(&c->flush_write); -retry: - best = NULL; + memset(btree_nodes, 0, sizeof(btree_nodes)); + n = 0; mutex_lock(&c->bucket_lock); - for_each_cached_btree(b, c, i) - if (btree_current_write(b)->journal) { - if (!best) - best = b; - else if (journal_pin_cmp(c, - btree_current_write(best)->journal, - btree_current_write(b)->journal)) { - best = b; - } + list_for_each_entry_safe_reverse(b, t, &c->btree_cache, list) { + if (btree_node_journal_flush(b)) + pr_err("BUG: flush_write bit should not be set here!"); + + mutex_lock(&b->write_lock); + + if (!btree_node_dirty(b)) { + mutex_unlock(&b->write_lock); + continue; + } + + if (!btree_current_write(b)->journal) { + mutex_unlock(&b->write_lock); + continue; } - b = best; - if (b) set_btree_node_journal_flush(b); + + mutex_unlock(&b->write_lock); + + btree_nodes[n++] = b; + if (n == BTREE_FLUSH_NR) + break; + } mutex_unlock(&c->bucket_lock); - if (b) { + for (i = 0; i < n; i++) { + b = btree_nodes[i]; + if (!b) { + pr_err("BUG: btree_nodes[%d] is NULL", i); + continue; + } + + /* safe to check without holding b->write_lock */ + if (!btree_node_journal_flush(b)) { + pr_err("BUG: bnode %p: journal_flush bit cleaned", b); + continue; + } + mutex_lock(&b->write_lock); if (!btree_current_write(b)->journal) { clear_bit(BTREE_NODE_journal_flush, &b->flags); mutex_unlock(&b->write_lock); - /* We raced */ - goto retry; + pr_debug("bnode %p: written by others", b); + continue; + } + + if (!btree_node_dirty(b)) { + clear_bit(BTREE_NODE_journal_flush, &b->flags); + mutex_unlock(&b->write_lock); + pr_debug("bnode %p: dirty bit cleaned by others", b); + continue; } __bch_btree_node_write(b, NULL); clear_bit(BTREE_NODE_journal_flush, &b->flags); mutex_unlock(&b->write_lock); } + + spin_lock(&c->journal.flush_write_lock); + c->journal.btree_flushing = false; + spin_unlock(&c->journal.flush_write_lock); } #define last_seq(j) ((j)->seq - fifo_used(&(j)->pin) + 1) @@ -881,6 +921,7 @@ int bch_journal_alloc(struct cache_set *c) struct journal *j = &c->journal; spin_lock_init(&j->lock); + spin_lock_init(&j->flush_write_lock); INIT_DELAYED_WORK(&j->work, journal_write_work); c->journal_delay_ms = 100; diff --git a/drivers/md/bcache/journal.h b/drivers/md/bcache/journal.h index 66f0facff84b..f2ea34d5f431 100644 --- a/drivers/md/bcache/journal.h +++ b/drivers/md/bcache/journal.h @@ -103,6 +103,8 @@ struct journal_write { /* Embedded in struct cache_set */ struct journal { spinlock_t lock; + spinlock_t flush_write_lock; + bool btree_flushing; /* used when waiting because the journal was full */ struct closure_waitlist wait; struct closure io; @@ -154,6 +156,8 @@ struct journal_device { struct bio_vec bv[8]; }; +#define BTREE_FLUSH_NR 8 + #define journal_pin_cmp(c, l, r) \ (fifo_idx(&(c)->journal.pin, (l)) > fifo_idx(&(c)->journal.pin, (r))) From patchwork Fri Jun 28 12:00:00 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Coly Li X-Patchwork-Id: 11022113 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2E7B514C0 for ; Fri, 28 Jun 2019 12:02:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2042828437 for ; Fri, 28 Jun 2019 12:02:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 14C66287E5; Fri, 28 Jun 2019 12:02:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A921728437 for ; Fri, 28 Jun 2019 12:02:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726587AbfF1MCl (ORCPT ); Fri, 28 Jun 2019 08:02:41 -0400 Received: from mx2.suse.de ([195.135.220.15]:55534 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727015AbfF1MCl (ORCPT ); Fri, 28 Jun 2019 08:02:41 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 09BC7B62D; Fri, 28 Jun 2019 12:02:40 +0000 (UTC) From: Coly Li To: axboe@kernel.dk Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org, Coly Li Subject: [PATCH 37/37] bcache: add reclaimed_journal_buckets to struct cache_set Date: Fri, 28 Jun 2019 20:00:00 +0800 Message-Id: <20190628120000.40753-38-colyli@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190628120000.40753-1-colyli@suse.de> References: <20190628120000.40753-1-colyli@suse.de> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Now we have counters for how many times jouranl is reclaimed, how many times cached dirty btree nodes are flushed, but we don't know how many jouranl buckets are really reclaimed. This patch adds reclaimed_journal_buckets into struct cache_set, this is an increasing only counter, to tell how many journal buckets are reclaimed since cache set runs. From all these three counters (reclaim, reclaimed_journal_buckets, flush_write), we can have idea how well current journal space reclaim code works. Signed-off-by: Coly Li --- drivers/md/bcache/bcache.h | 1 + drivers/md/bcache/journal.c | 1 + drivers/md/bcache/sysfs.c | 5 +++++ 3 files changed, 7 insertions(+) diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h index 35396248a7d5..013e35a9e317 100644 --- a/drivers/md/bcache/bcache.h +++ b/drivers/md/bcache/bcache.h @@ -705,6 +705,7 @@ struct cache_set { atomic_long_t writeback_keys_failed; atomic_long_t reclaim; + atomic_long_t reclaimed_journal_buckets; atomic_long_t flush_write; enum { diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index 8bcd8f1bf8cb..be2a2a201603 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -614,6 +614,7 @@ static void journal_reclaim(struct cache_set *c) k->ptr[n++] = MAKE_PTR(0, bucket_to_sector(c, ca->sb.d[ja->cur_idx]), ca->sb.nr_this_dev); + atomic_long_inc(&c->reclaimed_journal_buckets); } if (n) { diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c index 701a386a954c..9f0826712845 100644 --- a/drivers/md/bcache/sysfs.c +++ b/drivers/md/bcache/sysfs.c @@ -82,6 +82,7 @@ read_attribute(bset_tree_stats); read_attribute(state); read_attribute(cache_read_races); read_attribute(reclaim); +read_attribute(reclaimed_journal_buckets); read_attribute(flush_write); read_attribute(writeback_keys_done); read_attribute(writeback_keys_failed); @@ -705,6 +706,9 @@ SHOW(__bch_cache_set) sysfs_print(reclaim, atomic_long_read(&c->reclaim)); + sysfs_print(reclaimed_journal_buckets, + atomic_long_read(&c->reclaimed_journal_buckets)); + sysfs_print(flush_write, atomic_long_read(&c->flush_write)); @@ -931,6 +935,7 @@ static struct attribute *bch_cache_set_internal_files[] = { &sysfs_bset_tree_stats, &sysfs_cache_read_races, &sysfs_reclaim, + &sysfs_reclaimed_journal_buckets, &sysfs_flush_write, &sysfs_writeback_keys_done, &sysfs_writeback_keys_failed,