[06/19] bcache: treat stale && dirty keys as bad keys
diff mbox series

Message ID 20190209045311.15677-7-colyli@suse.de
State New
Headers show
Series
  • bcache patches for Linux v5.1
Related show

Commit Message

Coly Li Feb. 9, 2019, 4:52 a.m. UTC
From: Tang Junhui <tang.junhui.linux@gmail.com>

Stale && dirty keys can be produced in the follow way:
After writeback in write_dirty_finish(), dirty keys k1 will
replace by clean keys k2
==>ret = bch_btree_insert(dc->disk.c, &keys, NULL, &w->key);
==>btree_insert_fn(struct btree_op *b_op, struct btree *b)
==>static int bch_btree_insert_node(struct btree *b,
       struct btree_op *op,
       struct keylist *insert_keys,
       atomic_t *journal_ref,
Then two steps:
A) update k1 to k2 in btree node memory;
   bch_btree_insert_keys(b, op, insert_keys, replace_key)
B) Write the bset(contains k2) to cache disk by a 30s delay work
   bch_btree_leaf_dirty(b, journal_ref).
But before the 30s delay work write the bset to cache device,
these things happened:
A) GC works, and reclaim the bucket k2 point to;
B) Allocator works, and invalidate the bucket k2 point to,
   and increase the gen of the bucket, and place it into free_inc
   fifo;
C) Until now, the 30s delay work still does not finish work,
   so in the disk, the key still is k1, it is dirty and stale
   (its gen is smaller than the gen of the bucket). and then the
   machine power off suddenly happens;
D) When the machine power on again, after the btree reconstruction,
   the stale dirty key appear.

In bch_extent_bad(), when expensive_debug_checks is off, it would
treat the dirty key as good even it is stale keys, and it would
cause bellow probelms:
A) In read_dirty() it would cause machine crash:
   BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
B) It could be worse when reads hits stale dirty keys, it would
   read old incorrect data.

This patch tolerate the existence of these stale && dirty keys,
and treat them as bad key in bch_extent_bad().

(Coly Li: fix indent which was modified by sender's email client)

Signed-off-by: Tang Junhui <tang.junhui.linux@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/extents.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

Comments

Coly Li Feb. 12, 2019, 4:42 p.m. UTC | #1
On 2019/2/12 9:28 下午, Sasha Levin wrote:
> Hi,
> 
> [This is an automated email]
> 
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
> 
> The bot has tested the following trees: v4.20.7, v4.19.20, v4.14.98, v4.9.155, v4.4.173, v3.18.134.
> 
> v4.20.7: Build OK!
> v4.19.20: Failed to apply! Possible dependencies:
>     149d0efada77 ("bcache: replace hard coded number with BUCKET_GC_GEN_MAX")
> 
> v4.14.98: Failed to apply! Possible dependencies:
>     1d316e658374 ("bcache: implement PI controller for writeback rate")
>     25d8be77e192 ("block: move bio_alloc_pages() to bcache")
>     27a40ab9269e ("bcache: add backing_request_endio() for bi_end_io")
>     2831231d4c3f ("bcache: reduce cache_set devices iteration by devices_max_used")
>     3b304d24a718 ("bcache: convert cached_dev.count from atomic_t to refcount_t")
>     3fd47bfe55b0 ("bcache: stop dc->writeback_rate_update properly")
>     5138ac6748e3 ("bcache: fix misleading error message in bch_count_io_errors()")
>     539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
>     5fa89fb9a86b ("bcache: don't write back data if reading it failed")
>     6f10f7d1b02b ("bcache: style fix to replace 'unsigned' by 'unsigned int'")
>     771f393e8ffc ("bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags")
>     7ba0d830dc0e ("bcache: set error_limit correctly")
>     7e027ca4b534 ("bcache: add stop_when_cache_set_failed option to backing device")
>     804f3c6981f5 ("bcache: fix cached_dev->count usage for bch_cache_set_error()")
>     a8500fc816b1 ("bcache: rearrange writeback main thread ratelimit")
>     b1092c9af9ed ("bcache: allow quick writeback when backing idle")
>     bc082a55d25c ("bcache: fix inaccurate io state for detached bcache devices")
>     c7b7bd07404c ("bcache: add io_disable to struct cached_dev")
> 
> v4.9.155: Failed to apply! Possible dependencies:
>     1d316e658374 ("bcache: implement PI controller for writeback rate")
>     2831231d4c3f ("bcache: reduce cache_set devices iteration by devices_max_used")
>     297e3d854784 ("blk-throttle: make throtl_slice tunable")
>     3fd47bfe55b0 ("bcache: stop dc->writeback_rate_update properly")
>     4e4cbee93d56 ("block: switch bios to blk_status_t")
>     5138ac6748e3 ("bcache: fix misleading error message in bch_count_io_errors()")
>     6f10f7d1b02b ("bcache: style fix to replace 'unsigned' by 'unsigned int'")
>     7e027ca4b534 ("bcache: add stop_when_cache_set_failed option to backing device")
>     87760e5eef35 ("block: hook up writeback throttling")
>     9e234eeafbe1 ("blk-throttle: add a simple idle detection")
>     c7b7bd07404c ("bcache: add io_disable to struct cached_dev")
>     cf43e6be865a ("block: add scalable completion tracking of requests")
>     e806402130c9 ("block: split out request-only flags into a new namespace")
>     fbbaf700e7b1 ("block: trace completion of all bios.")
> 
> v4.4.173: Failed to apply! Possible dependencies:
>     005411ea7ee7 ("doc: update block/queue-sysfs.txt entries")
>     1d316e658374 ("bcache: implement PI controller for writeback rate")
>     27489a3c827b ("blk-mq: turn hctx->run_work into a regular work struct")
>     2831231d4c3f ("bcache: reduce cache_set devices iteration by devices_max_used")
>     297e3d854784 ("blk-throttle: make throtl_slice tunable")
>     38f8baae8905 ("block: factor out chained bio completion")
>     3fd47bfe55b0 ("bcache: stop dc->writeback_rate_update properly")
>     4e4cbee93d56 ("block: switch bios to blk_status_t")
>     511cbce2ff8b ("irq_poll: make blk-iopoll available outside the block layer")
>     5138ac6748e3 ("bcache: fix misleading error message in bch_count_io_errors()")
>     6f10f7d1b02b ("bcache: style fix to replace 'unsigned' by 'unsigned int'")
>     7e027ca4b534 ("bcache: add stop_when_cache_set_failed option to backing device")
>     87760e5eef35 ("block: hook up writeback throttling")
>     8d354f133e86 ("blk-mq: improve layout of blk_mq_hw_ctx")
>     9467f85960a3 ("blk-mq/cpu-notif: Convert to new hotplug state machine")
>     9e234eeafbe1 ("blk-throttle: add a simple idle detection")
>     af3e3a5259e3 ("block: don't unecessarily clobber bi_error for chained bios")
>     ba8c6967b739 ("block: cleanup bio_endio")
>     c7b7bd07404c ("bcache: add io_disable to struct cached_dev")
>     cf43e6be865a ("block: add scalable completion tracking of requests")
>     e57690fe009b ("blk-mq: don't overwrite rq->mq_ctx")
>     fbbaf700e7b1 ("block: trace completion of all bios.")
> 
> v3.18.134: Failed to apply! Possible dependencies:
>     0f8087ecdeac ("block: Consolidate static integrity profile properties")
>     1b94b5567e9c ("Btrfs, raid56: use a variant to record the operation type")
>     1d316e658374 ("bcache: implement PI controller for writeback rate")
>     2831231d4c3f ("bcache: reduce cache_set devices iteration by devices_max_used")
>     2c8cdd6ee4e7 ("Btrfs, replace: write dirty pages into the replace target device")
>     326e1dbb5736 ("block: remove management of bi_remaining when restoring original bi_end_io")
>     3fd47bfe55b0 ("bcache: stop dc->writeback_rate_update properly")
>     4246a0b63bd8 ("block: add a bi_error field to struct bio")
>     4e4cbee93d56 ("block: switch bios to blk_status_t")
>     5138ac6748e3 ("bcache: fix misleading error message in bch_count_io_errors()")
>     5a6ac9eacb49 ("Btrfs, raid56: support parity scrub on raid56")
>     6e9606d2a2dc ("Btrfs: add ref_count and free function for btrfs_bio")
>     6f10f7d1b02b ("bcache: style fix to replace 'unsigned' by 'unsigned int'")
>     7e027ca4b534 ("bcache: add stop_when_cache_set_failed option to backing device")
>     8e5cfb55d3f7 ("Btrfs: Make raid_map array be inlined in btrfs_bio structure")
>     af8e2d1df984 ("Btrfs, scrub: repair the common data on RAID5/6 if it is corrupted")
>     b89e1b012c7f ("Btrfs, raid56: don't change bbio and raid_map")
>     c4cf5261f8bf ("bio: skip atomic inc/dec of ->bi_remaining for non-chains")
>     c7b7bd07404c ("bcache: add io_disable to struct cached_dev")
>     f90523d1aa3c ("Btrfs: remove noused bbio_ret in __btrfs_map_block in condition")
> 
> 
> How should we proceed with this patch?

Can we rebase this patch for stable kernels, and send them to stable
kernel maintainers separately? If this behavior is accepted for stable
kernel maintenance, I would suggest Junhui Tang to think about to rebase
for these stable kernels.

Thanks.

Patch
diff mbox series

diff --git a/drivers/md/bcache/extents.c b/drivers/md/bcache/extents.c
index 956004366699..886710043025 100644
--- a/drivers/md/bcache/extents.c
+++ b/drivers/md/bcache/extents.c
@@ -538,6 +538,7 @@  static bool bch_extent_bad(struct btree_keys *bk, const struct bkey *k)
 {
 	struct btree *b = container_of(bk, struct btree, keys);
 	unsigned int i, stale;
+	char buf[80];
 
 	if (!KEY_PTRS(k) ||
 	    bch_extent_invalid(bk, k))
@@ -547,19 +548,19 @@  static bool bch_extent_bad(struct btree_keys *bk, const struct bkey *k)
 		if (!ptr_available(b->c, k, i))
 			return true;
 
-	if (!expensive_debug_checks(b->c) && KEY_DIRTY(k))
-		return false;
-
 	for (i = 0; i < KEY_PTRS(k); i++) {
 		stale = ptr_stale(b->c, k, i);
 
+		if (stale && KEY_DIRTY(k)) {
+			bch_extent_to_text(buf, sizeof(buf), k);
+			pr_info("stale dirty pointer, stale %u, key: %s",
+				stale, buf);
+		}
+
 		btree_bug_on(stale > BUCKET_GC_GEN_MAX, b,
 			     "key too stale: %i, need_gc %u",
 			     stale, b->c->need_gc);
 
-		btree_bug_on(stale && KEY_DIRTY(k) && KEY_SIZE(k),
-			     b, "stale dirty pointer");
-
 		if (stale)
 			return true;