[01/12] bcache: fix fifo index swapping condition in journal_pin_cmp()

Message ID 20191113080326.69989-2-colyli@suse.de (mailing list archive)
State New, archived
Series bcache patches for Linux v5.5

Commit Message

Coly Li Nov. 13, 2019, 8:03 a.m. UTC
The fifo structure journal.pin is implemented as a circular buffer. When
the back index reaches the highest location of the buffer, it is swapped
(wrapped around) to 0. Once the swap happens, a smaller fifo index might
be associated with a newer journal entry, so the btree node with the
oldest journal entry won't be selected in bch_btree_leaf_dirty() to
reference the dirty B+tree leaf node. As a result, the bcache journal
may fail to protect the oldest unflushed dirty B+tree leaf node across a
power failure, and that leaf node may be inconsistent after rebooting
from the power failure.

This patch fixes the fifo index comparison logic in journal_pin_cmp() to
avoid a potentially corrupted B+tree leaf node when the back index of
the journal pin is swapped.

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/btree.c   | 26 ++++++++++++++++++++++++++
 drivers/md/bcache/journal.h |  4 ----
 2 files changed, 26 insertions(+), 4 deletions(-)

Comments

Coly Li Nov. 18, 2019, 3:28 p.m. UTC | #1
On 2019/11/13 4:03 PM, Coly Li wrote:
> The fifo structure journal.pin is implemented as a circular buffer. When
> the back index reaches the highest location of the buffer, it is swapped
> (wrapped around) to 0. Once the swap happens, a smaller fifo index might
> be associated with a newer journal entry, so the btree node with the
> oldest journal entry won't be selected in bch_btree_leaf_dirty() to
> reference the dirty B+tree leaf node. As a result, the bcache journal
> may fail to protect the oldest unflushed dirty B+tree leaf node across a
> power failure, and that leaf node may be inconsistent after rebooting
> from the power failure.
>
> This patch fixes the fifo index comparison logic in journal_pin_cmp() to
> avoid a potentially corrupted B+tree leaf node when the back index of
> the journal pin is swapped.
> 
> Signed-off-by: Coly Li <colyli@suse.de>

Hi Jens,

Guoju Fang talked to me today and told me this change was unnecessary
and that I was overthinking it.

Then I realized that fifo_idx() uses a mask to handle the array index
overflow condition, so the index swap in journal_pin_cmp() won't happen.
And yes, Guoju and Kent are correct.
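
For illustration, the masking behaviour can be sketched in user-space C
as below. This is a simplified sketch, not the kernel's fifo macros: the
names struct ring, ring_idx() and ring_newer() are hypothetical, and it
assumes the index is measured relative to the front slot before the mask
is applied, with a power-of-two buffer size.

```c
#include <stddef.h>

#define RING_SIZE 8                  /* assumption: must be a power of two */
#define RING_MASK (RING_SIZE - 1)

/* Hypothetical stand-in for the fifo behind journal.pin. */
struct ring {
	size_t front;                /* raw slot of the oldest entry */
	size_t back;                 /* raw slot where the next entry goes */
	int data[RING_SIZE];
};

/*
 * Logical position of slot p counted from the front slot.
 * The pointer difference may be negative once the buffer has wrapped;
 * converting to size_t and masking reduces it modulo RING_SIZE.
 */
static size_t ring_idx(const struct ring *r, const int *p)
{
	return (size_t)(p - &r->data[r->front]) & RING_MASK;
}

/* Nonzero if the entry in slot l is newer than the entry in slot r. */
static int ring_newer(const struct ring *rg, const int *l, const int *r)
{
	return ring_idx(rg, l) > ring_idx(rg, r);
}
```

With front at slot 6 and back at slot 2, the live entries occupy raw
slots 6, 7, 0 and 1, and their masked indexes are 0, 1, 2 and 3. Slot 1
therefore still compares as newer than slot 6 although its raw index is
smaller, so a plain ">" on the masked indexes needs no extra wrap check.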

Since you already applied this patch, could you please remove it from
your for-next branch? This single patch does not break anything, but it
is unnecessary at this moment.

Thanks.

Coly Li

Jens Axboe Nov. 18, 2019, 3:36 p.m. UTC | #2
On 11/18/19 8:28 AM, Coly Li wrote:
> On 2019/11/13 4:03 PM, Coly Li wrote:
>> The fifo structure journal.pin is implemented as a circular buffer. When
>> the back index reaches the highest location of the buffer, it is swapped
>> (wrapped around) to 0. Once the swap happens, a smaller fifo index might
>> be associated with a newer journal entry, so the btree node with the
>> oldest journal entry won't be selected in bch_btree_leaf_dirty() to
>> reference the dirty B+tree leaf node. As a result, the bcache journal
>> may fail to protect the oldest unflushed dirty B+tree leaf node across a
>> power failure, and that leaf node may be inconsistent after rebooting
>> from the power failure.
>>
>> This patch fixes the fifo index comparison logic in journal_pin_cmp() to
>> avoid a potentially corrupted B+tree leaf node when the back index of
>> the journal pin is swapped.
>>
>> Signed-off-by: Coly Li <colyli@suse.de>
>
> Hi Jens,
>
> Guoju Fang talked to me today and told me this change was unnecessary
> and that I was overthinking it.
>
> Then I realized that fifo_idx() uses a mask to handle the array index
> overflow condition, so the index swap in journal_pin_cmp() won't happen.
> And yes, Guoju and Kent are correct.
>
> Since you already applied this patch, could you please remove it from
> your for-next branch? This single patch does not break anything, but it
> is unnecessary at this moment.

Sure, done.

Patch

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index ba434d9ac720..00523cd1db80 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -528,6 +528,32 @@  static void btree_node_write_work(struct work_struct *w)
 	mutex_unlock(&b->write_lock);
 }
 
+/* return true if journal pin 'l' is newer than 'r' */
+static bool journal_pin_cmp(struct cache_set *c,
+			    atomic_t *l,
+			    atomic_t *r)
+{
+	int l_idx, r_idx, f_idx, b_idx;
+	bool ret = false;
+
+	l_idx = fifo_idx(&(c)->journal.pin, (l));
+	r_idx = fifo_idx(&(c)->journal.pin, (r));
+	f_idx = (c)->journal.pin.front;
+	b_idx = (c)->journal.pin.back;
+
+	if (l_idx > r_idx)
+		ret = true;
+	/* in case fifo back pointer is swapped */
+	if (b_idx < f_idx) {
+		if (l_idx <= b_idx && r_idx >= f_idx)
+			ret = true;
+		else if (l_idx >= f_idx && r_idx <= b_idx)
+			ret = false;
+	}
+
+	return ret;
+}
+
 static void bch_btree_leaf_dirty(struct btree *b, atomic_t *journal_ref)
 {
 	struct bset *i = btree_bset_last(b);
diff --git a/drivers/md/bcache/journal.h b/drivers/md/bcache/journal.h
index f2ea34d5f431..06b3eaab7d16 100644
--- a/drivers/md/bcache/journal.h
+++ b/drivers/md/bcache/journal.h
@@ -157,10 +157,6 @@  struct journal_device {
 };
 
 #define BTREE_FLUSH_NR	8
-
-#define journal_pin_cmp(c, l, r)				\
-	(fifo_idx(&(c)->journal.pin, (l)) > fifo_idx(&(c)->journal.pin, (r)))
-
 #define JOURNAL_PIN	20000
 
 #define journal_full(j)						\