
[-next,1/3] md/raid10: fix improper BUG_ON() in raise_barrier()

Message ID 20220829131502.165356-2-yukuai1@huaweicloud.com (mailing list archive)
State New, archived
Delegated to: Song Liu
Series md/raid10: reduce lock contention for io

Commit Message

Yu Kuai Aug. 29, 2022, 1:15 p.m. UTC
From: Yu Kuai <yukuai3@huawei.com>

'conf->barrier' is protected by 'conf->resync_lock', reading
'conf->barrier' without holding the lock is wrong.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid10.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
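
To illustrate the ordering the commit message argues for, here is a minimal userspace sketch, not the kernel code itself. It assumes a pthread mutex as a stand-in for 'conf->resync_lock' and assert() as a stand-in for BUG_ON(). 'struct demo_conf' and 'demo_raise_barrier()' are hypothetical names, and the wait on 'nr_waiting' is left out.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the relevant fields of struct r10conf. */
struct demo_conf {
	pthread_mutex_t resync_lock;	/* protects 'barrier' */
	int barrier;
};

/*
 * Same ordering as the patch: take the lock first, then evaluate the
 * invariant that reads 'barrier'. Returns the new barrier count.
 */
static int demo_raise_barrier(struct demo_conf *conf, int force)
{
	int value;

	pthread_mutex_lock(&conf->resync_lock);
	/* Read under the lock, so a concurrent update cannot race with the check. */
	assert(!(force && !conf->barrier));
	value = ++conf->barrier;	/* simplified: the real function also waits */
	pthread_mutex_unlock(&conf->resync_lock);

	return value;
}
```

With the check placed before pthread_mutex_lock(), another thread could modify 'barrier' between the read and the locked section, which is the kind of false positive the patch is meant to rule out.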

Comments

John Stoffel Aug. 29, 2022, 7:53 p.m. UTC | #1
>>>>> "Yu" == Yu Kuai <yukuai1@huaweicloud.com> writes:

Yu> From: Yu Kuai <yukuai3@huawei.com>
Yu> 'conf->barrier' is protected by 'conf->resync_lock', reading
Yu> 'conf->barrier' without holding the lock is wrong.

Yu> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Yu> ---
Yu>  drivers/md/raid10.c | 2 +-
Yu>  1 file changed, 1 insertion(+), 1 deletion(-)

Yu> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
Yu> index 9117fcdee1be..b70c207f7932 100644
Yu> --- a/drivers/md/raid10.c
Yu> +++ b/drivers/md/raid10.c
Yu> @@ -930,8 +930,8 @@ static void flush_pending_writes(struct r10conf *conf)
 
Yu>  static void raise_barrier(struct r10conf *conf, int force)
Yu>  {
Yu> -	BUG_ON(force && !conf->barrier);
Yu>  	spin_lock_irq(&conf->resync_lock);
Yu> +	BUG_ON(force && !conf->barrier);

I don't like this BUG_ON() at all. Why are you crashing the system
here instead of just doing a simple WARN_ONCE()?  Is there
anything the user can do to get into this situation on their own, or
does it really signify a logic error in the code?  If so, why are you
killing the system?


 
Yu>  	/* Wait until no block IO is waiting (unless 'force') */
Yu>  	wait_event_lock_irq(conf->wait_barrier, force || !conf->nr_waiting,
Yu> -- 
Yu> 2.31.1
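
The alternative raised in this comment (warn once and keep running rather than crash) can be sketched in userspace as follows. This is only an analogue in the spirit of the kernel's WARN_ONCE(), not a proposed change to raid10.c. 'demo_warn_once()' and 'demo_check_force()' are hypothetical helpers, and returning -1 is an assumed recovery path that the thread does not discuss.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userspace analogue of a warn-once check: print a message the
 * first time the condition is true, and always return the condition so the
 * caller can bail out instead of crashing.
 */
static bool demo_warn_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "warning: %s\n", what);
	}
	return cond;
}

/* Returns 0 if the invariant holds, -1 if it was violated. */
static int demo_check_force(int force, int barrier)
{
	static bool warned;	/* remembers whether the warning was printed */

	if (demo_warn_once(force && !barrier, &warned,
			   "force set but no barrier raised"))
		return -1;	/* assumed recovery: refuse and keep running */
	return 0;
}
```

Whether such a violation can be recovered from at all is exactly the question the comment asks. If it signals a logic error with no safe way to continue, a warning alone does not make the code correct.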
Yu Kuai Aug. 30, 2022, 1:01 a.m. UTC | #2
Hi, John

On 2022/08/30 3:53, John Stoffel wrote:
>>>>>> "Yu" == Yu Kuai <yukuai1@huaweicloud.com> writes:
> 
> Yu> From: Yu Kuai <yukuai3@huawei.com>
> Yu> 'conf->barrier' is protected by 'conf->resync_lock', reading
> Yu> 'conf->barrier' without holding the lock is wrong.
> 
> Yu> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> Yu> ---
> Yu>  drivers/md/raid10.c | 2 +-
> Yu>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Yu> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> Yu> index 9117fcdee1be..b70c207f7932 100644
> Yu> --- a/drivers/md/raid10.c
> Yu> +++ b/drivers/md/raid10.c
> Yu> @@ -930,8 +930,8 @@ static void flush_pending_writes(struct r10conf *conf)
>   
> Yu>  static void raise_barrier(struct r10conf *conf, int force)
> Yu>  {
> Yu> -	BUG_ON(force && !conf->barrier);
> Yu>  	spin_lock_irq(&conf->resync_lock);
> Yu> +	BUG_ON(force && !conf->barrier);
> 
> I don't like this BUG_ON() at all. Why are you crashing the system
> here instead of just doing a simple WARN_ONCE()?  Is there
> anything the user can do to get into this situation on their own, or
> does it really signify a logic error in the code?  If so, why are you
> killing the system?

I'm not sure why BUG_ON() is used here. I just noticed that
'conf->barrier' is read without holding 'resync_lock', so the BUG_ON()
can trigger a false positive.

Thanks,
Kuai
> 
> 
>   
> Yu>  	/* Wait until no block IO is waiting (unless 'force') */
> Yu>  	wait_event_lock_irq(conf->wait_barrier, force || !conf->nr_waiting,
> Yu> --
> Yu> 2.31.1
> 
> 
> .
>
Paul Menzel Aug. 30, 2022, 6:32 a.m. UTC | #3
Dear John,


On 29.08.22 at 21:53, John Stoffel wrote:
>>>>>> "Yu" == Yu Kuai <yukuai1@huaweicloud.com> writes:
> 
> Yu> From: Yu Kuai <yukuai3@huawei.com>

The quoting style is really confusing, as it does not seem to be the 
standard, and a lot of MUAs won’t mark up the citation.

[…]

> Yu> 'conf->barrier' is protected by 'conf->resync_lock', reading
> Yu> 'conf->barrier' without holding the lock is wrong.
> 
> Yu> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> Yu> ---
> Yu>  drivers/md/raid10.c | 2 +-
> Yu>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Yu> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> Yu> index 9117fcdee1be..b70c207f7932 100644
> Yu> --- a/drivers/md/raid10.c
> Yu> +++ b/drivers/md/raid10.c
> Yu> @@ -930,8 +930,8 @@ static void flush_pending_writes(struct r10conf *conf)
>   
> Yu>  static void raise_barrier(struct r10conf *conf, int force)
> Yu>  {
> Yu> -	BUG_ON(force && !conf->barrier);
> Yu>  	spin_lock_irq(&conf->resync_lock);
> Yu> +	BUG_ON(force && !conf->barrier);
> 
> I don't like this BUG_ON() at all. Why are you crashing the system
> here instead of just doing a simple WARN_ONCE()?  Is there
> anything the user can do to get into this situation on their own, or
> does it really signify a logic error in the code?  If so, why are you
> killing the system?

As you can see, the BUG_ON() was there before, so it’s unrelated to this
patch and Yu is not killing anything.

[…]


> Yu>  	/* Wait until no block IO is waiting (unless 'force') */
> Yu>  	wait_event_lock_irq(conf->wait_barrier, force || !conf->nr_waiting,
> Yu> --
> Yu> 2.31.1


Kind regards,

Paul

Patch

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 9117fcdee1be..b70c207f7932 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -930,8 +930,8 @@  static void flush_pending_writes(struct r10conf *conf)
 
 static void raise_barrier(struct r10conf *conf, int force)
 {
-	BUG_ON(force && !conf->barrier);
 	spin_lock_irq(&conf->resync_lock);
+	BUG_ON(force && !conf->barrier);
 
 	/* Wait until no block IO is waiting (unless 'force') */
 	wait_event_lock_irq(conf->wait_barrier, force || !conf->nr_waiting,