diff mbox series

[v2] md/raid5: fix atomicity violation in raid5_cache_count

Message ID 20231222045224.4439-1-2045gemini@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v2] md/raid5: fix atomicity violation in raid5_cache_count | expand

Commit Message

Gui-Dong Han Dec. 22, 2023, 4:52 a.m. UTC
In raid5_cache_count():
	if (conf->max_nr_stripes < conf->min_nr_stripes)
		return 0;
	return conf->max_nr_stripes - conf->min_nr_stripes;
The current check is ineffective, as the values could change immediately
after being checked.

In raid5_set_cache_size():
	...
	conf->min_nr_stripes = size;
	...
	while (size > conf->max_nr_stripes)
		conf->min_nr_stripes = conf->max_nr_stripes;
	...

Due to intermediate value updates in raid5_set_cache_size(), concurrent
execution of raid5_cache_count() and raid5_set_cache_size() may lead to
inconsistent reads of conf->max_nr_stripes and conf->min_nr_stripes.
The current checks are ineffective as values could change immediately
after being checked, raising the risk of conf->min_nr_stripes exceeding
conf->max_nr_stripes and potentially causing an integer overflow.

This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs to extract
function pairs that can be concurrently executed, and then analyzes the
instructions in the paired functions to identify possible concurrency bugs
including data races and atomicity violations. The above possible bug is
reported when our tool analyzes the source code of Linux 6.2.

To resolve this issue, it is suggested to introduce local variables
'min_stripes' and 'max_stripes' in raid5_cache_count() to ensure the
values remain stable throughout the check. Adding locks in
raid5_cache_count() fails to resolve atomicity violations, as
raid5_set_cache_size() may hold intermediate values of
conf->min_nr_stripes while unlocked. With this patch applied, our tool no
longer reports the bug, with the kernel configuration allyesconfig for
x86_64. Due to the lack of associated hardware, we cannot test the patch
in runtime testing, and just verify it according to the code logic.

Fixes: edbe83ab4c27e ("md/raid5: allow the stripe_cache to grow and ...")
Reported-by: BassCheck <bass@buaa.edu.cn>
Signed-off-by: Gui-Dong Han <2045gemini@gmail.com>

---
v2:
* In this patch v2, we've updated to use READ_ONCE() instead of direct
reads for accessing max_nr_stripes and min_nr_stripes, since read and
write can concurrent.
  Thank Yu Kuai for helpful advice.
---
 drivers/md/raid5.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Yu Kuai Dec. 22, 2023, 9:15 a.m. UTC | #1
Hi,

在 2023/12/22 12:52, Gui-Dong Han 写道:
> In raid5_cache_count():
> 	if (conf->max_nr_stripes < conf->min_nr_stripes)
> 		return 0;
> 	return conf->max_nr_stripes - conf->min_nr_stripes;
> The current check is ineffective, as the values could change immediately
> after being checked.
> 
> In raid5_set_cache_size():
> 	...
> 	conf->min_nr_stripes = size;
> 	...
> 	while (size > conf->max_nr_stripes)
> 		conf->min_nr_stripes = conf->max_nr_stripes;
> 	...
> 
> Due to intermediate value updates in raid5_set_cache_size(), concurrent
> execution of raid5_cache_count() and raid5_set_cache_size() may lead to
> inconsistent reads of conf->max_nr_stripes and conf->min_nr_stripes.
> The current checks are ineffective as values could change immediately
> after being checked, raising the risk of conf->min_nr_stripes exceeding
> conf->max_nr_stripes and potentially causing an integer overflow.
> 
> This possible bug is found by an experimental static analysis tool
> developed by our team. This tool analyzes the locking APIs to extract
> function pairs that can be concurrently executed, and then analyzes the
> instructions in the paired functions to identify possible concurrency bugs
> including data races and atomicity violations. The above possible bug is
> reported when our tool analyzes the source code of Linux 6.2.
> 
> To resolve this issue, it is suggested to introduce local variables
> 'min_stripes' and 'max_stripes' in raid5_cache_count() to ensure the
> values remain stable throughout the check. Adding locks in
> raid5_cache_count() fails to resolve atomicity violations, as
> raid5_set_cache_size() may hold intermediate values of
> conf->min_nr_stripes while unlocked. With this patch applied, our tool no
> longer reports the bug, with the kernel configuration allyesconfig for
> x86_64. Due to the lack of associated hardware, we cannot test the patch
> in runtime testing, and just verify it according to the code logic.
> 
> Fixes: edbe83ab4c27e ("md/raid5: allow the stripe_cache to grow and ...")
> Reported-by: BassCheck <bass@buaa.edu.cn>
> Signed-off-by: Gui-Dong Han <2045gemini@gmail.com>
> 
> ---
> v2:
> * In this patch v2, we've updated to use READ_ONCE() instead of direct
> reads for accessing max_nr_stripes and min_nr_stripes, since read and
> write can concurrent.
>    Thank Yu Kuai for helpful advice.
> ---
>   drivers/md/raid5.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 8497880135ee..9037e46de0e2 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -7391,10 +7391,12 @@ static unsigned long raid5_cache_count(struct shrinker *shrink,
>   {
>   	struct r5conf *conf = shrink->private_data;
>   
> -	if (conf->max_nr_stripes < conf->min_nr_stripes)
> +	int max_stripes = READ_ONCE(conf->max_nr_stripes);
> +	int min_stripes = READ_ONCE(conf->min_nr_stripes);

READ_ONCE() itself is meaningless, it should pair with WRITE_ONCE(),
this will prevent reading abnormal value in some arch. Please also
update raid5_set_cache_size(), grow_one_stripe() and drop_one_stripe()
to use WRITE_ONCE(). (setup_conf() is not necessary).

Thanks,
Kuai

> +	if (max_stripes < min_stripes)
>   		/* unlikely, but not impossible */
>   		return 0;
> -	return conf->max_nr_stripes - conf->min_nr_stripes;
> +	return max_stripes - min_stripes;
>   }
>   
>   static struct r5conf *setup_conf(struct mddev *mddev)
>
Gui-Dong Han Jan. 11, 2024, 1:51 a.m. UTC | #2
Dear All:

I hope this email finds you well. I hope you haven't missed my previous 
email, as I understand that everyone has a busy schedule. I just wanted 
to follow up on my previous message sent.
I understand that you may be occupied with other tasks or priorities. 
However, I would greatly appreciate it if you could spare a few moments 
to check the patch in my previous email. Your prompt response would be 
highly valuable to me.
Thank you for your attention to this matter, and I look forward to 
hearing from you soon.

Thanks,
Han

On 22/12/2023 下午12:52, Gui-Dong Han wrote:
> In raid5_cache_count():
> 	if (conf->max_nr_stripes < conf->min_nr_stripes)
> 		return 0;
> 	return conf->max_nr_stripes - conf->min_nr_stripes;
> The current check is ineffective, as the values could change immediately
> after being checked.
>
> In raid5_set_cache_size():
> 	...
> 	conf->min_nr_stripes = size;
> 	...
> 	while (size > conf->max_nr_stripes)
> 		conf->min_nr_stripes = conf->max_nr_stripes;
> 	...
>
> Due to intermediate value updates in raid5_set_cache_size(), concurrent
> execution of raid5_cache_count() and raid5_set_cache_size() may lead to
> inconsistent reads of conf->max_nr_stripes and conf->min_nr_stripes.
> The current checks are ineffective as values could change immediately
> after being checked, raising the risk of conf->min_nr_stripes exceeding
> conf->max_nr_stripes and potentially causing an integer overflow.
>
> This possible bug is found by an experimental static analysis tool
> developed by our team. This tool analyzes the locking APIs to extract
> function pairs that can be concurrently executed, and then analyzes the
> instructions in the paired functions to identify possible concurrency bugs
> including data races and atomicity violations. The above possible bug is
> reported when our tool analyzes the source code of Linux 6.2.
>
> To resolve this issue, it is suggested to introduce local variables
> 'min_stripes' and 'max_stripes' in raid5_cache_count() to ensure the
> values remain stable throughout the check. Adding locks in
> raid5_cache_count() fails to resolve atomicity violations, as
> raid5_set_cache_size() may hold intermediate values of
> conf->min_nr_stripes while unlocked. With this patch applied, our tool no
> longer reports the bug, with the kernel configuration allyesconfig for
> x86_64. Due to the lack of associated hardware, we cannot test the patch
> in runtime testing, and just verify it according to the code logic.
>
> Fixes: edbe83ab4c27e ("md/raid5: allow the stripe_cache to grow and ...")
> Reported-by: BassCheck <bass@buaa.edu.cn>
> Signed-off-by: Gui-Dong Han <2045gemini@gmail.com>
>
> ---
> v2:
> * In this patch v2, we've updated to use READ_ONCE() instead of direct
> reads for accessing max_nr_stripes and min_nr_stripes, since read and
> write can concurrent.
>    Thank Yu Kuai for helpful advice.
> ---
>   drivers/md/raid5.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 8497880135ee..9037e46de0e2 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -7391,10 +7391,12 @@ static unsigned long raid5_cache_count(struct shrinker *shrink,
>   {
>   	struct r5conf *conf = shrink->private_data;
>   
> -	if (conf->max_nr_stripes < conf->min_nr_stripes)
> +	int max_stripes = READ_ONCE(conf->max_nr_stripes);
> +	int min_stripes = READ_ONCE(conf->min_nr_stripes);
> +	if (max_stripes < min_stripes)
>   		/* unlikely, but not impossible */
>   		return 0;
> -	return conf->max_nr_stripes - conf->min_nr_stripes;
> +	return max_stripes - min_stripes;
>   }
>   
>   static struct r5conf *setup_conf(struct mddev *mddev)
Yu Kuai Jan. 11, 2024, 1:56 a.m. UTC | #3
Hi,

在 2024/01/11 9:51, Gui-Dong Han 写道:
> Dear All:
> 
> I hope this email finds you well. I hope you haven't missed my previous 
> email, as I understand that everyone has a busy schedule. I just wanted 
> to follow up on my previous message sent.
> I understand that you may be occupied with other tasks or priorities. 
> However, I would greatly appreciate it if you could spare a few moments 
> to check the patch in my previous email. Your prompt response would be 
> highly valuable to me.
> Thank you for your attention to this matter, and I look forward to 
> hearing from you soon.

I already replied:

https://lore.kernel.org/all/97363298-7aa1-cd42-d2cf-c7e2bbeb179f@huaweicloud.com/

You might need to figure out why if you don't receive the email.

Thanks,
Kuai

> 
> Thanks,
> Han
> 
> On 22/12/2023 下午12:52, Gui-Dong Han wrote:
>> In raid5_cache_count():
>>     if (conf->max_nr_stripes < conf->min_nr_stripes)
>>         return 0;
>>     return conf->max_nr_stripes - conf->min_nr_stripes;
>> The current check is ineffective, as the values could change immediately
>> after being checked.
>>
>> In raid5_set_cache_size():
>>     ...
>>     conf->min_nr_stripes = size;
>>     ...
>>     while (size > conf->max_nr_stripes)
>>         conf->min_nr_stripes = conf->max_nr_stripes;
>>     ...
>>
>> Due to intermediate value updates in raid5_set_cache_size(), concurrent
>> execution of raid5_cache_count() and raid5_set_cache_size() may lead to
>> inconsistent reads of conf->max_nr_stripes and conf->min_nr_stripes.
>> The current checks are ineffective as values could change immediately
>> after being checked, raising the risk of conf->min_nr_stripes exceeding
>> conf->max_nr_stripes and potentially causing an integer overflow.
>>
>> This possible bug is found by an experimental static analysis tool
>> developed by our team. This tool analyzes the locking APIs to extract
>> function pairs that can be concurrently executed, and then analyzes the
>> instructions in the paired functions to identify possible concurrency 
>> bugs
>> including data races and atomicity violations. The above possible bug is
>> reported when our tool analyzes the source code of Linux 6.2.
>>
>> To resolve this issue, it is suggested to introduce local variables
>> 'min_stripes' and 'max_stripes' in raid5_cache_count() to ensure the
>> values remain stable throughout the check. Adding locks in
>> raid5_cache_count() fails to resolve atomicity violations, as
>> raid5_set_cache_size() may hold intermediate values of
>> conf->min_nr_stripes while unlocked. With this patch applied, our tool no
>> longer reports the bug, with the kernel configuration allyesconfig for
>> x86_64. Due to the lack of associated hardware, we cannot test the patch
>> in runtime testing, and just verify it according to the code logic.
>>
>> Fixes: edbe83ab4c27e ("md/raid5: allow the stripe_cache to grow and ...")
>> Reported-by: BassCheck <bass@buaa.edu.cn>
>> Signed-off-by: Gui-Dong Han <2045gemini@gmail.com>
>>
>> ---
>> v2:
>> * In this patch v2, we've updated to use READ_ONCE() instead of direct
>> reads for accessing max_nr_stripes and min_nr_stripes, since read and
>> write can concurrent.
>>    Thank Yu Kuai for helpful advice.
>> ---
>>   drivers/md/raid5.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index 8497880135ee..9037e46de0e2 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -7391,10 +7391,12 @@ static unsigned long raid5_cache_count(struct 
>> shrinker *shrink,
>>   {
>>       struct r5conf *conf = shrink->private_data;
>> -    if (conf->max_nr_stripes < conf->min_nr_stripes)
>> +    int max_stripes = READ_ONCE(conf->max_nr_stripes);
>> +    int min_stripes = READ_ONCE(conf->min_nr_stripes);
>> +    if (max_stripes < min_stripes)
>>           /* unlikely, but not impossible */
>>           return 0;
>> -    return conf->max_nr_stripes - conf->min_nr_stripes;
>> +    return max_stripes - min_stripes;
>>   }
>>   static struct r5conf *setup_conf(struct mddev *mddev)
> .
>
Gui-Dong Han Jan. 11, 2024, 2:09 a.m. UTC | #4
Dear Kuai,

Due to unknown reasons, I did not see your response in my Gmail. I came 
across it on Patchwork and understand the need to pair READ_ONCE() with 
WRITE_ONCE(). I'll make these changes and submit patch v3 soon. Thank 
you for your guidance.

Best,
Han

On 11/1/2024 上午9:51, Gui-Dong Han wrote:
> Dear All:
>
> I hope this email finds you well. I hope you haven't missed my 
> previous email, as I understand that everyone has a busy schedule. I 
> just wanted to follow up on my previous message sent.
> I understand that you may be occupied with other tasks or priorities. 
> However, I would greatly appreciate it if you could spare a few 
> moments to check the patch in my previous email. Your prompt response 
> would be highly valuable to me.
> Thank you for your attention to this matter, and I look forward to 
> hearing from you soon.
>
> Thanks,
> Han
>
> On 22/12/2023 下午12:52, Gui-Dong Han wrote:
>> In raid5_cache_count():
>>     if (conf->max_nr_stripes < conf->min_nr_stripes)
>>         return 0;
>>     return conf->max_nr_stripes - conf->min_nr_stripes;
>> The current check is ineffective, as the values could change immediately
>> after being checked.
>>
>> In raid5_set_cache_size():
>>     ...
>>     conf->min_nr_stripes = size;
>>     ...
>>     while (size > conf->max_nr_stripes)
>>         conf->min_nr_stripes = conf->max_nr_stripes;
>>     ...
>>
>> Due to intermediate value updates in raid5_set_cache_size(), concurrent
>> execution of raid5_cache_count() and raid5_set_cache_size() may lead to
>> inconsistent reads of conf->max_nr_stripes and conf->min_nr_stripes.
>> The current checks are ineffective as values could change immediately
>> after being checked, raising the risk of conf->min_nr_stripes exceeding
>> conf->max_nr_stripes and potentially causing an integer overflow.
>>
>> This possible bug is found by an experimental static analysis tool
>> developed by our team. This tool analyzes the locking APIs to extract
>> function pairs that can be concurrently executed, and then analyzes the
>> instructions in the paired functions to identify possible concurrency 
>> bugs
>> including data races and atomicity violations. The above possible bug is
>> reported when our tool analyzes the source code of Linux 6.2.
>>
>> To resolve this issue, it is suggested to introduce local variables
>> 'min_stripes' and 'max_stripes' in raid5_cache_count() to ensure the
>> values remain stable throughout the check. Adding locks in
>> raid5_cache_count() fails to resolve atomicity violations, as
>> raid5_set_cache_size() may hold intermediate values of
>> conf->min_nr_stripes while unlocked. With this patch applied, our 
>> tool no
>> longer reports the bug, with the kernel configuration allyesconfig for
>> x86_64. Due to the lack of associated hardware, we cannot test the patch
>> in runtime testing, and just verify it according to the code logic.
>>
>> Fixes: edbe83ab4c27e ("md/raid5: allow the stripe_cache to grow and 
>> ...")
>> Reported-by: BassCheck <bass@buaa.edu.cn>
>> Signed-off-by: Gui-Dong Han <2045gemini@gmail.com>
>>
>> ---
>> v2:
>> * In this patch v2, we've updated to use READ_ONCE() instead of direct
>> reads for accessing max_nr_stripes and min_nr_stripes, since read and
>> write can concurrent.
>>    Thank Yu Kuai for helpful advice.
>> ---
>>   drivers/md/raid5.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index 8497880135ee..9037e46de0e2 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -7391,10 +7391,12 @@ static unsigned long raid5_cache_count(struct 
>> shrinker *shrink,
>>   {
>>       struct r5conf *conf = shrink->private_data;
>>   -    if (conf->max_nr_stripes < conf->min_nr_stripes)
>> +    int max_stripes = READ_ONCE(conf->max_nr_stripes);
>> +    int min_stripes = READ_ONCE(conf->min_nr_stripes);
>> +    if (max_stripes < min_stripes)
>>           /* unlikely, but not impossible */
>>           return 0;
>> -    return conf->max_nr_stripes - conf->min_nr_stripes;
>> +    return max_stripes - min_stripes;
>>   }
>>     static struct r5conf *setup_conf(struct mddev *mddev)
diff mbox series

Patch

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8497880135ee..9037e46de0e2 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7391,10 +7391,12 @@  static unsigned long raid5_cache_count(struct shrinker *shrink,
 {
 	struct r5conf *conf = shrink->private_data;
 
-	if (conf->max_nr_stripes < conf->min_nr_stripes)
+	int max_stripes = READ_ONCE(conf->max_nr_stripes);
+	int min_stripes = READ_ONCE(conf->min_nr_stripes);
+	if (max_stripes < min_stripes)
 		/* unlikely, but not impossible */
 		return 0;
-	return conf->max_nr_stripes - conf->min_nr_stripes;
+	return max_stripes - min_stripes;
 }
 
 static struct r5conf *setup_conf(struct mddev *mddev)