diff mbox series

[v2] The value may overflow

Message ID 20230904094251.64022-1-arefev@swemel.ru (mailing list archive)
State Accepted
Commit ed083b0e22f1396dee3599896249a3f218845298
Headers show
Series [v2] The value may overflow | expand

Commit Message

Denis Arefev Sept. 4, 2023, 9:42 a.m. UTC
The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
is subject to overflow due to a failure to cast operands to a larger
data type before performing arithmetic

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Denis Arefev <arefev@swemel.ru>
---
v2: Added fixes to the srcu_schedule_cbs_snp function as suggested by 
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
 kernel/rcu/srcutree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Mathieu Desnoyers Sept. 4, 2023, 10:24 a.m. UTC | #1
On 9/4/23 05:42, Denis Arefev wrote:
> The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
> is subject to overflow due to a failure to cast operands to a larger
> data type before performing arithmetic

The patch title should identify more precisely its context, e.g.:

"srcu: Fix srcu_struct node grpmask overflow on 64-bit systems"

Also, as I stated in my reply to the previous version, the patch commit
message should describe the impact of the bug it fixes.

Thanks,

Mathieu


> 
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
> 
> Signed-off-by: Denis Arefev <arefev@swemel.ru>
> ---
> v2: Added fixes to the srcu_schedule_cbs_snp function as suggested by
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>   kernel/rcu/srcutree.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 20d7a238d675..6c18e6005ae1 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
>   				snp->grplo = cpu;
>   			snp->grphi = cpu;
>   		}
> -		sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
> +		sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo);
>   	}
>   	smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
>   	return true;
> @@ -833,7 +833,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
>   	int cpu;
>   
>   	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> -		if (!(mask & (1 << (cpu - snp->grplo))))
> +		if (!(mask & (1UL << (cpu - snp->grplo))))
>   			continue;
>   		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
>   	}
David Laight Sept. 5, 2023, 9:31 a.m. UTC | #2
From: Mathieu Desnoyers
> Sent: 04 September 2023 11:24
> 
> On 9/4/23 05:42, Denis Arefev wrote:
> > The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
> > is subject to overflow due to a failure to cast operands to a larger
> > data type before performing arithmetic
> 
> The patch title should identify more precisely its context, e.g.:
> 
> "srcu: Fix srcu_struct node grpmask overflow on 64-bit systems"
> 
> Also, as I stated in my reply to the previous version, the patch commit
> message should describe the impact of the bug it fixes.

And is the analysis complete?
Is 1UL right for 32bit archs??
Is 64 bits even enough??

	David

> 
> Thanks,
> 
> Mathieu
> 
> 
> >
> > Found by Linux Verification Center (linuxtesting.org) with SVACE.
> >
> > Signed-off-by: Denis Arefev <arefev@swemel.ru>
> > ---
> > v2: Added fixes to the srcu_schedule_cbs_snp function as suggested by
> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> >   kernel/rcu/srcutree.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index 20d7a238d675..6c18e6005ae1 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
> >   				snp->grplo = cpu;
> >   			snp->grphi = cpu;
> >   		}
> > -		sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
> > +		sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo);
> >   	}
> >   	smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
> >   	return true;
> > @@ -833,7 +833,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
> >   	int cpu;
> >
> >   	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> > -		if (!(mask & (1 << (cpu - snp->grplo))))
> > +		if (!(mask & (1UL << (cpu - snp->grplo))))
> >   			continue;
> >   		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> >   	}
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Mathieu Desnoyers Sept. 5, 2023, 12:26 p.m. UTC | #3
On 9/5/23 05:31, David Laight wrote:
> From: Mathieu Desnoyers
>> Sent: 04 September 2023 11:24
>>
>> On 9/4/23 05:42, Denis Arefev wrote:
>>> The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
>>> is subject to overflow due to a failure to cast operands to a larger
>>> data type before performing arithmetic
>>
>> The patch title should identify more precisely its context, e.g.:
>>
>> "srcu: Fix srcu_struct node grpmask overflow on 64-bit systems"
>>
>> Also, as I stated in my reply to the previous version, the patch commit
>> message should describe the impact of the bug it fixes.
> 
> And is the analysis complete?
> Is 1UL right for 32bit archs??
> Is 64 bits even enough??

I understand from include/linux/rcu_node_tree.h and kernel/rcu/Kconfig 
RCU_FANOUT and RCU_FANOUT_LEAF ranges that a 32-bit integer is 
sufficient to hold the mask on 32-bit architectures, and a 64-bit 
integer is enough on 64-bit architectures given the current implementation.

At least this appears to be the intent. I did not do a thorough analysis 
of the various parameter limits.

Thanks,

Mathieu

> 
> 	David
> 
>>
>> Thanks,
>>
>> Mathieu
>>
>>
>>>
>>> Found by Linux Verification Center (linuxtesting.org) with SVACE.
>>>
>>> Signed-off-by: Denis Arefev <arefev@swemel.ru>
>>> ---
>>> v2: Added fixes to the srcu_schedule_cbs_snp function as suggested by
>>> Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>    kernel/rcu/srcutree.c | 4 ++--
>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
>>> index 20d7a238d675..6c18e6005ae1 100644
>>> --- a/kernel/rcu/srcutree.c
>>> +++ b/kernel/rcu/srcutree.c
>>> @@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
>>>    				snp->grplo = cpu;
>>>    			snp->grphi = cpu;
>>>    		}
>>> -		sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
>>> +		sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo);
>>>    	}
>>>    	smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
>>>    	return true;
>>> @@ -833,7 +833,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
>>>    	int cpu;
>>>
>>>    	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
>>> -		if (!(mask & (1 << (cpu - snp->grplo))))
>>> +		if (!(mask & (1UL << (cpu - snp->grplo))))
>>>    			continue;
>>>    		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
>>>    	}
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
Paul E. McKenney Sept. 5, 2023, 1:26 p.m. UTC | #4
On Tue, Sep 05, 2023 at 08:26:51AM -0400, Mathieu Desnoyers wrote:
> On 9/5/23 05:31, David Laight wrote:
> > From: Mathieu Desnoyers
> > > Sent: 04 September 2023 11:24
> > > 
> > > On 9/4/23 05:42, Denis Arefev wrote:
> > > > The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
> > > > is subject to overflow due to a failure to cast operands to a larger
> > > > data type before performing arithmetic
> > > 
> > > The patch title should identify more precisely its context, e.g.:
> > > 
> > > "srcu: Fix srcu_struct node grpmask overflow on 64-bit systems"
> > > 
> > > Also, as I stated in my reply to the previous version, the patch commit
> > > message should describe the impact of the bug it fixes.
> > 
> > And is the analysis complete?
> > Is 1UL right for 32bit archs??
> > Is 64 bits even enough??
> 
> I understand from include/linux/rcu_node_tree.h and kernel/rcu/Kconfig
> RCU_FANOUT and RCU_FANOUT_LEAF ranges that a 32-bit integer is sufficient to
> hold the mask on 32-bit architectures, and a 64-bit integer is enough on
> 64-bit architectures given the current implementation.
> 
> At least this appears to be the intent. I did not do a thorough analysis of
> the various parameter limits.

Mathieu has it right.

32-bit kernels are unaffected by this bug.

RCU_FANOUT_LEAF defaults to 16, which means that a 64-bit kernel would
need more than 32 leaf rcu_node structures for the parent rcu_node
structure to need more than 32 bit to track its children.  This means
that more than 32*16=512 CPUs are required for this bug to affect 64-bit
systems.  And there really are systems this big, so I am surprised that
this has not shown up long ago.  But it would not be the first time that
objective reality surprised me.  ;-)

							Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> > 
> > 	David
> > 
> > > 
> > > Thanks,
> > > 
> > > Mathieu
> > > 
> > > 
> > > > 
> > > > Found by Linux Verification Center (linuxtesting.org) with SVACE.
> > > > 
> > > > Signed-off-by: Denis Arefev <arefev@swemel.ru>
> > > > ---
> > > > v2: Added fixes to the srcu_schedule_cbs_snp function as suggested by
> > > > Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > >    kernel/rcu/srcutree.c | 4 ++--
> > > >    1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > > > index 20d7a238d675..6c18e6005ae1 100644
> > > > --- a/kernel/rcu/srcutree.c
> > > > +++ b/kernel/rcu/srcutree.c
> > > > @@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
> > > >    				snp->grplo = cpu;
> > > >    			snp->grphi = cpu;
> > > >    		}
> > > > -		sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
> > > > +		sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo);
> > > >    	}
> > > >    	smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
> > > >    	return true;
> > > > @@ -833,7 +833,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
> > > >    	int cpu;
> > > > 
> > > >    	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> > > > -		if (!(mask & (1 << (cpu - snp->grplo))))
> > > > +		if (!(mask & (1UL << (cpu - snp->grplo))))
> > > >    			continue;
> > > >    		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> > > >    	}
> > > 
> > > --
> > > Mathieu Desnoyers
> > > EfficiOS Inc.
> > > https://www.efficios.com
> > 
> > -
> > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > Registration No: 1397386 (Wales)
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>
Mathieu Desnoyers Sept. 5, 2023, 1:34 p.m. UTC | #5
On 9/5/23 09:26, Paul E. McKenney wrote:
> On Tue, Sep 05, 2023 at 08:26:51AM -0400, Mathieu Desnoyers wrote:
>> On 9/5/23 05:31, David Laight wrote:
>>> From: Mathieu Desnoyers
>>>> Sent: 04 September 2023 11:24
>>>>
>>>> On 9/4/23 05:42, Denis Arefev wrote:
>>>>> The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
>>>>> is subject to overflow due to a failure to cast operands to a larger
>>>>> data type before performing arithmetic
>>>>
>>>> The patch title should identify more precisely its context, e.g.:
>>>>
>>>> "srcu: Fix srcu_struct node grpmask overflow on 64-bit systems"
>>>>
>>>> Also, as I stated in my reply to the previous version, the patch commit
>>>> message should describe the impact of the bug it fixes.
>>>
>>> And is the analysis complete?
>>> Is 1UL right for 32bit archs??
>>> Is 64 bits even enough??
>>
>> I understand from include/linux/rcu_node_tree.h and kernel/rcu/Kconfig
>> RCU_FANOUT and RCU_FANOUT_LEAF ranges that a 32-bit integer is sufficient to
>> hold the mask on 32-bit architectures, and a 64-bit integer is enough on
>> 64-bit architectures given the current implementation.
>>
>> At least this appears to be the intent. I did not do a thorough analysis of
>> the various parameter limits.
> 
> Mathieu has it right.
> 
> 32-bit kernels are unaffected by this bug.
> 
> RCU_FANOUT_LEAF defaults to 16, which means that a 64-bit kernel would
> need more than 32 leaf rcu_node structures for the parent rcu_node
> structure to need more than 32 bit to track its children.  This means
> that more than 32*16=512 CPUs are required for this bug to affect 64-bit
> systems.  And there really are systems this big, so I am surprised that
> this has not shown up long ago.  But it would not be the first time that
> objective reality surprised me.  ;-)

So with a 64-bit kernel, RCU_FANOUT_LEAF=16, if we have exactly 32 leaf 
rcu_node structures (exactly 512 CPUs), we end up in the situation where 
the type promotion from signed integer (32-bit) to unsigned long will 
carry the sign, and thus create a mask of 0xffffffff80000000.

So if this weird mask is indeed an issue we should state that 
configurations _starting from 512 CPUs_ are affected, not just those 
with more than 512 CPUs.

Thanks,

Mathieu


> 
> 							Thanx, Paul
> 
>> Thanks,
>>
>> Mathieu
>>
>>>
>>> 	David
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Mathieu
>>>>
>>>>
>>>>>
>>>>> Found by Linux Verification Center (linuxtesting.org) with SVACE.
>>>>>
>>>>> Signed-off-by: Denis Arefev <arefev@swemel.ru>
>>>>> ---
>>>>> v2: Added fixes to the srcu_schedule_cbs_snp function as suggested by
>>>>> Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>>>     kernel/rcu/srcutree.c | 4 ++--
>>>>>     1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
>>>>> index 20d7a238d675..6c18e6005ae1 100644
>>>>> --- a/kernel/rcu/srcutree.c
>>>>> +++ b/kernel/rcu/srcutree.c
>>>>> @@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
>>>>>     				snp->grplo = cpu;
>>>>>     			snp->grphi = cpu;
>>>>>     		}
>>>>> -		sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
>>>>> +		sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo);
>>>>>     	}
>>>>>     	smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
>>>>>     	return true;
>>>>> @@ -833,7 +833,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
>>>>>     	int cpu;
>>>>>
>>>>>     	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
>>>>> -		if (!(mask & (1 << (cpu - snp->grplo))))
>>>>> +		if (!(mask & (1UL << (cpu - snp->grplo))))
>>>>>     			continue;
>>>>>     		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
>>>>>     	}
>>>>
>>>> --
>>>> Mathieu Desnoyers
>>>> EfficiOS Inc.
>>>> https://www.efficios.com
>>>
>>> -
>>> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
>>> Registration No: 1397386 (Wales)
>>
>> -- 
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com
>>
Paul E. McKenney Sept. 5, 2023, 1:56 p.m. UTC | #6
On Tue, Sep 05, 2023 at 09:34:45AM -0400, Mathieu Desnoyers wrote:
> On 9/5/23 09:26, Paul E. McKenney wrote:
> > On Tue, Sep 05, 2023 at 08:26:51AM -0400, Mathieu Desnoyers wrote:
> > > On 9/5/23 05:31, David Laight wrote:
> > > > From: Mathieu Desnoyers
> > > > > Sent: 04 September 2023 11:24
> > > > > 
> > > > > On 9/4/23 05:42, Denis Arefev wrote:
> > > > > > The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
> > > > > > is subject to overflow due to a failure to cast operands to a larger
> > > > > > data type before performing arithmetic
> > > > > 
> > > > > The patch title should identify more precisely its context, e.g.:
> > > > > 
> > > > > "srcu: Fix srcu_struct node grpmask overflow on 64-bit systems"
> > > > > 
> > > > > Also, as I stated in my reply to the previous version, the patch commit
> > > > > message should describe the impact of the bug it fixes.
> > > > 
> > > > And is the analysis complete?
> > > > Is 1UL right for 32bit archs??
> > > > Is 64 bits even enough??
> > > 
> > > I understand from include/linux/rcu_node_tree.h and kernel/rcu/Kconfig
> > > RCU_FANOUT and RCU_FANOUT_LEAF ranges that a 32-bit integer is sufficient to
> > > hold the mask on 32-bit architectures, and a 64-bit integer is enough on
> > > 64-bit architectures given the current implementation.
> > > 
> > > At least this appears to be the intent. I did not do a thorough analysis of
> > > the various parameter limits.
> > 
> > Mathieu has it right.
> > 
> > 32-bit kernels are unaffected by this bug.
> > 
> > RCU_FANOUT_LEAF defaults to 16, which means that a 64-bit kernel would
> > need more than 32 leaf rcu_node structures for the parent rcu_node
> > structure to need more than 32 bit to track its children.  This means
> > that more than 32*16=512 CPUs are required for this bug to affect 64-bit
> > systems.  And there really are systems this big, so I am surprised that
> > this has not shown up long ago.  But it would not be the first time that
> > objective reality surprised me.  ;-)
> 
> So with a 64-bit kernel, RCU_FANOUT_LEAF=16, if we have exactly 32 leaf
> rcu_node structures (exactly 512 CPUs), we end up in the situation where the
> type promotion from signed integer (32-bit) to unsigned long will carry the
> sign, and thus create a mask of 0xffffffff80000000.

If there is someplace that does a shift of 1UL, yes, this would be a
problem.  I don't see one, but I might have missed something.  If all
the shifts are from 1, then the top 33 bits would be always set and
cleared as a unit, as if they were one bit.

> So if this weird mask is indeed an issue we should state that configurations
> _starting from 512 CPUs_ are affected, not just those with more than 512
> CPUs.

That would instead be more than 512-16=496 CPUs, correct?  496 CPUs would
only require a 31-bit shift, which should be OK, but 497 would require
a 32-bit shift, which would result in sign extension.  If it turns out
that sign extension is OK, then we should get in trouble at 513 CPUs,
which would result in a 33-bit shift (and is that even defined in C?).

But this can be tested.  Let's try setting RCU_FANOUT_LEAF to 2.  Then we
"only" need 64 (maybe 62) CPUs to trigger the bug.  And I happen to
have access to a system with 80 CPUs.  (Those with smaller systems can
try to trigger this bug by changing the current "1 <<" to (say) "4 <<",
which should trigger it on 16-CPU systems.)

I just fired off an 80-CPU test with CONFIG_RCU_FANOUT_LEAF=2 and will
let you know how it goes.

							Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> 
> > 
> > 							Thanx, Paul
> > 
> > > Thanks,
> > > 
> > > Mathieu
> > > 
> > > > 
> > > > 	David
> > > > 
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > Mathieu
> > > > > 
> > > > > 
> > > > > > 
> > > > > > Found by Linux Verification Center (linuxtesting.org) with SVACE.
> > > > > > 
> > > > > > Signed-off-by: Denis Arefev <arefev@swemel.ru>
> > > > > > ---
> > > > > > v2: Added fixes to the srcu_schedule_cbs_snp function as suggested by
> > > > > > Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > > >     kernel/rcu/srcutree.c | 4 ++--
> > > > > >     1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > > > > > index 20d7a238d675..6c18e6005ae1 100644
> > > > > > --- a/kernel/rcu/srcutree.c
> > > > > > +++ b/kernel/rcu/srcutree.c
> > > > > > @@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
> > > > > >     				snp->grplo = cpu;
> > > > > >     			snp->grphi = cpu;
> > > > > >     		}
> > > > > > -		sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
> > > > > > +		sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo);
> > > > > >     	}
> > > > > >     	smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
> > > > > >     	return true;
> > > > > > @@ -833,7 +833,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
> > > > > >     	int cpu;
> > > > > > 
> > > > > >     	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> > > > > > -		if (!(mask & (1 << (cpu - snp->grplo))))
> > > > > > +		if (!(mask & (1UL << (cpu - snp->grplo))))
> > > > > >     			continue;
> > > > > >     		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> > > > > >     	}
> > > > > 
> > > > > --
> > > > > Mathieu Desnoyers
> > > > > EfficiOS Inc.
> > > > > https://www.efficios.com
> > > > 
> > > > -
> > > > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > > > Registration No: 1397386 (Wales)
> > > 
> > > -- 
> > > Mathieu Desnoyers
> > > EfficiOS Inc.
> > > https://www.efficios.com
> > > 
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>
David Laight Sept. 5, 2023, 2:15 p.m. UTC | #7
...
> That would instead be more than 512-16=496 CPUs, correct?  496 CPUs would
> only require a 31-bit shift, which should be OK, but 497 would require
> a 32-bit shift, which would result in sign extension.  If it turns out
> that sign extension is OK, then we should get in trouble at 513 CPUs,
> which would result in a 33-bit shift (and is that even defined in C?).

Not quite right :-)

(1 << 31) is int and negative, that gets sign extended before
being converted to 'unsigned long' - so has the top 33 bits set.

(1 << 32) is undefined, the current x86 cpu ignore the high
shift bits so it is (1 << 0).

If the mask is being used to optimise a search the code might
just happen to work!

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Mathieu Desnoyers Sept. 5, 2023, 2:34 p.m. UTC | #8
On 9/5/23 10:15, David Laight wrote:
> ...
>> That would instead be more than 512-16=496 CPUs, correct?  496 CPUs would
>> only require a 31-bit shift, which should be OK, but 497 would require
>> a 32-bit shift, which would result in sign extension.  If it turns out
>> that sign extension is OK, then we should get in trouble at 513 CPUs,
>> which would result in a 33-bit shift (and is that even defined in C?).
> 
> Not quite right :-)
> 
> (1 << 31) is int and negative, that gets sign extended before
> being converted to 'unsigned long' - so has the top 33 bits set.
> 
> (1 << 32) is undefined, the current x86 cpu ignore the high
> shift bits so it is (1 << 0).

Yes, I was about to reply the same thing. A shift of 31 is buggy,
because shifting 1 << 31 raises the sign bit, which sets the top 33
bits when cast to unsigned long. A shift of 1 << 32 is undefined,
with for instance x86 choosing to ignore the top bit.

But in order to have a 1 << 31 shift from this expression:

                 sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);

I *think* we need the group to have 32 cpus or more (indexed within
the group from grplo to grplo + 31 (both inclusive)).

So as soon as we have one group with 32 cpus, the problem should show
up. With FANOUT_LEAF=16, we can have 15 groups of 31 cpus and 1
group of 32 cpus, for:

   15*31 + 32 = 497 cpus.

AFAIU, this would be the minimum value for smp_processor_id()+1 which
triggers this issue.

Thanks,

Mathieu

> 
> If the mask is being used to optimise a search the code might
> just happen to work!
> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>
Paul E. McKenney Sept. 5, 2023, 4:40 p.m. UTC | #9
On Tue, Sep 05, 2023 at 10:34:43AM -0400, Mathieu Desnoyers wrote:
> On 9/5/23 10:15, David Laight wrote:
> > ...
> > > That would instead be more than 512-16=496 CPUs, correct?  496 CPUs would
> > > only require a 31-bit shift, which should be OK, but 497 would require
> > > a 32-bit shift, which would result in sign extension.  If it turns out
> > > that sign extension is OK, then we should get in trouble at 513 CPUs,
> > > which would result in a 33-bit shift (and is that even defined in C?).
> > 
> > Not quite right :-)
> > 
> > (1 << 31) is int and negative, that gets sign extended before
> > being converted to 'unsigned long' - so has the top 33 bits set.

Good point, thank you for the correction.

> > (1 << 32) is undefined, the current x86 cpu ignore the high
> > shift bits so it is (1 << 0).

Which would cause it to attempt to invoke SRCU callbacks on the
lowest-numbered CPU associated with that srcu_node structure.

> Yes, I was about to reply the same thing. A shift of 31 is buggy,
> because shifting 1 << 31 raises the sign bit, which sets the top 33
> bits when cast to unsigned long. A shift of 1 << 32 is undefined,
> with for instance x86 choosing to ignore the top bit.
> 
> But in order to have a 1 << 31 shift from this expression:
> 
>                 sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
> 
> I *think* we need the group to have 32 cpus or more (indexed within
> the group from grplo to grplo + 31 (both inclusive)).
> 
> So as soon as we have one group with 32 cpus, the problem should show
> up. With FANOUT_LEAF=16, we can have 15 groups of 31 cpus and 1
> group of 32 cpus, for:
> 
>   15*31 + 32 = 497 cpus.
> 
> AFAIU, this would be the minimum value for smp_processor_id()+1 which
> triggers this issue.

By default, there are 16 CPUs per leaf srcu_node structure.  Each leaf
srcu_node structure takes up one bit in the parent srcu_node structures'
srcu_data_have_cbs[] array.  Up to 30 bits is OK, 31 bits is questionable,
and 32 bits and larger erroneous.

This is the situation as I see it (assuming dense CPU numbering):

	# Leaf srcu_node		Largest
	structures	#CPUs		CPU #		Status

	0-30		0-480		479		Good
	31		481-496		495		Questionable
	32-		497-		496+		Bad.

Tree SRCU differs from Tree RCU in its operation, so this bug should
not hold up SRCU grace periods.  It might instead cause SRCU callbacks
to be ignored (which would admittedly look quite similar to the user).

My attempts to cheat the numbering system ran up against the limited
height of the srcu_node tree.

But there is still the question of why this has not been seen in the
wild, given that there really are systems with more than 479 CPUs
out there.  One possibility is the call to srcu_schedule_cbs_sdp()
from srcu_funnel_gp_start().  But it does not seem likely that this
would happen all that often, as it requires back-to-back grace periods
and then some.

Maybe people with large systems boot with srcutree.convert_to_big=0?

Adding Laurent for his thoughts.

Aha!

Here is what makes it work, given David's description of 1<<32:

static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp,
				  unsigned long mask, unsigned long delay)
{
	int cpu;

	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
		if (!(mask & (1 << (cpu - snp->grplo))))
			continue;
		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
	}
}

As long as at least one bit is set in the result of 1<<N, and as long
as the compiler always does the same thing for a given N, then this loop
will make the right thing happen.

But even that is not relied upon, because the calling code looks like
this:

			spin_lock_irq_rcu_node(snp);
			cbs = false;
			last_lvl = snp >= sup->level[rcu_num_lvls - 1];
			if (last_lvl)
				cbs = ss_state < SRCU_SIZE_BIG || snp->srcu_have_cbs[idx] == gpseq;
			snp->srcu_have_cbs[idx] = gpseq;
			rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
			sgsne = snp->srcu_gp_seq_needed_exp;
			if (srcu_invl_snp_seq(sgsne) || ULONG_CMP_LT(sgsne, gpseq))
				WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
			if (ss_state < SRCU_SIZE_BIG)
				mask = ~0;
			else
				mask = snp->srcu_data_have_cbs[idx];
			snp->srcu_data_have_cbs[idx] = 0;
			spin_unlock_irq_rcu_node(snp);
			if (cbs)
				srcu_schedule_cbs_snp(ssp, snp, mask, cbdelay);

So that last_lvl check means that the srcu_schedule_cbs_snp() function
is invoked only for leaf srcu_node structures.  Which by default limit
the shift to 16.

So this bug appears to have no effect for default RCU setups, even on very
large 64-bit systems, which is consistent with field experience.  Even if
this is the case, it still should be fixed, to avoid confusion if nothing
else.  Or just in case someone decides to set CONFIG_RCU_FANOUT_LEAF=32.
Which actually happened the other day due to someone trusting ChatGPT's
opinion about RCU Kconfig options...

							Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> > 
> > If the mask is being used to optimise a search the code might
> > just happen to work!
> > 
> > 	David
> > 
> > -
> > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > Registration No: 1397386 (Wales)
> > 
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>
Paul E. McKenney Sept. 5, 2023, 7:27 p.m. UTC | #10
On Tue, Sep 05, 2023 at 09:40:51AM -0700, Paul E. McKenney wrote:
> On Tue, Sep 05, 2023 at 10:34:43AM -0400, Mathieu Desnoyers wrote:
> > On 9/5/23 10:15, David Laight wrote:
> > > ...
> > > > That would instead be more than 512-16=496 CPUs, correct?  496 CPUs would
> > > > only require a 31-bit shift, which should be OK, but 497 would require
> > > > a 32-bit shift, which would result in sign extension.  If it turns out
> > > > that sign extension is OK, then we should get in trouble at 513 CPUs,
> > > > which would result in a 33-bit shift (and is that even defined in C?).
> > > 
> > > Not quite right :-)
> > > 
> > > (1 << 31) is int and negative, that gets sign extended before
> > > being converted to 'unsigned long' - so has the top 33 bits set.
> 
> Good point, thank you for the correction.
> 
> > > (1 << 32) is undefined, the current x86 cpu ignore the high
> > > shift bits so it is (1 << 0).
> 
> Which would cause it to attempt to invoke SRCU callbacks on the
> lowest-numbered CPU associated with that srcu_node structure.
> 
> > Yes, I was about to reply the same thing. A shift of 31 is buggy,
> > because shifting 1 << 31 raises the sign bit, which sets the top 33
> > bits when cast to unsigned long. A shift of 1 << 32 is undefined,
> > with for instance x86 choosing to ignore the top bit.
> > 
> > But in order to have a 1 << 31 shift from this expression:
> > 
> >                 sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
> > 
> > I *think* we need the group to have 32 cpus or more (indexed within
> > the group from grplo to grplo + 31 (both inclusive)).
> > 
> > So as soon as we have one group with 32 cpus, the problem should show
> > up. With FANOUT_LEAF=16, we can have 15 groups of 31 cpus and 1
> > group of 32 cpus, for:
> > 
> >   15*31 + 32 = 497 cpus.
> > 
> > AFAIU, this would be the minimum value for smp_processor_id()+1 which
> > triggers this issue.
> 
> By default, there are 16 CPUs per leaf srcu_node structure.  Each leaf
> srcu_node structure takes up one bit in the parent srcu_node structures'
> srcu_data_have_cbs[] array.  Up to 30 bits is OK, 31 bits is questionable,
> and 32 bits and larger erroneous.
> 
> This is the situation as I see it (assuming dense CPU numbering):
> 
> 	# Leaf srcu_node		Largest
> 	structures	#CPUs		CPU #		Status
> 
> 	0-30		0-480		479		Good
> 	31		481-496		495		Questionable
> 	32-		497-		496+		Bad.
> 
> Tree SRCU differs from Tree RCU in its operation, so this bug should
> not hold up SRCU grace periods.  It might instead cause SRCU callbacks
> to be ignored (which would admittedly look quite similar to the user).
> 
> My attempts to cheat the numbering system ran up against the limited
> height of the srcu_node tree.
> 
> But there is still the question of why this has not been seen in the
> wild, given that there really are systems with more than 479 CPUs
> out there.  One possibility is the call to srcu_schedule_cbs_sdp()
> from srcu_funnel_gp_start().  But it does not seem likely that this
> would happen all that often, as it requires back-to-back grace periods
> and then some.
> 
> Maybe people with large systems boot with srcutree.convert_to_big=0?
> 
> Adding Laurent for his thoughts.
> 
> Aha!
> 
> Here is what makes it work, given David's description of 1<<32:
> 
> static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp,
> 				  unsigned long mask, unsigned long delay)
> {
> 	int cpu;
> 
> 	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> 		if (!(mask & (1 << (cpu - snp->grplo))))
> 			continue;
> 		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> 	}
> }
> 
> As long as at least one bit is set in the result of 1<<N, and as long
> as the compiler always does the same thing for a given N, then this loop
> will make the right thing happen.
> 
> But even that is not relied upon, because the calling code looks like
> this:
> 
> 			spin_lock_irq_rcu_node(snp);
> 			cbs = false;
> 			last_lvl = snp >= sup->level[rcu_num_lvls - 1];
> 			if (last_lvl)
> 				cbs = ss_state < SRCU_SIZE_BIG || snp->srcu_have_cbs[idx] == gpseq;
> 			snp->srcu_have_cbs[idx] = gpseq;
> 			rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
> 			sgsne = snp->srcu_gp_seq_needed_exp;
> 			if (srcu_invl_snp_seq(sgsne) || ULONG_CMP_LT(sgsne, gpseq))
> 				WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
> 			if (ss_state < SRCU_SIZE_BIG)
> 				mask = ~0;
> 			else
> 				mask = snp->srcu_data_have_cbs[idx];
> 			snp->srcu_data_have_cbs[idx] = 0;
> 			spin_unlock_irq_rcu_node(snp);
> 			if (cbs)
> 				srcu_schedule_cbs_snp(ssp, snp, mask, cbdelay);
> 
> So that last_lvl check means that the srcu_schedule_cbs_snp() function
> is invoked only for leaf srcu_node structures.  Which by default limit
> the shift to 16.
> 
> So this bug appears to have no effect for default RCU setups, even on very
> large 64-bit systems, which is consistent with field experience.  Even if
> this is the case, it still should be fixed, to avoid confusion if nothing
> else.  Or just in case someone decides to set CONFIG_RCU_FANOUT_LEAF=32.
> Which actually happened the other day due to someone trusting ChatGPT's
> opinion about RCU Kconfig options...

And I therefore queued Denis's v3 patch with an edited commit log.
Of course, if anyone sees some other way that the bug could manifest
other than in a 64-bit kernel built with CONFIG_RCU_FANOUT_LEAF greater
than 30 on a system with at least 31 CPUs, please let me know so that
I can adjust.

							Thanx, Paul

------------------------------------------------------------------------

commit ed083b0e22f1396dee3599896249a3f218845298
Author: Denis Arefev <arefev@swemel.ru>
Date:   Mon Sep 4 15:21:14 2023 +0300

    Fix srcu_struct node grpmask overflow on 64-bit systems
    
    The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
    is subject to overflow due to a failure to cast operands to a larger
    data type before performing arithmetic.
    
    The maximum result of this subtraction is defined by the RCU_FANOUT_LEAF
    Kconfig option, which on 64-bit systems defaults to 16 (resulting in a
    maximum shift of 15), but which can be set up as high as 64 (resulting
    in a maximum shift of 63).  A value of 31 can result in sign extension,
    resulting in 0xffffffff80000000 instead of the desired 0x80000000.
    A value of 31 or greater triggers undefined behavior per the C standard.
    
    This bug has not been known to cause issues because almost all kernels
    take the default CONFIG_RCU_FANOUT_LEAF=16.  Furthermore, as long as
    a given compiler gives a deterministic result for 1<<N for N>=32, the
    code correctly invokes all SRCU callbacks, albeit wasting CPU time along
    the way.
    
    This commit therefore substitutes the correct 1UL for the buggy 1.
    
    Found by Linux Verification Center (linuxtesting.org) with SVACE.
    
    Signed-off-by: Denis Arefev <arefev@swemel.ru>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: David Laight <David.Laight@aculab.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 833a8f848a90..5602042856b1 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
 				snp->grplo = cpu;
 			snp->grphi = cpu;
 		}
-		sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
+		sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo);
 	}
 	smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
 	return true;
@@ -835,7 +835,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
 	int cpu;
 
 	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
-		if (!(mask & (1 << (cpu - snp->grplo))))
+		if (!(mask & (1UL << (cpu - snp->grplo))))
 			continue;
 		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
 	}
Mathieu Desnoyers Sept. 5, 2023, 7:36 p.m. UTC | #11
On 9/5/23 15:27, Paul E. McKenney wrote:
> On Tue, Sep 05, 2023 at 09:40:51AM -0700, Paul E. McKenney wrote:
>> On Tue, Sep 05, 2023 at 10:34:43AM -0400, Mathieu Desnoyers wrote:
>>> On 9/5/23 10:15, David Laight wrote:
>>>> ...
>>>>> That would instead be more than 512-16=496 CPUs, correct?  496 CPUs would
>>>>> only require a 31-bit shift, which should be OK, but 497 would require
>>>>> a 32-bit shift, which would result in sign extension.  If it turns out
>>>>> that sign extension is OK, then we should get in trouble at 513 CPUs,
>>>>> which would result in a 33-bit shift (and is that even defined in C?).
>>>>
>>>> Not quite right :-)
>>>>
>>>> (1 << 31) is int and negative, that gets sign extended before
>>>> being converted to 'unsigned long' - so has the top 33 bits set.
>>
>> Good point, thank you for the correction.
>>
>>>> (1 << 32) is undefined, the current x86 cpu ignore the high
>>>> shift bits so it is (1 << 0).
>>
>> Which would cause it to attempt to invoke SRCU callbacks on the
>> lowest-numbered CPU associated with that srcu_node structure.
>>
>>> Yes, I was about to reply the same thing. A shift of 31 is buggy,
>>> because shifting 1 << 31 raises the sign bit, which sets the top 33
>>> bits when cast to unsigned long. A shift of 1 << 32 is undefined,
>>> with for instance x86 choosing to ignore the top bit.
>>>
>>> But in order to have a 1 << 31 shift from this expression:
>>>
>>>                  sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
>>>
>>> I *think* we need the group to have 32 cpus or more (indexed within
>>> the group from grplo to grplo + 31 (both inclusive)).
>>>
>>> So as soon as we have one group with 32 cpus, the problem should show
>>> up. With FANOUT_LEAF=16, we can have 15 groups of 31 cpus and 1
>>> group of 32 cpus, for:
>>>
>>>    15*31 + 32 = 497 cpus.
>>>
>>> AFAIU, this would be the minimum value for smp_processor_id()+1 which
>>> triggers this issue.
>>
>> By default, there are 16 CPUs per leaf srcu_node structure.  Each leaf
>> srcu_node structure takes up one bit in the parent srcu_node structures'
>> srcu_data_have_cbs[] array.  Up to 30 bits is OK, 31 bits is questionable,
>> and 32 bits and larger erroneous.
>>
>> This is the situation as I see it (assuming dense CPU numbering):
>>
>> 	# Leaf srcu_node		Largest
>> 	structures	#CPUs		CPU #		Status
>>
>> 	0-30		0-480		479		Good
>> 	31		481-496		495		Questionable
>> 	32-		497-		496+		Bad.
>>
>> Tree SRCU differs from Tree RCU in its operation, so this bug should
>> not hold up SRCU grace periods.  It might instead cause SRCU callbacks
>> to be ignored (which would admittedly look quite similar to the user).
>>
>> My attempts to cheat the numbering system ran up against the limited
>> height of the srcu_node tree.
>>
>> But there is still the question of why this has not been seen in the
>> wild, given that there really are systems with more than 479 CPUs
>> out there.  One possibility is the call to srcu_schedule_cbs_sdp()
>> from srcu_funnel_gp_start().  But it does not seem likely that this
>> would happen all that often, as it requires back-to-back grace periods
>> and then some.
>>
>> Maybe people with large systems boot with srcutree.convert_to_big=0?
>>
>> Adding Laurent for his thoughts.
>>
>> Aha!
>>
>> Here is what makes it work, given David's description of 1<<32:
>>
>> static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp,
>> 				  unsigned long mask, unsigned long delay)
>> {
>> 	int cpu;
>>
>> 	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
>> 		if (!(mask & (1 << (cpu - snp->grplo))))
>> 			continue;
>> 		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
>> 	}
>> }
>>
>> As long as at least one bit is set in the result of 1<<N, and as long
>> as the compiler always does the same thing for a given N, then this loop
>> will make the right thing happen.
>>
>> But even that is not relied upon, because the calling code looks like
>> this:
>>
>> 			spin_lock_irq_rcu_node(snp);
>> 			cbs = false;
>> 			last_lvl = snp >= sup->level[rcu_num_lvls - 1];
>> 			if (last_lvl)
>> 				cbs = ss_state < SRCU_SIZE_BIG || snp->srcu_have_cbs[idx] == gpseq;
>> 			snp->srcu_have_cbs[idx] = gpseq;
>> 			rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
>> 			sgsne = snp->srcu_gp_seq_needed_exp;
>> 			if (srcu_invl_snp_seq(sgsne) || ULONG_CMP_LT(sgsne, gpseq))
>> 				WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
>> 			if (ss_state < SRCU_SIZE_BIG)
>> 				mask = ~0;
>> 			else
>> 				mask = snp->srcu_data_have_cbs[idx];
>> 			snp->srcu_data_have_cbs[idx] = 0;
>> 			spin_unlock_irq_rcu_node(snp);
>> 			if (cbs)
>> 				srcu_schedule_cbs_snp(ssp, snp, mask, cbdelay);
>>
>> So that last_lvl check means that the srcu_schedule_cbs_snp() function
>> is invoked only for leaf srcu_node structures.  Which by default limit
>> the shift to 16.
>>
>> So this bug appears to have no effect for default RCU setups, even on very
>> large 64-bit systems, which is consistent with field experience.  Even if
>> this is the case, it still should be fixed, to avoid confusion if nothing
>> else.  Or just in case someone decides to set CONFIG_RCU_FANOUT_LEAF=32.
>> Which actually happened the other day due to someone trusting ChatGPT's
>> opinion about RCU Kconfig options...
> 
> And I therefore queued Denis's v3 patch with an edited commit log.
> Of course, if anyone sees some other way that the bug could manifest
> other than in a 64-bit kernel built with CONFIG_RCU_FANOUT_LEAF greater
> than 30 on a system with at least 31 CPUs, please let me know so that
> I can adjust.
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit ed083b0e22f1396dee3599896249a3f218845298
> Author: Denis Arefev <arefev@swemel.ru>
> Date:   Mon Sep 4 15:21:14 2023 +0300
> 
>      Fix srcu_struct node grpmask overflow on 64-bit systems
>      
>      The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)

AFAIU, the overflow resides in the "bitwise expression" and not
the arithmetic expression.

Other than this, please add my

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>


>      is subject to overflow due to a failure to cast operands to a larger
>      data type before performing arithmetic.
>      
>      The maximum result of this subtraction is defined by the RCU_FANOUT_LEAF
>      Kconfig option, which on 64-bit systems defaults to 16 (resulting in a
>      maximum shift of 15), but which can be set up as high as 64 (resulting
>      in a maximum shift of 63).  A value of 31 can result in sign extension,
>      resulting in 0xffffffff80000000 instead of the desired 0x80000000.
>      A value of 31 or greater triggers undefined behavior per the C standard.
>      
>      This bug has not been known to cause issues because almost all kernels
>      take the default CONFIG_RCU_FANOUT_LEAF=16.  Furthermore, as long as
>      a given compiler gives a deterministic result for 1<<N for N>=32, the
>      code correctly invokes all SRCU callbacks, albeit wasting CPU time along
>      the way.
>      
>      This commit therefore substitutes the correct 1UL for the buggy 1.
>      
>      Found by Linux Verification Center (linuxtesting.org) with SVACE.
>      
>      Signed-off-by: Denis Arefev <arefev@swemel.ru>
>      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>      Cc: David Laight <David.Laight@aculab.com>
>      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 833a8f848a90..5602042856b1 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
>   				snp->grplo = cpu;
>   			snp->grphi = cpu;
>   		}
> -		sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
> +		sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo);
>   	}
>   	smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
>   	return true;
> @@ -835,7 +835,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
>   	int cpu;
>   
>   	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> -		if (!(mask & (1 << (cpu - snp->grplo))))
> +		if (!(mask & (1UL << (cpu - snp->grplo))))
>   			continue;
>   		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
>   	}
Paul E. McKenney Sept. 5, 2023, 7:45 p.m. UTC | #12
On Tue, Sep 05, 2023 at 03:36:40PM -0400, Mathieu Desnoyers wrote:
> On 9/5/23 15:27, Paul E. McKenney wrote:
> > On Tue, Sep 05, 2023 at 09:40:51AM -0700, Paul E. McKenney wrote:
> > > On Tue, Sep 05, 2023 at 10:34:43AM -0400, Mathieu Desnoyers wrote:
> > > > On 9/5/23 10:15, David Laight wrote:
> > > > > ...
> > > > > > That would instead be more than 512-16=496 CPUs, correct?  496 CPUs would
> > > > > > only require a 31-bit shift, which should be OK, but 497 would require
> > > > > > a 32-bit shift, which would result in sign extension.  If it turns out
> > > > > > that sign extension is OK, then we should get in trouble at 513 CPUs,
> > > > > > which would result in a 33-bit shift (and is that even defined in C?).
> > > > > 
> > > > > Not quite right :-)
> > > > > 
> > > > > (1 << 31) is int and negative, that gets sign extended before
> > > > > being converted to 'unsigned long' - so has the top 33 bits set.
> > > 
> > > Good point, thank you for the correction.
> > > 
> > > > > (1 << 32) is undefined, the current x86 cpu ignore the high
> > > > > shift bits so it is (1 << 0).
> > > 
> > > Which would cause it to attempt to invoke SRCU callbacks on the
> > > lowest-numbered CPU associated with that srcu_node structure.
> > > 
> > > > Yes, I was about to reply the same thing. A shift of 31 is buggy,
> > > > because shifting 1 << 31 raises the sign bit, which sets the top 33
> > > > bits when cast to unsigned long. A shift of 1 << 32 is undefined,
> > > > with for instance x86 choosing to ignore the top bit.
> > > > 
> > > > But in order to have a 1 << 31 shift from this expression:
> > > > 
> > > >                  sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
> > > > 
> > > > I *think* we need the group to have 32 cpus or more (indexed within
> > > > the group from grplo to grplo + 31 (both inclusive)).
> > > > 
> > > > So as soon as we have one group with 32 cpus, the problem should show
> > > > up. With FANOUT_LEAF=16, we can have 15 groups of 31 cpus and 1
> > > > group of 32 cpus, for:
> > > > 
> > > >    15*31 + 32 = 497 cpus.
> > > > 
> > > > AFAIU, this would be the minimum value for smp_processor_id()+1 which
> > > > triggers this issue.
> > > 
> > > By default, there are 16 CPUs per leaf srcu_node structure.  Each leaf
> > > srcu_node structure takes up one bit in the parent srcu_node structures'
> > > srcu_data_have_cbs[] array.  Up to 30 bits is OK, 31 bits is questionable,
> > > and 32 bits and larger erroneous.
> > > 
> > > This is the situation as I see it (assuming dense CPU numbering):
> > > 
> > > 	# Leaf srcu_node		Largest
> > > 	structures	#CPUs		CPU #		Status
> > > 
> > > 	0-30		0-480		479		Good
> > > 	31		481-496		495		Questionable
> > > 	32-		497-		496+		Bad.
> > > 
> > > Tree SRCU differs from Tree RCU in its operation, so this bug should
> > > not hold up SRCU grace periods.  It might instead cause SRCU callbacks
> > > to be ignored (which would admittedly look quite similar to the user).
> > > 
> > > My attempts to cheat the numbering system ran up against the limited
> > > height of the srcu_node tree.
> > > 
> > > But there is still the question of why this has not been seen in the
> > > wild, given that there really are systems with more than 479 CPUs
> > > out there.  One possibility is the call to srcu_schedule_cbs_sdp()
> > > from srcu_funnel_gp_start().  But it does not seem likely that this
> > > would happen all that often, as it requires back-to-back grace periods
> > > and then some.
> > > 
> > > Maybe people with large systems boot with srcutree.convert_to_big=0?
> > > 
> > > Adding Laurent for his thoughts.
> > > 
> > > Aha!
> > > 
> > > Here is what makes it work, given David's description of 1<<32:
> > > 
> > > static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp,
> > > 				  unsigned long mask, unsigned long delay)
> > > {
> > > 	int cpu;
> > > 
> > > 	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> > > 		if (!(mask & (1 << (cpu - snp->grplo))))
> > > 			continue;
> > > 		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> > > 	}
> > > }
> > > 
> > > As long as at least one bit is set in the result of 1<<N, and as long
> > > as the compiler always does the same thing for a given N, then this loop
> > > will make the right thing happen.
> > > 
> > > But even that is not relied upon, because the calling code looks like
> > > this:
> > > 
> > > 			spin_lock_irq_rcu_node(snp);
> > > 			cbs = false;
> > > 			last_lvl = snp >= sup->level[rcu_num_lvls - 1];
> > > 			if (last_lvl)
> > > 				cbs = ss_state < SRCU_SIZE_BIG || snp->srcu_have_cbs[idx] == gpseq;
> > > 			snp->srcu_have_cbs[idx] = gpseq;
> > > 			rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
> > > 			sgsne = snp->srcu_gp_seq_needed_exp;
> > > 			if (srcu_invl_snp_seq(sgsne) || ULONG_CMP_LT(sgsne, gpseq))
> > > 				WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
> > > 			if (ss_state < SRCU_SIZE_BIG)
> > > 				mask = ~0;
> > > 			else
> > > 				mask = snp->srcu_data_have_cbs[idx];
> > > 			snp->srcu_data_have_cbs[idx] = 0;
> > > 			spin_unlock_irq_rcu_node(snp);
> > > 			if (cbs)
> > > 				srcu_schedule_cbs_snp(ssp, snp, mask, cbdelay);
> > > 
> > > So that last_lvl check means that the srcu_schedule_cbs_snp() function
> > > is invoked only for leaf srcu_node structures.  Which by default limit
> > > the shift to 16.
> > > 
> > > So this bug appears to have no effect for default RCU setups, even on very
> > > large 64-bit systems, which is consistent with field experience.  Even if
> > > this is the case, it still should be fixed, to avoid confusion if nothing
> > > else.  Or just in case someone decides to set CONFIG_RCU_FANOUT_LEAF=32.
> > > Which actually happened the other day due to someone trusting ChatGPT's
> > > opinion about RCU Kconfig options...
> > 
> > And I therefore queued Denis's v3 patch with an edited commit log.
> > Of course, if anyone sees some other way that the bug could manifest
> > other than in a 64-bit kernel built with CONFIG_RCU_FANOUT_LEAF greater
> > than 30 on a system with at least 31 CPUs, please let me know so that
> > I can adjust.
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit ed083b0e22f1396dee3599896249a3f218845298
> > Author: Denis Arefev <arefev@swemel.ru>
> > Date:   Mon Sep 4 15:21:14 2023 +0300
> > 
> >      Fix srcu_struct node grpmask overflow on 64-bit systems
> >      The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
> 
> AFAIU, the overflow resides in the "bitwise expression" and not
> the arithmetic expression.

Rather than quibble about exactly what constitutes arithmetic, I
updated the first paragraph and added your Reviewed-by as shown
below.  ;-)

> Other than this, please add my
> 
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Thank you all!!!

------------------------------------------------------------------------

commit 50477ff756ab99402b1523b7c6be8b5d790d05e7
Author: Denis Arefev <arefev@swemel.ru>
Date:   Mon Sep 4 15:21:14 2023 +0300

    Fix srcu_struct node grpmask overflow on 64-bit systems
    
    The value of a bitwise expression 1 << (cpu - sdp->mynode->grplo)
    is subject to overflow due to a failure to cast operands to a larger
    data type before performing the bitwise operation.
    
    The maximum result of this subtraction is defined by the RCU_FANOUT_LEAF
    Kconfig option, which on 64-bit systems defaults to 16 (resulting in a
    maximum shift of 15), but which can be set up as high as 64 (resulting
    in a maximum shift of 63).  A value of 31 can result in sign extension,
    resulting in 0xffffffff80000000 instead of the desired 0x80000000.
    A value of 31 or greater triggers undefined behavior per the C standard.
    
    This bug has not been known to cause issues because almost all kernels
    take the default CONFIG_RCU_FANOUT_LEAF=16.  Furthermore, as long as
    a given compiler gives a deterministic result for 1<<N for N>=32, the
    code correctly invokes all SRCU callbacks, albeit wasting CPU time along
    the way.
    
    This commit therefore substitutes the correct 1UL for the buggy 1.
    
    Found by Linux Verification Center (linuxtesting.org) with SVACE.
    
    Signed-off-by: Denis Arefev <arefev@swemel.ru>
    Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: David Laight <David.Laight@aculab.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 833a8f848a90..5602042856b1 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
 				snp->grplo = cpu;
 			snp->grphi = cpu;
 		}
-		sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
+		sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo);
 	}
 	smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
 	return true;
@@ -835,7 +835,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
 	int cpu;
 
 	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
-		if (!(mask & (1 << (cpu - snp->grplo))))
+		if (!(mask & (1UL << (cpu - snp->grplo))))
 			continue;
 		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
 	}
Joel Fernandes Sept. 5, 2023, 8:40 p.m. UTC | #13
On Tue, Sep 5, 2023 at 4:13 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Tue, Sep 05, 2023 at 03:36:40PM -0400, Mathieu Desnoyers wrote:
> > On 9/5/23 15:27, Paul E. McKenney wrote:
> > > On Tue, Sep 05, 2023 at 09:40:51AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Sep 05, 2023 at 10:34:43AM -0400, Mathieu Desnoyers wrote:
> > > > > On 9/5/23 10:15, David Laight wrote:
> > > > > > ...
> > > > > > > That would instead be more than 512-16=496 CPUs, correct?  496 CPUs would
> > > > > > > only require a 31-bit shift, which should be OK, but 497 would require
> > > > > > > a 32-bit shift, which would result in sign extension.  If it turns out
> > > > > > > that sign extension is OK, then we should get in trouble at 513 CPUs,
> > > > > > > which would result in a 33-bit shift (and is that even defined in C?).
> > > > > >
> > > > > > Not quite right :-)
> > > > > >
> > > > > > (1 << 31) is int and negative, that gets sign extended before
> > > > > > being converted to 'unsigned long' - so has the top 33 bits set.
> > > >
> > > > Good point, thank you for the correction.
> > > >
> > > > > > (1 << 32) is undefined, the current x86 cpu ignore the high
> > > > > > shift bits so it is (1 << 0).
> > > >
> > > > Which would cause it to attempt to invoke SRCU callbacks on the
> > > > lowest-numbered CPU associated with that srcu_node structure.
> > > >
> > > > > Yes, I was about to reply the same thing. A shift of 31 is buggy,
> > > > > because shifting 1 << 31 raises the sign bit, which sets the top 33
> > > > > bits when cast to unsigned long. A shift of 1 << 32 is undefined,
> > > > > with for instance x86 choosing to ignore the top bit.
> > > > >
> > > > > But in order to have a 1 << 31 shift from this expression:
> > > > >
> > > > >                  sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
> > > > >
> > > > > I *think* we need the group to have 32 cpus or more (indexed within
> > > > > the group from grplo to grplo + 31 (both inclusive)).
> > > > >
> > > > > So as soon as we have one group with 32 cpus, the problem should show
> > > > > up. With FANOUT_LEAF=16, we can have 15 groups of 31 cpus and 1
> > > > > group of 32 cpus, for:
> > > > >
> > > > >    15*31 + 32 = 497 cpus.
> > > > >
> > > > > AFAIU, this would be the minimum value for smp_processor_id()+1 which
> > > > > triggers this issue.
> > > >
> > > > By default, there are 16 CPUs per leaf srcu_node structure.  Each leaf
> > > > srcu_node structure takes up one bit in the parent srcu_node structures'
> > > > srcu_data_have_cbs[] array.  Up to 30 bits is OK, 31 bits is questionable,
> > > > and 32 bits and larger erroneous.
> > > >
> > > > This is the situation as I see it (assuming dense CPU numbering):
> > > >
> > > >   # Leaf srcu_node                Largest
> > > >   structures      #CPUs           CPU #           Status
> > > >
> > > >   0-30            0-480           479             Good
> > > >   31              481-496         495             Questionable
> > > >   32-             497-            496+            Bad.
> > > >
> > > > Tree SRCU differs from Tree RCU in its operation, so this bug should
> > > > not hold up SRCU grace periods.  It might instead cause SRCU callbacks
> > > > to be ignored (which would admittedly look quite similar to the user).
> > > >
> > > > My attempts to cheat the numbering system ran up against the limited
> > > > height of the srcu_node tree.
> > > >
> > > > But there is still the question of why this has not been seen in the
> > > > wild, given that there really are systems with more than 479 CPUs
> > > > out there.  One possibility is the call to srcu_schedule_cbs_sdp()
> > > > from srcu_funnel_gp_start().  But it does not seem likely that this
> > > > would happen all that often, as it requires back-to-back grace periods
> > > > and then some.
> > > >
> > > > Maybe people with large systems boot with srcutree.convert_to_big=0?
> > > >
> > > > Adding Laurent for his thoughts.
> > > >
> > > > Aha!
> > > >
> > > > Here is what makes it work, given David's description of 1<<32:
> > > >
> > > > static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp,
> > > >                             unsigned long mask, unsigned long delay)
> > > > {
> > > >   int cpu;
> > > >
> > > >   for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> > > >           if (!(mask & (1 << (cpu - snp->grplo))))
> > > >                   continue;
> > > >           srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> > > >   }
> > > > }
> > > >
> > > > As long as at least one bit is set in the result of 1<<N, and as long
> > > > as the compiler always does the same thing for a given N, then this loop
> > > > will make the right thing happen.
> > > >
> > > > But even that is not relied upon, because the calling code looks like
> > > > this:
> > > >
> > > >                   spin_lock_irq_rcu_node(snp);
> > > >                   cbs = false;
> > > >                   last_lvl = snp >= sup->level[rcu_num_lvls - 1];
> > > >                   if (last_lvl)
> > > >                           cbs = ss_state < SRCU_SIZE_BIG || snp->srcu_have_cbs[idx] == gpseq;
> > > >                   snp->srcu_have_cbs[idx] = gpseq;
> > > >                   rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
> > > >                   sgsne = snp->srcu_gp_seq_needed_exp;
> > > >                   if (srcu_invl_snp_seq(sgsne) || ULONG_CMP_LT(sgsne, gpseq))
> > > >                           WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
> > > >                   if (ss_state < SRCU_SIZE_BIG)
> > > >                           mask = ~0;
> > > >                   else
> > > >                           mask = snp->srcu_data_have_cbs[idx];
> > > >                   snp->srcu_data_have_cbs[idx] = 0;
> > > >                   spin_unlock_irq_rcu_node(snp);
> > > >                   if (cbs)
> > > >                           srcu_schedule_cbs_snp(ssp, snp, mask, cbdelay);
> > > >
> > > > So that last_lvl check means that the srcu_schedule_cbs_snp() function
> > > > is invoked only for leaf srcu_node structures.  Which by default limit
> > > > the shift to 16.
> > > >
> > > > So this bug appears to have no effect for default RCU setups, even on very
> > > > large 64-bit systems, which is consistent with field experience.  Even if
> > > > this is the case, it still should be fixed, to avoid confusion if nothing
> > > > else.  Or just in case someone decides to set CONFIG_RCU_FANOUT_LEAF=32.
> > > > Which actually happened the other day due to someone trusting ChatGPT's
> > > > opinion about RCU Kconfig options...
> > >
> > > And I therefore queued Denis's v3 patch with an edited commit log.
> > > Of course, if anyone sees some other way that the bug could manifest
> > > other than in a 64-bit kernel built with CONFIG_RCU_FANOUT_LEAF greater
> > > than 30 on a system with at least 31 CPUs, please let me know so that
> > > I can adjust.
> > >
> > >                                                     Thanx, Paul
> > >
> > > ------------------------------------------------------------------------
> > >
> > > commit ed083b0e22f1396dee3599896249a3f218845298
> > > Author: Denis Arefev <arefev@swemel.ru>
> > > Date:   Mon Sep 4 15:21:14 2023 +0300
> > >
> > >      Fix srcu_struct node grpmask overflow on 64-bit systems
> > >      The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
> >
> > AFAIU, the overflow resides in the "bitwise expression" and not
> > the arithmetic expression.
>
> Rather than quibble about exactly what constitutes arithmetic, I
> updated the first paragraph and added your Reviewed-by as shown
> below.  ;-)
>
> > Other than this, please add my
> >
> > Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>
> Thank you all!!!
>
> ------------------------------------------------------------------------
>
> commit 50477ff756ab99402b1523b7c6be8b5d790d05e7
> Author: Denis Arefev <arefev@swemel.ru>
> Date:   Mon Sep 4 15:21:14 2023 +0300
>
>     Fix srcu_struct node grpmask overflow on 64-bit systems
>
>     The value of a bitwise expression 1 << (cpu - sdp->mynode->grplo)
>     is subject to overflow due to a failure to cast operands to a larger
>     data type before performing the bitwise operation.
>
>     The maximum result of this subtraction is defined by the RCU_FANOUT_LEAF
>     Kconfig option, which on 64-bit systems defaults to 16 (resulting in a
>     maximum shift of 15), but which can be set up as high as 64 (resulting
>     in a maximum shift of 63).  A value of 31 can result in sign extension,
>     resulting in 0xffffffff80000000 instead of the desired 0x80000000.
>     A value of 31 or greater triggers undefined behavior per the C standard.

Do you mean here "A value of 32 or greater"?

Only N >= 32 throws warning for:
unsigned long foo = (1 << N);

N=31 does undesirable sign extension but no warning.

thanks,

 - Joel
Paul E. McKenney Sept. 5, 2023, 8:52 p.m. UTC | #14
On Tue, Sep 05, 2023 at 04:40:46PM -0400, Joel Fernandes wrote:
> On Tue, Sep 5, 2023 at 4:13 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Tue, Sep 05, 2023 at 03:36:40PM -0400, Mathieu Desnoyers wrote:
> > > On 9/5/23 15:27, Paul E. McKenney wrote:
> > > > On Tue, Sep 05, 2023 at 09:40:51AM -0700, Paul E. McKenney wrote:
> > > > > On Tue, Sep 05, 2023 at 10:34:43AM -0400, Mathieu Desnoyers wrote:
> > > > > > On 9/5/23 10:15, David Laight wrote:
> > > > > > > ...
> > > > > > > > That would instead be more than 512-16=496 CPUs, correct?  496 CPUs would
> > > > > > > > only require a 31-bit shift, which should be OK, but 497 would require
> > > > > > > > a 32-bit shift, which would result in sign extension.  If it turns out
> > > > > > > > that sign extension is OK, then we should get in trouble at 513 CPUs,
> > > > > > > > which would result in a 33-bit shift (and is that even defined in C?).
> > > > > > >
> > > > > > > Not quite right :-)
> > > > > > >
> > > > > > > (1 << 31) is int and negative, that gets sign extended before
> > > > > > > being converted to 'unsigned long' - so has the top 33 bits set.
> > > > >
> > > > > Good point, thank you for the correction.
> > > > >
> > > > > > > (1 << 32) is undefined, the current x86 cpu ignore the high
> > > > > > > shift bits so it is (1 << 0).
> > > > >
> > > > > Which would cause it to attempt to invoke SRCU callbacks on the
> > > > > lowest-numbered CPU associated with that srcu_node structure.
> > > > >
> > > > > > Yes, I was about to reply the same thing. A shift of 31 is buggy,
> > > > > > because shifting 1 << 31 raises the sign bit, which sets the top 33
> > > > > > bits when cast to unsigned long. A shift of 1 << 32 is undefined,
> > > > > > with for instance x86 choosing to ignore the top bit.
> > > > > >
> > > > > > But in order to have a 1 << 31 shift from this expression:
> > > > > >
> > > > > >                  sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
> > > > > >
> > > > > > I *think* we need the group to have 32 cpus or more (indexed within
> > > > > > the group from grplo to grplo + 31 (both inclusive)).
> > > > > >
> > > > > > So as soon as we have one group with 32 cpus, the problem should show
> > > > > > up. With FANOUT_LEAF=16, we can have 15 groups of 31 cpus and 1
> > > > > > group of 32 cpus, for:
> > > > > >
> > > > > >    15*31 + 32 = 497 cpus.
> > > > > >
> > > > > > AFAIU, this would be the minimum value for smp_processor_id()+1 which
> > > > > > triggers this issue.
> > > > >
> > > > > By default, there are 16 CPUs per leaf srcu_node structure.  Each leaf
> > > > > srcu_node structure takes up one bit in the parent srcu_node structures'
> > > > > srcu_data_have_cbs[] array.  Up to 30 bits is OK, 31 bits is questionable,
> > > > > and 32 bits and larger erroneous.
> > > > >
> > > > > This is the situation as I see it (assuming dense CPU numbering):
> > > > >
> > > > >   # Leaf srcu_node                Largest
> > > > >   structures      #CPUs           CPU #           Status
> > > > >
> > > > >   0-30            0-480           479             Good
> > > > >   31              481-496         495             Questionable
> > > > >   32-             497-            496+            Bad.
> > > > >
> > > > > Tree SRCU differs from Tree RCU in its operation, so this bug should
> > > > > not hold up SRCU grace periods.  It might instead cause SRCU callbacks
> > > > > to be ignored (which would admittedly look quite similar to the user).
> > > > >
> > > > > My attempts to cheat the numbering system ran up against the limited
> > > > > height of the srcu_node tree.
> > > > >
> > > > > But there is still the question of why this has not been seen in the
> > > > > wild, given that there really are systems with more than 479 CPUs
> > > > > out there.  One possibility is the call to srcu_schedule_cbs_sdp()
> > > > > from srcu_funnel_gp_start().  But it does not seem likely that this
> > > > > would happen all that often, as it requires back-to-back grace periods
> > > > > and then some.
> > > > >
> > > > > Maybe people with large systems boot with srcutree.convert_to_big=0?
> > > > >
> > > > > Adding Laurent for his thoughts.
> > > > >
> > > > > Aha!
> > > > >
> > > > > Here is what makes it work, given David's description of 1<<32:
> > > > >
> > > > > static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp,
> > > > >                             unsigned long mask, unsigned long delay)
> > > > > {
> > > > >   int cpu;
> > > > >
> > > > >   for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> > > > >           if (!(mask & (1 << (cpu - snp->grplo))))
> > > > >                   continue;
> > > > >           srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> > > > >   }
> > > > > }
> > > > >
> > > > > As long as at least one bit is set in the result of 1<<N, and as long
> > > > > as the compiler always does the same thing for a given N, then this loop
> > > > > will make the right thing happen.
> > > > >
> > > > > But even that is not relied upon, because the calling code looks like
> > > > > this:
> > > > >
> > > > >                   spin_lock_irq_rcu_node(snp);
> > > > >                   cbs = false;
> > > > >                   last_lvl = snp >= sup->level[rcu_num_lvls - 1];
> > > > >                   if (last_lvl)
> > > > >                           cbs = ss_state < SRCU_SIZE_BIG || snp->srcu_have_cbs[idx] == gpseq;
> > > > >                   snp->srcu_have_cbs[idx] = gpseq;
> > > > >                   rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
> > > > >                   sgsne = snp->srcu_gp_seq_needed_exp;
> > > > >                   if (srcu_invl_snp_seq(sgsne) || ULONG_CMP_LT(sgsne, gpseq))
> > > > >                           WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
> > > > >                   if (ss_state < SRCU_SIZE_BIG)
> > > > >                           mask = ~0;
> > > > >                   else
> > > > >                           mask = snp->srcu_data_have_cbs[idx];
> > > > >                   snp->srcu_data_have_cbs[idx] = 0;
> > > > >                   spin_unlock_irq_rcu_node(snp);
> > > > >                   if (cbs)
> > > > >                           srcu_schedule_cbs_snp(ssp, snp, mask, cbdelay);
> > > > >
> > > > > So that last_lvl check means that the srcu_schedule_cbs_snp() function
> > > > > is invoked only for leaf srcu_node structures.  Which by default limit
> > > > > the shift to 16.
> > > > >
> > > > > So this bug appears to have no effect for default RCU setups, even on very
> > > > > large 64-bit systems, which is consistent with field experience.  Even if
> > > > > this is the case, it still should be fixed, to avoid confusion if nothing
> > > > > else.  Or just in case someone decides to set CONFIG_RCU_FANOUT_LEAF=32.
> > > > > Which actually happened the other day due to someone trusting ChatGPT's
> > > > > opinion about RCU Kconfig options...
> > > >
> > > > And I therefore queued Denis's v3 patch with an edited commit log.
> > > > Of course, if anyone sees some other way that the bug could manifest
> > > > other than in a 64-bit kernel built with CONFIG_RCU_FANOUT_LEAF greater
> > > > than 30 on a system with at least 31 CPUs, please let me know so that
> > > > I can adjust.
> > > >
> > > >                                                     Thanx, Paul
> > > >
> > > > ------------------------------------------------------------------------
> > > >
> > > > commit ed083b0e22f1396dee3599896249a3f218845298
> > > > Author: Denis Arefev <arefev@swemel.ru>
> > > > Date:   Mon Sep 4 15:21:14 2023 +0300
> > > >
> > > >      Fix srcu_struct node grpmask overflow on 64-bit systems
> > > >      The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
> > >
> > > AFAIU, the overflow resides in the "bitwise expression" and not
> > > the arithmetic expression.
> >
> > Rather than quibble about exactly what constitutes arithmetic, I
> > updated the first paragraph and added your Reviewed-by as shown
> > below.  ;-)
> >
> > > Other than this, please add my
> > >
> > > Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> >
> > Thank you all!!!
> >
> > ------------------------------------------------------------------------
> >
> > commit 50477ff756ab99402b1523b7c6be8b5d790d05e7
> > Author: Denis Arefev <arefev@swemel.ru>
> > Date:   Mon Sep 4 15:21:14 2023 +0300
> >
> >     Fix srcu_struct node grpmask overflow on 64-bit systems
> >
> >     The value of a bitwise expression 1 << (cpu - sdp->mynode->grplo)
> >     is subject to overflow due to a failure to cast operands to a larger
> >     data type before performing the bitwise operation.
> >
> >     The maximum result of this subtraction is defined by the RCU_FANOUT_LEAF
> >     Kconfig option, which on 64-bit systems defaults to 16 (resulting in a
> >     maximum shift of 15), but which can be set up as high as 64 (resulting
> >     in a maximum shift of 63).  A value of 31 can result in sign extension,
> >     resulting in 0xffffffff80000000 instead of the desired 0x80000000.
> >     A value of 31 or greater triggers undefined behavior per the C standard.
> 
> Do you mean here "A value of 32 or greater"?
> 
> Only N >= 32 throws warning for:
> unsigned long foo = (1 << N);
> 
> N=31 does undesirable sign extension but no warning.

Good catch, thank you, and I will update this on my next rebase.

							Thanx, Paul
Joel Fernandes Sept. 5, 2023, 8:56 p.m. UTC | #15
On Tue, Sep 5, 2023 at 4:52 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Tue, Sep 05, 2023 at 04:40:46PM -0400, Joel Fernandes wrote:
[...]
> > > > > > So this bug appears to have no effect for default RCU setups, even on very
> > > > > > large 64-bit systems, which is consistent with field experience.  Even if
> > > > > > this is the case, it still should be fixed, to avoid confusion if nothing
> > > > > > else.  Or just in case someone decides to set CONFIG_RCU_FANOUT_LEAF=32.
> > > > > > Which actually happened the other day due to someone trusting ChatGPT's
> > > > > > opinion about RCU Kconfig options...
> > > > >
> > > > > And I therefore queued Denis's v3 patch with an edited commit log.
> > > > > Of course, if anyone sees some other way that the bug could manifest
> > > > > other than in a 64-bit kernel built with CONFIG_RCU_FANOUT_LEAF greater
> > > > > than 30 on a system with at least 31 CPUs, please let me know so that
> > > > > I can adjust.
> > > > >
> > > > >                                                     Thanx, Paul
> > > > >
> > > > > ------------------------------------------------------------------------
> > > > >
> > > > > commit ed083b0e22f1396dee3599896249a3f218845298
> > > > > Author: Denis Arefev <arefev@swemel.ru>
> > > > > Date:   Mon Sep 4 15:21:14 2023 +0300
> > > > >
> > > > >      Fix srcu_struct node grpmask overflow on 64-bit systems
> > > > >      The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
> > > >
> > > > AFAIU, the overflow resides in the "bitwise expression" and not
> > > > the arithmetic expression.
> > >
> > > Rather than quibble about exactly what constitutes arithmetic, I
> > > updated the first paragraph and added your Reviewed-by as shown
> > > below.  ;-)
> > >
> > > > Other than this, please add my
> > > >
> > > > Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > >
> > > Thank you all!!!
> > >
> > > ------------------------------------------------------------------------
> > >
> > > commit 50477ff756ab99402b1523b7c6be8b5d790d05e7
> > > Author: Denis Arefev <arefev@swemel.ru>
> > > Date:   Mon Sep 4 15:21:14 2023 +0300
> > >
> > >     Fix srcu_struct node grpmask overflow on 64-bit systems
> > >
> > >     The value of a bitwise expression 1 << (cpu - sdp->mynode->grplo)
> > >     is subject to overflow due to a failure to cast operands to a larger
> > >     data type before performing the bitwise operation.
> > >
> > >     The maximum result of this subtraction is defined by the RCU_FANOUT_LEAF
> > >     Kconfig option, which on 64-bit systems defaults to 16 (resulting in a
> > >     maximum shift of 15), but which can be set up as high as 64 (resulting
> > >     in a maximum shift of 63).  A value of 31 can result in sign extension,
> > >     resulting in 0xffffffff80000000 instead of the desired 0x80000000.
> > >     A value of 31 or greater triggers undefined behavior per the C standard.
> >
> > Do you mean here "A value of 32 or greater"?
> >
> > Only N >= 32 throws warning for:
> > unsigned long foo = (1 << N);
> >
> > N=31 does undesirable sign extension but no warning.
>
> Good catch, thank you, and I will update this on my next rebase.

Thanks, and with that the patch looks good to me:
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

 - Joel
Paul E. McKenney Sept. 5, 2023, 10:22 p.m. UTC | #16
On Tue, Sep 05, 2023 at 04:56:12PM -0400, Joel Fernandes wrote:
> On Tue, Sep 5, 2023 at 4:52 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Tue, Sep 05, 2023 at 04:40:46PM -0400, Joel Fernandes wrote:
> [...]
> > > > > > > So this bug appears to have no effect for default RCU setups, even on very
> > > > > > > large 64-bit systems, which is consistent with field experience.  Even if
> > > > > > > this is the case, it still should be fixed, to avoid confusion if nothing
> > > > > > > else.  Or just in case someone decides to set CONFIG_RCU_FANOUT_LEAF=32.
> > > > > > > Which actually happened the other day due to someone trusting ChatGPT's
> > > > > > > opinion about RCU Kconfig options...
> > > > > >
> > > > > > And I therefore queued Denis's v3 patch with an edited commit log.
> > > > > > Of course, if anyone sees some other way that the bug could manifest
> > > > > > other than in a 64-bit kernel built with CONFIG_RCU_FANOUT_LEAF greater
> > > > > > than 30 on a system with at least 31 CPUs, please let me know so that
> > > > > > I can adjust.
> > > > > >
> > > > > >                                                     Thanx, Paul
> > > > > >
> > > > > > ------------------------------------------------------------------------
> > > > > >
> > > > > > commit ed083b0e22f1396dee3599896249a3f218845298
> > > > > > Author: Denis Arefev <arefev@swemel.ru>
> > > > > > Date:   Mon Sep 4 15:21:14 2023 +0300
> > > > > >
> > > > > >      Fix srcu_struct node grpmask overflow on 64-bit systems
> > > > > >      The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
> > > > >
> > > > > AFAIU, the overflow resides in the "bitwise expression" and not
> > > > > the arithmetic expression.
> > > >
> > > > Rather than quibble about exactly what constitutes arithmetic, I
> > > > updated the first paragraph and added your Reviewed-by as shown
> > > > below.  ;-)
> > > >
> > > > > Other than this, please add my
> > > > >
> > > > > Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > >
> > > > Thank you all!!!
> > > >
> > > > ------------------------------------------------------------------------
> > > >
> > > > commit 50477ff756ab99402b1523b7c6be8b5d790d05e7
> > > > Author: Denis Arefev <arefev@swemel.ru>
> > > > Date:   Mon Sep 4 15:21:14 2023 +0300
> > > >
> > > >     Fix srcu_struct node grpmask overflow on 64-bit systems
> > > >
> > > >     The value of a bitwise expression 1 << (cpu - sdp->mynode->grplo)
> > > >     is subject to overflow due to a failure to cast operands to a larger
> > > >     data type before performing the bitwise operation.
> > > >
> > > >     The maximum result of this subtraction is defined by the RCU_FANOUT_LEAF
> > > >     Kconfig option, which on 64-bit systems defaults to 16 (resulting in a
> > > >     maximum shift of 15), but which can be set up as high as 64 (resulting
> > > >     in a maximum shift of 63).  A value of 31 can result in sign extension,
> > > >     resulting in 0xffffffff80000000 instead of the desired 0x80000000.
> > > >     A value of 31 or greater triggers undefined behavior per the C standard.
> > >
> > > Do you mean here "A value of 32 or greater"?
> > >
> > > Only N >= 32 throws warning for:
> > > unsigned long foo = (1 << N);
> > >
> > > N=31 does undesirable sign extension but no warning.
> >
> > Good catch, thank you, and I will update this on my next rebase.
> 
> Thanks, and with that the patch looks good to me:
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Thank you very much, and I will apply this on the next rebase.

							Thanx, Paul
diff mbox series

Patch

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 20d7a238d675..6c18e6005ae1 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -223,7 +223,7 @@  static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
 				snp->grplo = cpu;
 			snp->grphi = cpu;
 		}
-		sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
+		sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo);
 	}
 	smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
 	return true;
@@ -833,7 +833,7 @@  static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
 	int cpu;
 
 	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
-		if (!(mask & (1 << (cpu - snp->grplo))))
+		if (!(mask & (1UL << (cpu - snp->grplo))))
 			continue;
 		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
 	}