diff mbox series

[3/3] xen/common: Add NUMA node id bounds check to page_alloc.c/node_to_scrub

Message ID a47962a573eab38991a8aa3e09ca6b8bd3193b6c.1695681330.git.sanastasio@raptorengineering.com (mailing list archive)
State Superseded
Headers show
Series Fix Power CI build | expand

Commit Message

Shawn Anastasio Sept. 25, 2023, 10:42 p.m. UTC
When building for Power with CONFIG_DEBUG unset, a compiler error gets
raised inside page_alloc.c's node_to_scrub function, likely due to the
increased optimization level:

common/page_alloc.c: In function 'node_to_scrub.part.0':
common/page_alloc.c:1217:29: error: array subscript 1 is above array
            bounds of 'long unsigned int[1]' [-Werror=array-bounds]
 1217 |         if ( node_need_scrub[node] )

It appears that this is a false positive, given that in practice
cycle_node should never return a node ID >= MAX_NUMNODES as long as the
architecture's node_online_map is properly defined and initialized, so
this additional bounds check is only to satisfy GCC.

Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com>
---
 xen/common/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

--
2.30.2

Comments

Stefano Stabellini Sept. 25, 2023, 11:04 p.m. UTC | #1
Adding Henry as this issue is currently causing gitlab-ci failures. I
think we should fix it as soon as possible to get clearer results for
the 4.18 release. (This comment applies to the rest of the series as
well).


On Mon, 25 Sep 2023, Shawn Anastasio wrote:
> When building for Power with CONFIG_DEBUG unset, a compiler error gets
> raised inside page_alloc.c's node_to_scrub function, likely due to the
> increased optimization level:
> 
> common/page_alloc.c: In function 'node_to_scrub.part.0':
> common/page_alloc.c:1217:29: error: array subscript 1 is above array
>             bounds of 'long unsigned int[1]' [-Werror=array-bounds]
>  1217 |         if ( node_need_scrub[node] )
> 
> It appears that this is a false positive, given that in practice
> cycle_node should never return a node ID >= MAX_NUMNODES as long as the
> architecture's node_online_map is properly defined and initialized, so
> this additional bounds check is only to satisfy GCC.
> 
> Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com>

Acked-by: Stefano Stabellini <sstabellini@kernel.org>


> ---
>  xen/common/page_alloc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 35d9a26fa6..6df2a223e1 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1211,6 +1211,9 @@ static unsigned int node_to_scrub(bool get_node)
>          } while ( !cpumask_empty(&node_to_cpumask(node)) &&
>                    (node != local_node) );
> 
> +        if ( node >= MAX_NUMNODES )
> +            break;
> +
>          if ( node == local_node )
>              break;
> 
> --
> 2.30.2
>
Henry Wang Sept. 26, 2023, 12:06 a.m. UTC | #2
Hi Stefano,

> On Sep 26, 2023, at 07:04, Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> Adding Henry as this issue is currently causing gitlab-ci failures. I
> think we should fix it as soon as possible to get clearer results for
> the 4.18 release. (This comment applies to the rest of the series as
> well).

Thanks, the CI should indeed be fixed asap.

> 
> 
> On Mon, 25 Sep 2023, Shawn Anastasio wrote:
>> When building for Power with CONFIG_DEBUG unset, a compiler error gets
>> raised inside page_alloc.c's node_to_scrub function, likely due to the
>> increased optimization level:
>> 
>> common/page_alloc.c: In function 'node_to_scrub.part.0':
>> common/page_alloc.c:1217:29: error: array subscript 1 is above array
>>            bounds of 'long unsigned int[1]' [-Werror=array-bounds]
>> 1217 |         if ( node_need_scrub[node] )
>> 
>> It appears that this is a false positive, given that in practice
>> cycle_node should never return a node ID >= MAX_NUMNODES as long as the
>> architecture's node_online_map is properly defined and initialized, so
>> this additional bounds check is only to satisfy GCC.
>> 
>> Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com>
> 
> Acked-by: Stefano Stabellini <sstabellini@kernel.org>

Release-acked-by: Henry Wang <Henry.Wang@arm.com>

Kind regards,
Henry

> 
> 
>> ---
>> xen/common/page_alloc.c | 3 +++
>> 1 file changed, 3 insertions(+)
>> 
>> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
>> index 35d9a26fa6..6df2a223e1 100644
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -1211,6 +1211,9 @@ static unsigned int node_to_scrub(bool get_node)
>>         } while ( !cpumask_empty(&node_to_cpumask(node)) &&
>>                   (node != local_node) );
>> 
>> +        if ( node >= MAX_NUMNODES )
>> +            break;
>> +
>>         if ( node == local_node )
>>             break;
>> 
>> --
>> 2.30.2
>>
Jan Beulich Sept. 26, 2023, 7:32 a.m. UTC | #3
On 26.09.2023 00:42, Shawn Anastasio wrote:
> When building for Power with CONFIG_DEBUG unset, a compiler error gets
> raised inside page_alloc.c's node_to_scrub function, likely due to the
> increased optimization level:
> 
> common/page_alloc.c: In function 'node_to_scrub.part.0':
> common/page_alloc.c:1217:29: error: array subscript 1 is above array
>             bounds of 'long unsigned int[1]' [-Werror=array-bounds]
>  1217 |         if ( node_need_scrub[node] )

That's gcc13?

> It appears that this is a false positive, given that in practice
> cycle_node should never return a node ID >= MAX_NUMNODES as long as the
> architecture's node_online_map is properly defined and initialized, so
> this additional bounds check is only to satisfy GCC.

Looks very similar to the situation that c890499871cc ("timer: fix
NR_CPUS=1 build with gcc13") was dealing with, just that here it's
MAX_NUMNODES. I'd therefore prefer a solution similar to the one
taken there, i.e. make code conditional rather than add yet more
code.

Otherwise, ...

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1211,6 +1211,9 @@ static unsigned int node_to_scrub(bool get_node)
>          } while ( !cpumask_empty(&node_to_cpumask(node)) &&
>                    (node != local_node) );
> 
> +        if ( node >= MAX_NUMNODES )
> +            break;

... this clearly redundant check would need to gain a comment.

What I'm puzzled by is that on Arm we had no reports of a similar
problem, despite NUMA also not getting selected there (yet).

Jan
Jan Beulich Sept. 26, 2023, 7:36 a.m. UTC | #4
On 26.09.2023 09:32, Jan Beulich wrote:
> On 26.09.2023 00:42, Shawn Anastasio wrote:
>> When building for Power with CONFIG_DEBUG unset,

Hmm, depending on what gcc versions are used in CI, the above may be the
reason why ...

>> a compiler error gets
>> raised inside page_alloc.c's node_to_scrub function, likely due to the
>> increased optimization level:
>>
>> common/page_alloc.c: In function 'node_to_scrub.part.0':
>> common/page_alloc.c:1217:29: error: array subscript 1 is above array
>>             bounds of 'long unsigned int[1]' [-Werror=array-bounds]
>>  1217 |         if ( node_need_scrub[node] )
> 
> That's gcc13?
> 
>> It appears that this is a false positive, given that in practice
>> cycle_node should never return a node ID >= MAX_NUMNODES as long as the
>> architecture's node_online_map is properly defined and initialized, so
>> this additional bounds check is only to satisfy GCC.
> 
> Looks very similar to the situation that c890499871cc ("timer: fix
> NR_CPUS=1 build with gcc13") was dealing with, just that here it's
> MAX_NUMNODES. I'd therefore prefer a solution similar to the one
> taken there, i.e. make code conditional rather than add yet more
> code.
> 
> Otherwise, ...
> 
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -1211,6 +1211,9 @@ static unsigned int node_to_scrub(bool get_node)
>>          } while ( !cpumask_empty(&node_to_cpumask(node)) &&
>>                    (node != local_node) );
>>
>> +        if ( node >= MAX_NUMNODES )
>> +            break;
> 
> ... this clearly redundant check would need to gain a comment.
> 
> What I'm puzzled by is that on Arm we had no reports of a similar
> problem, despite NUMA also not getting selected there (yet).

... this wasn't observed, yet. As far as I'm concerned, all my Arm builds
are debug ones (which I may need to change).

Jan
Shawn Anastasio Sept. 26, 2023, 6:27 p.m. UTC | #5
On 9/26/23 2:32 AM, Jan Beulich wrote:
> On 26.09.2023 00:42, Shawn Anastasio wrote:
>> When building for Power with CONFIG_DEBUG unset, a compiler error gets
>> raised inside page_alloc.c's node_to_scrub function, likely due to the
>> increased optimization level:
>>
>> common/page_alloc.c: In function 'node_to_scrub.part.0':
>> common/page_alloc.c:1217:29: error: array subscript 1 is above array
>>             bounds of 'long unsigned int[1]' [-Werror=array-bounds]
>>  1217 |         if ( node_need_scrub[node] )
> 
> That's gcc13?
> 

Gcc 10.2.1, actually.

>> It appears that this is a false positive, given that in practice
>> cycle_node should never return a node ID >= MAX_NUMNODES as long as the
>> architecture's node_online_map is properly defined and initialized, so
>> this additional bounds check is only to satisfy GCC.
> 
> Looks very similar to the situation that c890499871cc ("timer: fix
> NR_CPUS=1 build with gcc13") was dealing with, just that here it's
> MAX_NUMNODES. I'd therefore prefer a solution similar to the one
> taken there, i.e. make code conditional rather than add yet more
> code.
> 

Looking at that commit, it doesn't look like conditionalizing this code
in the same way would be as clean -- it'd likely require separate
conditional blocks around the initial variable declarations and the
function body.

> Otherwise, ...
> 
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -1211,6 +1211,9 @@ static unsigned int node_to_scrub(bool get_node)
>>          } while ( !cpumask_empty(&node_to_cpumask(node)) &&
>>                    (node != local_node) );
>>
>> +        if ( node >= MAX_NUMNODES )
>> +            break;
> 
> ... this clearly redundant check would need to gain a comment.
>

This isn't a problem. I'll submit a v2 with a commit.

> What I'm puzzled by is that on Arm we had no reports of a similar
> problem, despite NUMA also not getting selected there (yet).
>

As you pointed out in your follow-up email, it might be due to
CONFIG_DEBUG. It might also be because on ppc, our implementation of
find_next_bit (which is used by cycle_node) is declared static inline,
so the compiler has access to its definition when doing this sort of
static analysis. On Arm I believe the function lives in its own
compilation unit which might preclude GCC from making any assumptions
about its return value.

> Jan

Thanks,
Shawn
diff mbox series

Patch

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 35d9a26fa6..6df2a223e1 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1211,6 +1211,9 @@  static unsigned int node_to_scrub(bool get_node)
         } while ( !cpumask_empty(&node_to_cpumask(node)) &&
                   (node != local_node) );

+        if ( node >= MAX_NUMNODES )
+            break;
+
         if ( node == local_node )
             break;