Message ID | cbecf35392441b03dc3ccd2a119b6fbdbf13b855.1695754185.git.sanastasio@raptorengineering.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Fix Power CI build | expand |
On 26/09/2023 7:54 pm, Shawn Anastasio wrote: > When building for Power with CONFIG_DEBUG unset, a compiler error gets > raised inside page_alloc.c's node_to_scrub function: > > common/page_alloc.c: In function 'node_to_scrub.part.0': > common/page_alloc.c:1217:29: error: array subscript 1 is above array > bounds of 'long unsigned int[1]' [-Werror=array-bounds] > 1217 | if ( node_need_scrub[node] ) > > It appears that this is a false positive, given that in practice > cycle_node should never return a node ID >= MAX_NUMNODES as long as the > architecture's node_online_map is properly defined and initialized, so > this additional bounds check is only to satisfy GCC. If GCC thinks it's got an index of 1 here, then it thinks it's proved that a 1 gets passed. But the fact that cycle_node() really can return 1 if one variable gets tweaked in memory means that logic derived on this property is broken. But we've got even more basic problems to fix first. xen$ git grep -B1 '__read_mostly node_online_map' arch/arm/smpboot.c-45-/* Fake one node for now. See also asm/numa.h */ arch/arm/smpboot.c:46:nodemask_t __read_mostly node_online_map = { { [0] = 1UL } }; arch/ppc/stubs.c-25- arch/ppc/stubs.c:26:nodemask_t __read_mostly node_online_map = { { [0] = 1UL } }; common/numa.c-44- common/numa.c:45:nodemask_t __read_mostly node_online_map = { { [0] = 1UL } }; There are 3 identical definitions of node_online_map, one for the architecture which really supports NUMA, and two for the stubs which don't. And the bug here is that code outside of CONFIG_NUMA assumes NUMA is valid, including several places in page_alloc.c, domain_set_node_affinity(), credit1 and sysctl. XEN_SYSCTL_numainfo even has a bigger sin of using a static MAX_NUMNODES array when it doesn't need an array in the first place. It's rude for xen/nodemask.h to short circuit some of the operations on node_online_map based on MAX_NUMNODES but not others. If nothing else, the fallback for "not really NUMA" needs providing by the common code and not by the arch stubs. When that is sorted, we might have more consistent behaviour to investigate. ~Andrew
On Tue, 26 Sep 2023, Shawn Anastasio wrote: > When building for Power with CONFIG_DEBUG unset, a compiler error gets > raised inside page_alloc.c's node_to_scrub function: > > common/page_alloc.c: In function 'node_to_scrub.part.0': > common/page_alloc.c:1217:29: error: array subscript 1 is above array > bounds of 'long unsigned int[1]' [-Werror=array-bounds] > 1217 | if ( node_need_scrub[node] ) > > It appears that this is a false positive, given that in practice > cycle_node should never return a node ID >= MAX_NUMNODES as long as the > architecture's node_online_map is properly defined and initialized, so > this additional bounds check is only to satisfy GCC. > > Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> > --- > v2: Add comment to explain the bounds check. > > xen/common/page_alloc.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c > index 35d9a26fa6..c53f917dbc 100644 > --- a/xen/common/page_alloc.c > +++ b/xen/common/page_alloc.c > @@ -1211,6 +1211,14 @@ static unsigned int node_to_scrub(bool get_node) > } while ( !cpumask_empty(&node_to_cpumask(node)) && > (node != local_node) ); > > + /* > + * In practice `node` will always be within MAX_NUMNODES, but GCC can't > + * always see that, so an explicit check is necessary to avoid tripping > + * its out-of-bounds array access warning (-Warray-bounds). > + */ > + if ( node >= MAX_NUMNODES ) > + break; > + > if ( node == local_node ) > break; > > -- > 2.30.2 >
On 9/26/23 3:54 PM, Andrew Cooper wrote: > On 26/09/2023 7:54 pm, Shawn Anastasio wrote: >> When building for Power with CONFIG_DEBUG unset, a compiler error gets >> raised inside page_alloc.c's node_to_scrub function: >> >> common/page_alloc.c: In function 'node_to_scrub.part.0': >> common/page_alloc.c:1217:29: error: array subscript 1 is above array >> bounds of 'long unsigned int[1]' [-Werror=array-bounds] >> 1217 | if ( node_need_scrub[node] ) >> >> It appears that this is a false positive, given that in practice >> cycle_node should never return a node ID >= MAX_NUMNODES as long as the >> architecture's node_online_map is properly defined and initialized, so >> this additional bounds check is only to satisfy GCC. > > If GCC thinks it's got an index of 1 here, then it thinks it's proved > that a 1 gets passed. But the fact that cycle_node() really can return > 1 if one variable gets tweaked in memory means that logic derived on > this property is broken. > > But we've got even more basic problems to fix first. > > xen$ git grep -B1 '__read_mostly node_online_map' > arch/arm/smpboot.c-45-/* Fake one node for now. See also asm/numa.h */ > arch/arm/smpboot.c:46:nodemask_t __read_mostly node_online_map = { { [0] > = 1UL } }; > arch/ppc/stubs.c-25- > arch/ppc/stubs.c:26:nodemask_t __read_mostly node_online_map = { { [0] = > 1UL } }; > common/numa.c-44- > common/numa.c:45:nodemask_t __read_mostly node_online_map = { { [0] = > 1UL } }; > > There are 3 identical definitions of node_online_map, one for the > architecture which really supports NUMA, and two for the stubs which don't. > > And the bug here is that code outside of CONFIG_NUMA assumes NUMA is > valid, including several places in page_alloc.c, > domain_set_node_affinity(), credit1 and sysctl. XEN_SYSCTL_numainfo > even has a bigger sin of using a static MAX_NUMNODES array when it > doesn't need an array in the first place. > Agreed. It seems like the current state of affairs resulted from hacks in Arm to work around NUMA assumptions in common instead of properly making common support !CONFIG_NUMA. And now ppc is inheriting some of that baggage. > It's rude for xen/nodemask.h to short circuit some of the operations on > node_online_map based on MAX_NUMNODES but not others. > > If nothing else, the fallback for "not really NUMA" needs providing by > the common code and not by the arch stubs. When that is sorted, we > might have more consistent behaviour to investigate. > I'll leave these more sweeping changes up to someone with a bit more familiarity with this code for now. In the meantime, this patch at should introduce no functional change to the status quo and allows ppc to build in the interim. > ~Andrew Thanks, Shawn
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c index 35d9a26fa6..c53f917dbc 100644 --- a/xen/common/page_alloc.c +++ b/xen/common/page_alloc.c @@ -1211,6 +1211,14 @@ static unsigned int node_to_scrub(bool get_node) } while ( !cpumask_empty(&node_to_cpumask(node)) && (node != local_node) ); + /* + * In practice `node` will always be within MAX_NUMNODES, but GCC can't + * always see that, so an explicit check is necessary to avoid tripping + * its out-of-bounds array access warning (-Warray-bounds). + */ + if ( node >= MAX_NUMNODES ) + break; + if ( node == local_node ) break;
When building for Power with CONFIG_DEBUG unset, a compiler error gets raised inside page_alloc.c's node_to_scrub function: common/page_alloc.c: In function 'node_to_scrub.part.0': common/page_alloc.c:1217:29: error: array subscript 1 is above array bounds of 'long unsigned int[1]' [-Werror=array-bounds] 1217 | if ( node_need_scrub[node] ) It appears that this is a false positive, given that in practice cycle_node should never return a node ID >= MAX_NUMNODES as long as the architecture's node_online_map is properly defined and initialized, so this additional bounds check is only to satisfy GCC. Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com> --- v2: Add comment to explain the bounds check. xen/common/page_alloc.c | 8 ++++++++ 1 file changed, 8 insertions(+) -- 2.30.2