
[v9,00/12] Support PPTT for ARM64

Message ID 20180529150823.GD17159@arm.com (mailing list archive)
State Not Applicable
Delegated to: Geert Uytterhoeven

Commit Message

Will Deacon May 29, 2018, 3:08 p.m. UTC
On Tue, May 29, 2018 at 02:18:40PM +0100, Sudeep Holla wrote:
> On 29/05/18 12:56, Geert Uytterhoeven wrote:
> > On Tue, May 29, 2018 at 1:14 PM, Sudeep Holla <sudeep.holla@arm.com> wrote:
> >> On 29/05/18 11:48, Geert Uytterhoeven wrote:
> >>> System suspend still works fine on systems with big cores only:
> >>>
> >>>     R-Car H3 ES1.0 (4xCA57 (4xCA53 disabled in firmware))
> >>>     R-Car M3-N (2xCA57)
> >>>
> >>> Reverting this commit fixes the issue for me.
> >>
> >> I can't find anything that relates to system suspend in these patches
> >> unless they are messing with something when CPUs are hot-plugged back
> >> in during resume.
> > 
> > It's only the last patch that introduces the breakage.
> > 
> 
> As specified in the commit log, it won't change any behavior for DT
> systems if it's a non-NUMA or single-node system. So I am still wondering
> what could trigger this regression.

I wonder if we're somehow giving an uninitialised/invalid NUMA configuration
to the scheduler, although I can't see how this would happen.

Geert -- if you enable CONFIG_DEBUG_PER_CPU_MAPS=y and apply the diff below
do you see anything shouting in dmesg?

Will

--->8

Comments

Geert Uytterhoeven May 29, 2018, 3:51 p.m. UTC | #1
Hi Will,

On Tue, May 29, 2018 at 5:08 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, May 29, 2018 at 02:18:40PM +0100, Sudeep Holla wrote:
>> On 29/05/18 12:56, Geert Uytterhoeven wrote:
>> > On Tue, May 29, 2018 at 1:14 PM, Sudeep Holla <sudeep.holla@arm.com> wrote:
>> >> On 29/05/18 11:48, Geert Uytterhoeven wrote:
>> >>> System suspend still works fine on systems with big cores only:
>> >>>
>> >>>     R-Car H3 ES1.0 (4xCA57 (4xCA53 disabled in firmware))
>> >>>     R-Car M3-N (2xCA57)
>> >>>
>> >>> Reverting this commit fixes the issue for me.
>> >>
>> >> I can't find anything that relates to system suspend in these patches
>> >> unless they are messing with something when CPUs are hot-plugged back
>> >> in during resume.
>> >
>> > It's only the last patch that introduces the breakage.
>> >
>>
>> As specified in the commit log, it won't change any behavior for DT
>> systems if it's a non-NUMA or single-node system. So I am still wondering
>> what could trigger this regression.
>
> I wonder if we're somehow giving an uninitialised/invalid NUMA configuration
> to the scheduler, although I can't see how this would happen.
>
> Geert -- if you enable CONFIG_DEBUG_PER_CPU_MAPS=y and apply the diff below
> do you see anything shouting in dmesg?

Thanks, but unfortunately it doesn't help.
I added some debug code to print cpumask, but so far I don't see anything
suspicious.

Gr{oetje,eeting}s,

                        Geert
Robin Murphy May 29, 2018, 5:08 p.m. UTC | #2
On 29/05/18 16:51, Geert Uytterhoeven wrote:
> Hi Will,
> 
> On Tue, May 29, 2018 at 5:08 PM, Will Deacon <will.deacon@arm.com> wrote:
>> On Tue, May 29, 2018 at 02:18:40PM +0100, Sudeep Holla wrote:
>>> On 29/05/18 12:56, Geert Uytterhoeven wrote:
>>>> On Tue, May 29, 2018 at 1:14 PM, Sudeep Holla <sudeep.holla@arm.com> wrote:
>>>>> On 29/05/18 11:48, Geert Uytterhoeven wrote:
>>>>>> System suspend still works fine on systems with big cores only:
>>>>>>
>>>>>>      R-Car H3 ES1.0 (4xCA57 (4xCA53 disabled in firmware))
>>>>>>      R-Car M3-N (2xCA57)
>>>>>>
>>>>>> Reverting this commit fixes the issue for me.
>>>>>
>>>>> I can't find anything that relates to system suspend in these patches
>>>>> unless they are messing with something when CPUs are hot-plugged back
>>>>> in during resume.
>>>>
>>>> It's only the last patch that introduces the breakage.
>>>>
>>>
>>> As specified in the commit log, it won't change any behavior for DT
>>> systems if it's a non-NUMA or single-node system. So I am still wondering
>>> what could trigger this regression.
>>
>> I wonder if we're somehow giving an uninitialised/invalid NUMA configuration
>> to the scheduler, although I can't see how this would happen.
>>
>> Geert -- if you enable CONFIG_DEBUG_PER_CPU_MAPS=y and apply the diff below
>> do you see anything shouting in dmesg?
> 
> Thanks, but unfortunately it doesn't help.
> I added some debug code to print cpumask, but so far I don't see anything
> suspicious.

Do you have CONFIG_NUMA enabled? On a hunch I've managed to reproduce 
what looks like the same thing on a Juno board with NUMA=n; going in 
with external debug it seems to be stuck in the loop in 
init_sched_groups_capacity(), with an approximate stack trace of:


init_sched_groups_capacity()
partition_sched_domains()
cpuset_cpu_active()
sched_cpu_activate()
cpuhp_invoke_callback()
cpuhp_thread_fn()

My hunch is based on the fact that it looks like we can, under the right 
circumstances, end up with default_topology picking up cpu_online_mask 
as a sibling mask via cpu_coregroup_mask(), and given the great 
coincidence that that's going to change when hotplugging out CPUs on 
suspend, things might not react too well to that. Things also look to go 
utterly haywire once into a full-blown systemd userspace with cpuidle, 
but I haven't got a clear picture of that yet.

Robin.
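
To make the failure mode Robin describes more concrete, here is a rough
sketch of the suspected interaction, paraphrased from memory rather than
quoted from the kernel tree; the helper is deliberately named
cpu_coregroup_mask_sketch() because it only approximates what the last patch
in the series does. The key assumption is that with CONFIG_NUMA=n the
asm-generic fallback for cpumask_of_node() is simply cpu_online_mask, so a
core-group mask derived from the node mask changes underneath the scheduler
as CPUs are hotplugged out on suspend -- presumably what leaves the do/while
walk over the sched_group list in init_sched_groups_capacity() spinning.

#include <linux/cpumask.h>
#include <linux/topology.h>
#include <asm/topology.h>

/* Sketch only, not the actual arm64 cpu_coregroup_mask(). */
const struct cpumask *cpu_coregroup_mask_sketch(int cpu)
{
	/*
	 * Start from the NUMA node mask.  With CONFIG_NUMA=n the generic
	 * fallback makes this cpu_online_mask.
	 */
	const struct cpumask *core_mask = cpumask_of_node(cpu_to_node(cpu));

	/*
	 * Narrow to the package siblings only when they are a subset of the
	 * node mask; if that narrowing ever fails to happen, the online mask
	 * leaks out as the MC "sibling" mask, and its contents change every
	 * time a CPU is hotplugged out during suspend.
	 */
	if (cpumask_subset(&cpu_topology[cpu].core_sibling, core_mask))
		core_mask = &cpu_topology[cpu].core_sibling;

	return core_mask;
}
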
Geert Uytterhoeven May 29, 2018, 5:18 p.m. UTC | #3
Hi Robin,

On Tue, May 29, 2018 at 7:08 PM, Robin Murphy <robin.murphy@arm.com> wrote:
> On 29/05/18 16:51, Geert Uytterhoeven wrote:
>> On Tue, May 29, 2018 at 5:08 PM, Will Deacon <will.deacon@arm.com> wrote:
>>> On Tue, May 29, 2018 at 02:18:40PM +0100, Sudeep Holla wrote:
>>>> On 29/05/18 12:56, Geert Uytterhoeven wrote:
>>>>> On Tue, May 29, 2018 at 1:14 PM, Sudeep Holla <sudeep.holla@arm.com>
>>>>> wrote:
>>>>>> On 29/05/18 11:48, Geert Uytterhoeven wrote:
>>>>>>> System suspend still works fine on systems with big cores only:
>>>>>>>
>>>>>>>      R-Car H3 ES1.0 (4xCA57 (4xCA53 disabled in firmware))
>>>>>>>      R-Car M3-N (2xCA57)
>>>>>>>
>>>>>>> Reverting this commit fixes the issue for me.
>>>>>>
>>>>>> I can't find anything that relates to system suspend in these patches
>>>>>> unless they are messing with something when CPUs are hot-plugged back
>>>>>> in during resume.
>>>>>
>>>>> It's only the last patch that introduces the breakage.
>>>>
>>>> As specified in the commit log, it won't change any behavior for DT
>>>> systems if it's a non-NUMA or single-node system. So I am still wondering
>>>> what could trigger this regression.
>>>
>>> I wonder if we're somehow giving an uninitialised/invalid NUMA
>>> configuration
>>> to the scheduler, although I can't see how this would happen.
>>>
>>> Geert -- if you enable CONFIG_DEBUG_PER_CPU_MAPS=y and apply the diff
>>> below
>>> do you see anything shouting in dmesg?
>>
>> Thanks, but unfortunately it doesn't help.
>> I added some debug code to print cpumask, but so far I don't see anything
>> suspicious.
>
> Do you have CONFIG_NUMA enabled? On a hunch I've managed to reproduce what
> looks like the same thing on a Juno board with NUMA=n; going in with
> external debug it seems to be stuck in the loop in
> init_sched_groups_capacity(), with an approximate stack trace of:

CONFIG_NUMA is not set.
I'm basically using renesas_defconfig from
https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas.git/log/?h=topic/renesas-defconfig

Gr{oetje,eeting}s,

                        Geert
Sudeep Holla May 29, 2018, 5:31 p.m. UTC | #4
On 29/05/18 18:08, Robin Murphy wrote:
> On 29/05/18 16:51, Geert Uytterhoeven wrote:
>> Hi Will,
>>
>> On Tue, May 29, 2018 at 5:08 PM, Will Deacon <will.deacon@arm.com> wrote:
>>> On Tue, May 29, 2018 at 02:18:40PM +0100, Sudeep Holla wrote:
>>>> On 29/05/18 12:56, Geert Uytterhoeven wrote:
>>>>> On Tue, May 29, 2018 at 1:14 PM, Sudeep Holla
>>>>> <sudeep.holla@arm.com> wrote:
>>>>>> On 29/05/18 11:48, Geert Uytterhoeven wrote:
>>>>>>> System suspend still works fine on systems with big cores only:
>>>>>>>
>>>>>>>      R-Car H3 ES1.0 (4xCA57 (4xCA53 disabled in firmware))
>>>>>>>      R-Car M3-N (2xCA57)
>>>>>>>
>>>>>>> Reverting this commit fixes the issue for me.
>>>>>>
>>>>>> I can't find anything that relates to system suspend in these patches
>>>>>> unless they are messing with something when CPUs are hot-plugged back
>>>>>> in during resume.
>>>>>
>>>>> It's only the last patch that introduces the breakage.
>>>>>
>>>>
>>>> As specified in the commit log, it won't change any behavior for DT
>>>> systems if it's a non-NUMA or single-node system. So I am still wondering
>>>> what could trigger this regression.
>>>
>>> I wonder if we're somehow giving an uninitialised/invalid NUMA
>>> configuration
>>> to the scheduler, although I can't see how this would happen.
>>>
>>> Geert -- if you enable CONFIG_DEBUG_PER_CPU_MAPS=y and apply the diff
>>> below
>>> do you see anything shouting in dmesg?
>>
>> Thanks, but unfortunately it doesn't help.
>> I added some debug code to print cpumask, but so far I don't see anything
>> suspicious.
> 
> Do you have CONFIG_NUMA enabled? On a hunch I've managed to reproduce
> what looks like the same thing on a Juno board with NUMA=n; going in
> with external debug it seems to be stuck in the loop in
> init_sched_groups_capacity(), with an approximate stack trace of:
> 
> 
> init_sched_groups_capacity()
> partition_sched_domains()
> cpuset_cpu_active()
> sched_cpu_activate()
> cpuhp_invoke_callback()
> cpuhp_thread_fn()
> 
> My hunch is based on the fact that it looks like we can, under the right
> circumstances, end up with default_topology picking up cpu_online_mask
> as a sibling mask via cpu_coregroup_mask(), and given the great
> coincidence that that's going to change when hotplugging out CPUs on
> suspend, things might not react too well to that. Things also look to go
> utterly haywire once into a full-blown systemd userspace with cpuidle,
> but I haven't got a clear picture of that yet.
> 

Yes, I too observed the same. I was able to suspend/resume if I have
cpuidle disabled.
Will Deacon May 29, 2018, 8:16 p.m. UTC | #5
Hi Geert,

On Tue, May 29, 2018 at 05:51:29PM +0200, Geert Uytterhoeven wrote:
> On Tue, May 29, 2018 at 5:08 PM, Will Deacon <will.deacon@arm.com> wrote:
> > On Tue, May 29, 2018 at 02:18:40PM +0100, Sudeep Holla wrote:
> >> On 29/05/18 12:56, Geert Uytterhoeven wrote:
> >> > On Tue, May 29, 2018 at 1:14 PM, Sudeep Holla <sudeep.holla@arm.com> wrote:
> >> >> On 29/05/18 11:48, Geert Uytterhoeven wrote:
> >> >>> System suspend still works fine on systems with big cores only:
> >> >>>
> >> >>>     R-Car H3 ES1.0 (4xCA57 (4xCA53 disabled in firmware))
> >> >>>     R-Car M3-N (2xCA57)
> >> >>>
> >> >>> Reverting this commit fixes the issue for me.
> >> >>
> >> >> I can't find anything that relates to system suspend in these patches
> >> >> unless they are messing with something when CPUs are hot-plugged back
> >> >> in during resume.
> >> >
> >> > It's only the last patch that introduces the breakage.
> >> >
> >>
> >> As specified in the commit log, it won't change any behavior for DT
> >> systems if it's a non-NUMA or single-node system. So I am still wondering
> >> what could trigger this regression.
> >
> > I wonder if we're somehow giving an uninitialised/invalid NUMA configuration
> > to the scheduler, although I can't see how this would happen.
> >
> > Geert -- if you enable CONFIG_DEBUG_PER_CPU_MAPS=y and apply the diff below
> > do you see anything shouting in dmesg?
> 
> Thanks, but unfortunately it doesn't help.
> I added some debug code to print cpumask, but so far I don't see anything
> suspicious.

Damn, sorry for wasting your time. For the record, Catalin's been seeing
boot failures under KVM on a non-big/LITTLE machine that bisect reliably
to this patch, but we've also not been able to explain them. Worse, adding
so much as a printk makes the problem disappear.

Will
Jeremy Linton May 29, 2018, 8:48 p.m. UTC | #6
On 05/29/2018 03:16 PM, Will Deacon wrote:
> Hi Geert,
> 
> On Tue, May 29, 2018 at 05:51:29PM +0200, Geert Uytterhoeven wrote:
>> On Tue, May 29, 2018 at 5:08 PM, Will Deacon <will.deacon@arm.com> wrote:
>>> On Tue, May 29, 2018 at 02:18:40PM +0100, Sudeep Holla wrote:
>>>> On 29/05/18 12:56, Geert Uytterhoeven wrote:
>>>>> On Tue, May 29, 2018 at 1:14 PM, Sudeep Holla <sudeep.holla@arm.com> wrote:
>>>>>> On 29/05/18 11:48, Geert Uytterhoeven wrote:
>>>>>>> System suspend still works fine on systems with big cores only:
>>>>>>>
>>>>>>>      R-Car H3 ES1.0 (4xCA57 (4xCA53 disabled in firmware))
>>>>>>>      R-Car M3-N (2xCA57)
>>>>>>>
>>>>>>> Reverting this commit fixes the issue for me.
>>>>>>
>>>>>> I can't find anything that relates to system suspend in these patches
>>>>>> unless they are messing with something when CPUs are hot-plugged back
>>>>>> in during resume.
>>>>>
>>>>> It's only the last patch that introduces the breakage.
>>>>>
>>>>
>>>> As specified in the commit log, it won't change any behavior for DT
>>>> systems if it's a non-NUMA or single-node system. So I am still wondering
>>>> what could trigger this regression.
>>>
>>> I wonder if we're somehow giving an uninitialised/invalid NUMA configuration
>>> to the scheduler, although I can't see how this would happen.
>>>
>>> Geert -- if you enable CONFIG_DEBUG_PER_CPU_MAPS=y and apply the diff below
>>> do you see anything shouting in dmesg?
>>
>> Thanks, but unfortunately it doesn't help.
>> I added some debug code to print cpumask, but so far I don't see anything
>> suspicious.
> 
> Damn, sorry for wasting your time. For the record, Catalin's been seeing
> boot failures under KVM on a non-big/LITTLE machine that bisect reliably
> to this patch, but we've also not been able to explain them. Worse, adding
> so much as a printk makes the problem disappear.



I was about to post a patch to remove the NUMA check if CONFIG_NUMA is 
disabled, but that seems pointless if it's happening with NUMA enabled. 
So I'm assuming it's the removal of the core from the NUMA mask which is 
causing problems. It looks like numa_clear_node() might cause similar 
problems when NUMA is enabled. In my case the problem I see is a NULL 
dereference in __bitmap_intersects() called from select_task_rq_fair(). 
That said, I only see the problem when CONFIG_NUMA isn't set.

So, I've also got another workaround which caches the NUMA node in 
cpu_topology and then only builds it when store_cpu_topology() is 
called. That should stabilize the NUMA mask and ensure that the bitmaps 
are correct when the scheduler requests them.

Do you guys want that patch, or are we looking for a deeper root cause?
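
For illustration only, here is a minimal sketch of the shape such a
workaround could take. This is hypothetical, not Jeremy's actual patch: the
structure, the numa_node field and the *_sketch names are invented for the
example.

#include <linux/cpumask.h>
#include <linux/threads.h>
#include <linux/topology.h>

/* Hypothetical mirror of the arm64 topology bookkeeping. */
struct cpu_topo_sketch {
	int core_id;
	int package_id;
	int numa_node;		/* cached copy, invented for this sketch */
	cpumask_t core_sibling;
};

static struct cpu_topo_sketch cpu_topo_sketch[NR_CPUS];

/* Called from the secondary-CPU bring-up path, like store_cpu_topology(). */
static void store_cpu_topology_sketch(unsigned int cpuid)
{
	/* ... existing MPIDR/DT parsing of core and package ids elided ... */

	/*
	 * Snapshot the node id here, so that later sibling-mask construction
	 * uses this stable cached value instead of re-querying state that
	 * changes while CPUs are being hotplugged out for suspend.
	 */
	cpu_topo_sketch[cpuid].numa_node = cpu_to_node(cpuid);
}
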

Patch

diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index dad128ba98bf..e3de033339b4 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -58,7 +58,7 @@  EXPORT_SYMBOL(node_to_cpumask_map);
  */
 const struct cpumask *cpumask_of_node(int node)
 {
-	if (WARN_ON(node >= nr_node_ids))
+	if (WARN_ON((unsigned)node >= nr_node_ids))
 		return cpu_none_mask;
 
 	if (WARN_ON(node_to_cpumask_map[node] == NULL))
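
A side note on the hunk above: assuming nr_node_ids is still a plain int (as
it was in this era), the original check is a signed comparison, so a negative
node id such as NUMA_NO_NODE (-1) would sail past the WARN_ON() and into
node_to_cpumask_map[node]; the cast forces an unsigned comparison so the
guard fires and cpu_none_mask is returned instead. A minimal, standalone
userspace illustration (not kernel code) of that difference:

#include <stdio.h>

#define NUMA_NO_NODE	(-1)

int main(void)
{
	int node = NUMA_NO_NODE;	/* e.g. a CPU whose node was never set */
	int nr_node_ids = 1;		/* single-node system */

	/*
	 * Signed comparison, as in the old check: -1 >= 1 is false, so the
	 * guard would be skipped and node_to_cpumask_map[-1] dereferenced.
	 * Prints 0.
	 */
	printf("signed:   %d\n", node >= nr_node_ids);

	/*
	 * With the cast, -1 becomes UINT_MAX and nr_node_ids is promoted to
	 * unsigned for the comparison, so the guard fires.  Prints 1.
	 */
	printf("unsigned: %d\n", (unsigned)node >= nr_node_ids);

	return 0;
}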