diff mbox series

[4/6] xen: don't free percpu areas during suspend

Message ID 20190318131155.29450-5-jgross@suse.com (mailing list archive)
State Superseded
Headers show
Series xen: simplify suspend/resume handling | expand

Commit Message

Juergen Gross March 18, 2019, 1:11 p.m. UTC
Instead of freeing percpu areas during suspend and allocating them
again when resuming keep them. Only free an area in case a cpu didn't
come up again when resuming.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/percpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Dario Faggioli March 25, 2019, 6:14 p.m. UTC | #1
On Mon, 2019-03-18 at 14:11 +0100, Juergen Gross wrote:
> Instead of freeing percpu areas during suspend and allocating them
> again when resuming keep them. Only free an area in case a cpu didn't
> come up again when resuming.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

Dario
Andrew Cooper March 27, 2019, 3:55 p.m. UTC | #2
On 18/03/2019 13:11, Juergen Gross wrote:
> Instead of freeing percpu areas during suspend and allocating them
> again when resuming keep them. Only free an area in case a cpu didn't
> come up again when resuming.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Hmm - this is slightly problematic, given the dual nature of this code.

I agree that it this change is beneficial for the suspend case, but it
is a problem when we are parking an individual CPU for smt=0 or
xen-hptool reasons.

Do we have any hint we can use when taking the CPU down as to whether
we're expecting it to come straight back up again?

~Andrew
Juergen Gross March 27, 2019, 4:18 p.m. UTC | #3
On 27/03/2019 16:55, Andrew Cooper wrote:
> On 18/03/2019 13:11, Juergen Gross wrote:
>> Instead of freeing percpu areas during suspend and allocating them
>> again when resuming keep them. Only free an area in case a cpu didn't
>> come up again when resuming.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> Hmm - this is slightly problematic, given the dual nature of this code.
> 
> I agree that it this change is beneficial for the suspend case, but it
> is a problem when we are parking an individual CPU for smt=0 or
> xen-hptool reasons.
> 
> Do we have any hint we can use when taking the CPU down as to whether
> we're expecting it to come straight back up again?

Did you look into the patch? I did this by testing system_state.


Juergen
Jan Beulich March 27, 2019, 4:38 p.m. UTC | #4
>>> On 27.03.19 at 17:18, <jgross@suse.com> wrote:
> On 27/03/2019 16:55, Andrew Cooper wrote:
>> On 18/03/2019 13:11, Juergen Gross wrote:
>>> Instead of freeing percpu areas during suspend and allocating them
>>> again when resuming keep them. Only free an area in case a cpu didn't
>>> come up again when resuming.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> 
>> Hmm - this is slightly problematic, given the dual nature of this code.
>> 
>> I agree that it this change is beneficial for the suspend case, but it
>> is a problem when we are parking an individual CPU for smt=0 or
>> xen-hptool reasons.
>> 
>> Do we have any hint we can use when taking the CPU down as to whether
>> we're expecting it to come straight back up again?
> 
> Did you look into the patch? I did this by testing system_state.

I think there's a wider problem here: enable_nonboot_cpus()
only brings back up the CPUs that were previously online.
Parked ones would be left alone, yet after resume they'd
need to be put back into parked state.

Jan
Juergen Gross March 27, 2019, 4:52 p.m. UTC | #5
On 27/03/2019 17:38, Jan Beulich wrote:
>>>> On 27.03.19 at 17:18, <jgross@suse.com> wrote:
>> On 27/03/2019 16:55, Andrew Cooper wrote:
>>> On 18/03/2019 13:11, Juergen Gross wrote:
>>>> Instead of freeing percpu areas during suspend and allocating them
>>>> again when resuming keep them. Only free an area in case a cpu didn't
>>>> come up again when resuming.
>>>>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>
>>> Hmm - this is slightly problematic, given the dual nature of this code.
>>>
>>> I agree that it this change is beneficial for the suspend case, but it
>>> is a problem when we are parking an individual CPU for smt=0 or
>>> xen-hptool reasons.
>>>
>>> Do we have any hint we can use when taking the CPU down as to whether
>>> we're expecting it to come straight back up again?
>>
>> Did you look into the patch? I did this by testing system_state.
> 
> I think there's a wider problem here: enable_nonboot_cpus()
> only brings back up the CPUs that were previously online.
> Parked ones would be left alone, yet after resume they'd
> need to be put back into parked state.

I can add that handling in the respin of the series.


Juergen
Juergen Gross March 28, 2019, 6:59 a.m. UTC | #6
On 27/03/2019 17:52, Juergen Gross wrote:
> On 27/03/2019 17:38, Jan Beulich wrote:
>>>>> On 27.03.19 at 17:18, <jgross@suse.com> wrote:
>>> On 27/03/2019 16:55, Andrew Cooper wrote:
>>>> On 18/03/2019 13:11, Juergen Gross wrote:
>>>>> Instead of freeing percpu areas during suspend and allocating them
>>>>> again when resuming keep them. Only free an area in case a cpu didn't
>>>>> come up again when resuming.
>>>>>
>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>
>>>> Hmm - this is slightly problematic, given the dual nature of this code.
>>>>
>>>> I agree that it this change is beneficial for the suspend case, but it
>>>> is a problem when we are parking an individual CPU for smt=0 or
>>>> xen-hptool reasons.
>>>>
>>>> Do we have any hint we can use when taking the CPU down as to whether
>>>> we're expecting it to come straight back up again?
>>>
>>> Did you look into the patch? I did this by testing system_state.
>>
>> I think there's a wider problem here: enable_nonboot_cpus()
>> only brings back up the CPUs that were previously online.
>> Parked ones would be left alone, yet after resume they'd
>> need to be put back into parked state.
> 
> I can add that handling in the respin of the series.

Looking deeper into that mess I believe that should be a series of its
own. Cpu parking needs to be handled for cpu hotplug and core parking
(XENPF_core_parking), too.


Juergen
Jan Beulich March 28, 2019, 7:46 a.m. UTC | #7
>>> On 18.03.19 at 14:11, <jgross@suse.com> wrote:
> Instead of freeing percpu areas during suspend and allocating them
> again when resuming keep them. Only free an area in case a cpu didn't
> come up again when resuming.

There's another aspect here which needs at least mentioning:
Right now, CPUs coming back up have their per-CPU data
cleared. Not doing so may indeed be an improvement in
some cases (like statistics collected), but may also get in the
way in other cases.

Jan
Juergen Gross March 28, 2019, 7:53 a.m. UTC | #8
On 28/03/2019 08:46, Jan Beulich wrote:
>>>> On 18.03.19 at 14:11, <jgross@suse.com> wrote:
>> Instead of freeing percpu areas during suspend and allocating them
>> again when resuming keep them. Only free an area in case a cpu didn't
>> come up again when resuming.
> 
> There's another aspect here which needs at least mentioning:
> Right now, CPUs coming back up have their per-CPU data
> cleared. Not doing so may indeed be an improvement in
> some cases (like statistics collected), but may also get in the
> way in other cases.

I have checked the respective cpu notifier hooks and I think this is no
problem. I hope I didn't overlook something.

I'll add a note to the commit message.


Juergen
Jan Beulich March 28, 2019, 8:03 a.m. UTC | #9
>>> On 28.03.19 at 07:59, <jgross@suse.com> wrote:
> On 27/03/2019 17:52, Juergen Gross wrote:
>> On 27/03/2019 17:38, Jan Beulich wrote:
>>>>>> On 27.03.19 at 17:18, <jgross@suse.com> wrote:
>>>> On 27/03/2019 16:55, Andrew Cooper wrote:
>>>>> On 18/03/2019 13:11, Juergen Gross wrote:
>>>>>> Instead of freeing percpu areas during suspend and allocating them
>>>>>> again when resuming keep them. Only free an area in case a cpu didn't
>>>>>> come up again when resuming.
>>>>>>
>>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>>
>>>>> Hmm - this is slightly problematic, given the dual nature of this code.
>>>>>
>>>>> I agree that it this change is beneficial for the suspend case, but it
>>>>> is a problem when we are parking an individual CPU for smt=0 or
>>>>> xen-hptool reasons.
>>>>>
>>>>> Do we have any hint we can use when taking the CPU down as to whether
>>>>> we're expecting it to come straight back up again?
>>>>
>>>> Did you look into the patch? I did this by testing system_state.
>>>
>>> I think there's a wider problem here: enable_nonboot_cpus()
>>> only brings back up the CPUs that were previously online.
>>> Parked ones would be left alone, yet after resume they'd
>>> need to be put back into parked state.
>> 
>> I can add that handling in the respin of the series.
> 
> Looking deeper into that mess I believe that should be a series of its
> own. Cpu parking needs to be handled for cpu hotplug and core parking
> (XENPF_core_parking), too.

What issue do you see for CPU hotplug? cpu_up_helper() has
been modified by the parking series.

For core parking I wonder whether core_parking_helper()
shouldn't, first of all, invoke cpu_{up,down}_helper(). This
wouldn't be enough, though - the policy hooks need to honor
opt_smt as well.

As to this wanting to be a patch / series of its own - I don't
mind, but preferably it would come ahead of your changes
here, so that it can be backported independently and
(sufficiently) easily (unless of course there's really no
collision between the two).

Jan
Jan Beulich March 28, 2019, 8:04 a.m. UTC | #10
>>> On 28.03.19 at 08:53, <jgross@suse.com> wrote:
> On 28/03/2019 08:46, Jan Beulich wrote:
>>>>> On 18.03.19 at 14:11, <jgross@suse.com> wrote:
>>> Instead of freeing percpu areas during suspend and allocating them
>>> again when resuming keep them. Only free an area in case a cpu didn't
>>> come up again when resuming.
>> 
>> There's another aspect here which needs at least mentioning:
>> Right now, CPUs coming back up have their per-CPU data
>> cleared. Not doing so may indeed be an improvement in
>> some cases (like statistics collected), but may also get in the
>> way in other cases.
> 
> I have checked the respective cpu notifier hooks and I think this is no
> problem. I hope I didn't overlook something.

Is checking the hooks sufficient? I'd rather expect we need to
check all per-CPU data objects.

Jan
Juergen Gross March 28, 2019, 8:35 a.m. UTC | #11
On 28/03/2019 09:04, Jan Beulich wrote:
>>>> On 28.03.19 at 08:53, <jgross@suse.com> wrote:
>> On 28/03/2019 08:46, Jan Beulich wrote:
>>>>>> On 18.03.19 at 14:11, <jgross@suse.com> wrote:
>>>> Instead of freeing percpu areas during suspend and allocating them
>>>> again when resuming keep them. Only free an area in case a cpu didn't
>>>> come up again when resuming.
>>>
>>> There's another aspect here which needs at least mentioning:
>>> Right now, CPUs coming back up have their per-CPU data
>>> cleared. Not doing so may indeed be an improvement in
>>> some cases (like statistics collected), but may also get in the
>>> way in other cases.
>>
>> I have checked the respective cpu notifier hooks and I think this is no
>> problem. I hope I didn't overlook something.
> 
> Is checking the hooks sufficient? I'd rather expect we need to
> check all per-CPU data objects.

Why? The main concern would be a hook not doing its job due to some
data not being zeroed.

In case there is a dependency somewhere else I'd rather expect subtle
negative effects with today's solution as suddenly percpu data will
be zero after suspend/resume without a component having been involved
with suspending.

Sure, there could be negative effects with my series, but I'd asset the
chances lower as without my series. Suspend/resume tends to be not
tested very often, it seems. I guess I'm the only one having it tried
recently (it was broken for 3 months at least before my patch repairing
an issue with IOMMU).


Juergen
Juergen Gross March 28, 2019, 8:35 a.m. UTC | #12
On 28/03/2019 09:03, Jan Beulich wrote:
>>>> On 28.03.19 at 07:59, <jgross@suse.com> wrote:
>> On 27/03/2019 17:52, Juergen Gross wrote:
>>> On 27/03/2019 17:38, Jan Beulich wrote:
>>>>>>> On 27.03.19 at 17:18, <jgross@suse.com> wrote:
>>>>> On 27/03/2019 16:55, Andrew Cooper wrote:
>>>>>> On 18/03/2019 13:11, Juergen Gross wrote:
>>>>>>> Instead of freeing percpu areas during suspend and allocating them
>>>>>>> again when resuming keep them. Only free an area in case a cpu didn't
>>>>>>> come up again when resuming.
>>>>>>>
>>>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>>>
>>>>>> Hmm - this is slightly problematic, given the dual nature of this code.
>>>>>>
>>>>>> I agree that it this change is beneficial for the suspend case, but it
>>>>>> is a problem when we are parking an individual CPU for smt=0 or
>>>>>> xen-hptool reasons.
>>>>>>
>>>>>> Do we have any hint we can use when taking the CPU down as to whether
>>>>>> we're expecting it to come straight back up again?
>>>>>
>>>>> Did you look into the patch? I did this by testing system_state.
>>>>
>>>> I think there's a wider problem here: enable_nonboot_cpus()
>>>> only brings back up the CPUs that were previously online.
>>>> Parked ones would be left alone, yet after resume they'd
>>>> need to be put back into parked state.
>>>
>>> I can add that handling in the respin of the series.
>>
>> Looking deeper into that mess I believe that should be a series of its
>> own. Cpu parking needs to be handled for cpu hotplug and core parking
>> (XENPF_core_parking), too.
> 
> What issue do you see for CPU hotplug? cpu_up_helper() has
> been modified by the parking series.

I was thinking of hot unplug. cpu_down() won't do the job for a parked
cpu.

> For core parking I wonder whether core_parking_helper()
> shouldn't, first of all, invoke cpu_{up,down}_helper(). This
> wouldn't be enough, though - the policy hooks need to honor
> opt_smt as well.

Right.

> As to this wanting to be a patch / series of its own - I don't
> mind, but preferably it would come ahead of your changes
> here, so that it can be backported independently and
> (sufficiently) easily (unless of course there's really no
> collision between the two).

In case there is a collision it should be fairly minimal.

I'd prefer not to block my series as it is a prerequisite for my core
scheduling series, which I believe should go in rather sooner than later
as it probably should see lot of testing before the next release.


Juergen
Jan Beulich March 28, 2019, 9:36 a.m. UTC | #13
>>> On 28.03.19 at 09:35, <jgross@suse.com> wrote:
> On 28/03/2019 09:03, Jan Beulich wrote:
>>>>> On 28.03.19 at 07:59, <jgross@suse.com> wrote:
>>> On 27/03/2019 17:52, Juergen Gross wrote:
>>>> On 27/03/2019 17:38, Jan Beulich wrote:
>>>>>>>> On 27.03.19 at 17:18, <jgross@suse.com> wrote:
>>>>>> On 27/03/2019 16:55, Andrew Cooper wrote:
>>>>>>> On 18/03/2019 13:11, Juergen Gross wrote:
>>>>>>>> Instead of freeing percpu areas during suspend and allocating them
>>>>>>>> again when resuming keep them. Only free an area in case a cpu didn't
>>>>>>>> come up again when resuming.
>>>>>>>>
>>>>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>>>>
>>>>>>> Hmm - this is slightly problematic, given the dual nature of this code.
>>>>>>>
>>>>>>> I agree that it this change is beneficial for the suspend case, but it
>>>>>>> is a problem when we are parking an individual CPU for smt=0 or
>>>>>>> xen-hptool reasons.
>>>>>>>
>>>>>>> Do we have any hint we can use when taking the CPU down as to whether
>>>>>>> we're expecting it to come straight back up again?
>>>>>>
>>>>>> Did you look into the patch? I did this by testing system_state.
>>>>>
>>>>> I think there's a wider problem here: enable_nonboot_cpus()
>>>>> only brings back up the CPUs that were previously online.
>>>>> Parked ones would be left alone, yet after resume they'd
>>>>> need to be put back into parked state.
>>>>
>>>> I can add that handling in the respin of the series.
>>>
>>> Looking deeper into that mess I believe that should be a series of its
>>> own. Cpu parking needs to be handled for cpu hotplug and core parking
>>> (XENPF_core_parking), too.
>> 
>> What issue do you see for CPU hotplug? cpu_up_helper() has
>> been modified by the parking series.
> 
> I was thinking of hot unplug. cpu_down() won't do the job for a parked
> cpu.

There's nothing to be done when soft-offlining a CPU. And I'm
not convinced physical CPU unplug was actually ever tested to
work.

>> As to this wanting to be a patch / series of its own - I don't
>> mind, but preferably it would come ahead of your changes
>> here, so that it can be backported independently and
>> (sufficiently) easily (unless of course there's really no
>> collision between the two).
> 
> In case there is a collision it should be fairly minimal.
> 
> I'd prefer not to block my series as it is a prerequisite for my core
> scheduling series, which I believe should go in rather sooner than later
> as it probably should see lot of testing before the next release.

Understood.

Jan
Jan Beulich April 11, 2019, 9:49 a.m. UTC | #14
>>> On 28.03.19 at 09:03, <JBeulich@suse.com> wrote:
> For core parking I wonder whether core_parking_helper()
> shouldn't, first of all, invoke cpu_{up,down}_helper(). This
> wouldn't be enough, though - the policy hooks need to honor
> opt_smt as well.

Actually no, there was no problem at the time there: With
opt_smt set to false, no secondary thread would ever have
made it into core_parking_cpunum[] (that's where CPU numbers
to be passed to cpu_up() get taken from). A problem here was
introduced only by Andrew's 2bed1bc241, making it possible for
opt_smt to change at runtime. I think I'll make a patch to have
smt_up_down_helper() call into core-parking code to purge
CPUs from core_parking_cpunum[] as needed.

The interaction of core-parking and xen-hptool activities is up
for discussion anyway, I think. At least to me it's not
immediately clear which of the two should take priority.
Allowing admins to shoot themselves in the foot (as we appear
to do now) is a reasonable possibility, but not the only one, the
more that the platform is liable to notice higher power
consumption and to subsequently request offlining of CPUs
again anyway.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/percpu.c b/xen/arch/x86/percpu.c
index 8be4ebddf4..5ea14b6ec3 100644
--- a/xen/arch/x86/percpu.c
+++ b/xen/arch/x86/percpu.c
@@ -76,7 +76,8 @@  static int cpu_percpu_callback(
         break;
     case CPU_UP_CANCELED:
     case CPU_DEAD:
-        if ( !park_offline_cpus )
+    case CPU_RESUME_FAILED:
+        if ( !park_offline_cpus && system_state != SYS_STATE_suspend )
             free_percpu_area(cpu);
         break;
     case CPU_REMOVE: