Message ID | 5CAF1AD502000078002268CA@prv1-mh.provo.novell.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | timers: move back migrate_timers_from_cpu() invocation | expand |
On 11/04/2019 11:45, Jan Beulich wrote: > Commit 597fbb8be6 ("xen/timers: Fix memory leak with cpu unplug/plug") > went a little too far: Migrating timers away from a CPU being offlined > needs to heppen independent of whether it get parked or fully offlined. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> > > --- a/xen/common/timer.c > +++ b/xen/common/timer.c > @@ -619,8 +619,6 @@ static void free_percpu_timers(unsigned > { > struct timers *ts = &per_cpu(timers, cpu); > > - migrate_timers_from_cpu(cpu); > - > ASSERT(heap_metadata(ts->heap)->size == 0); > if ( heap_metadata(ts->heap)->limit ) > { > @@ -648,6 +646,8 @@ static int cpu_callback( > case CPU_UP_CANCELED: > case CPU_DEAD: > case CPU_RESUME_FAILED: > + migrate_timers_from_cpu(cpu); > + > if ( !park_offline_cpus && system_state != SYS_STATE_suspend ) > free_percpu_timers(cpu); > break; I'm pretty sure you also need a call in the REMOVE_CASE. The cpu comes online for long enough to potentially gain a timer. ~Andrew
>>> On 11.04.19 at 12:47, <andrew.cooper3@citrix.com> wrote: > On 11/04/2019 11:45, Jan Beulich wrote: >> @@ -648,6 +646,8 @@ static int cpu_callback( >> case CPU_UP_CANCELED: >> case CPU_DEAD: >> case CPU_RESUME_FAILED: >> + migrate_timers_from_cpu(cpu); >> + >> if ( !park_offline_cpus && system_state != SYS_STATE_suspend ) >> free_percpu_timers(cpu); >> break; > > I'm pretty sure you also need a call in the REMOVE_CASE. The cpu comes > online for long enough to potentially gain a timer. What do you mean by "comes online for long enough"? There's no call site for the notifier with CPU_REMOVE right now, so what the eventual behavior is going to be is simply unknown. I for one don't expect the CPU to be brought back up again in that case. I'm wondering if you're mixing this up with CPU_RESUME_FAILED. Jan
On 11/04/2019 11:55, Jan Beulich wrote: >>>> On 11.04.19 at 12:47, <andrew.cooper3@citrix.com> wrote: >> On 11/04/2019 11:45, Jan Beulich wrote: >>> @@ -648,6 +646,8 @@ static int cpu_callback( >>> case CPU_UP_CANCELED: >>> case CPU_DEAD: >>> case CPU_RESUME_FAILED: >>> + migrate_timers_from_cpu(cpu); >>> + >>> if ( !park_offline_cpus && system_state != SYS_STATE_suspend ) >>> free_percpu_timers(cpu); >>> break; >> I'm pretty sure you also need a call in the REMOVE_CASE. The cpu comes >> online for long enough to potentially gain a timer. > What do you mean by "comes online for long enough"? There's > no call site for the notifier with CPU_REMOVE right now, so what > the eventual behavior is going to be is simply unknown. I for one > don't expect the CPU to be brought back up again in that case. > I'm wondering if you're mixing this up with CPU_RESUME_FAILED. Now I'm even more confused, but yes clearly - it wasn't the CPU_REMOVE case. That said, by making this adjustment, you are now liable to hit an assertion in the REMOVE case, which on a release build will result in complete memory corruption, as it frees an in-use datastructure. ~Andrew
>>> On 11.04.19 at 22:03, <andrew.cooper3@citrix.com> wrote: > On 11/04/2019 11:55, Jan Beulich wrote: >>>>> On 11.04.19 at 12:47, <andrew.cooper3@citrix.com> wrote: >>> On 11/04/2019 11:45, Jan Beulich wrote: >>>> @@ -648,6 +646,8 @@ static int cpu_callback( >>>> case CPU_UP_CANCELED: >>>> case CPU_DEAD: >>>> case CPU_RESUME_FAILED: >>>> + migrate_timers_from_cpu(cpu); >>>> + >>>> if ( !park_offline_cpus && system_state != SYS_STATE_suspend ) >>>> free_percpu_timers(cpu); >>>> break; >>> I'm pretty sure you also need a call in the REMOVE_CASE. The cpu comes >>> online for long enough to potentially gain a timer. >> What do you mean by "comes online for long enough"? There's >> no call site for the notifier with CPU_REMOVE right now, so what >> the eventual behavior is going to be is simply unknown. I for one >> don't expect the CPU to be brought back up again in that case. >> I'm wondering if you're mixing this up with CPU_RESUME_FAILED. > > Now I'm even more confused, but yes clearly - it wasn't the CPU_REMOVE case. > > That said, by making this adjustment, you are now liable to hit an > assertion in the REMOVE case, which on a release build will result in > complete memory corruption, as it frees an in-use datastructure. Now I'm afraid I'm confused as well: Which assertion would trigger, and which in-use data structure would get freed? For an offline (and parked) CPU, the heap pointer would consistently point to dummy_heap at all times. Jan
--- a/xen/common/timer.c +++ b/xen/common/timer.c @@ -619,8 +619,6 @@ static void free_percpu_timers(unsigned { struct timers *ts = &per_cpu(timers, cpu); - migrate_timers_from_cpu(cpu); - ASSERT(heap_metadata(ts->heap)->size == 0); if ( heap_metadata(ts->heap)->limit ) { @@ -648,6 +646,8 @@ static int cpu_callback( case CPU_UP_CANCELED: case CPU_DEAD: case CPU_RESUME_FAILED: + migrate_timers_from_cpu(cpu); + if ( !park_offline_cpus && system_state != SYS_STATE_suspend ) free_percpu_timers(cpu); break;
Commit 597fbb8be6 ("xen/timers: Fix memory leak with cpu unplug/plug") went a little too far: Migrating timers away from a CPU being offlined needs to heppen independent of whether it get parked or fully offlined. Signed-off-by: Jan Beulich <jbeulich@suse.com>