diff mbox series

xen/timers: Fix memory leak with cpu hot unplug

Message ID 1553878635-11959-1-git-send-email-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show
Series xen/timers: Fix memory leak with cpu hot unplug | expand

Commit Message

Andrew Cooper March 29, 2019, 4:57 p.m. UTC
timer_softirq_action() realloc's itself a larger timer heap whenever
necessary, which includes bootstrapping from the empty dummy_heap.  Nothing
ever freed this allocation.

CPU hot unplug and plug has the side effect of zeroing the percpu data area,
which clears ts->heap.  This in turn causes new timers to be put on the list
rather than the heap, and for timer_softirq_action() to bootstrap itself
again.

This in practice leaks ts->heap every time a CPU is hot unplugged and
replugged.  In the cpu notifier, free the heap after migrating all other
timers away.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Tim Deegan <tim@xen.org>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Julien Grall <julien.grall@arm.com>

This texturally depends on "xen/timers: Document and improve the
representation of the timer heap metadata" which was necessary to understand
the problem well enough to fix it, but isn't backporting over this change
isn't too complicated (should the cleanup patch not want to be backported).
---
 xen/common/timer.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Jan Beulich April 1, 2019, 8:27 a.m. UTC | #1
>>> On 29.03.19 at 17:57, <andrew.cooper3@citrix.com> wrote:
> timer_softirq_action() realloc's itself a larger timer heap whenever
> necessary, which includes bootstrapping from the empty dummy_heap.  Nothing
> ever freed this allocation.
> 
> CPU hot unplug and plug has the side effect of zeroing the percpu data area,
> which clears ts->heap.  This in turn causes new timers to be put on the list
> rather than the heap, and for timer_softirq_action() to bootstrap itself
> again.
> 
> This in practice leaks ts->heap every time a CPU is hot unplugged and
> replugged.  In the cpu notifier, free the heap after migrating all other
> timers away.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Could I talk you into using online/offline instead of plug/unplug,
as the latter is pretty tied to the physical operation of adding /
removing a CPU to / from a system, whereas the issue here
also surfaces for pure software actions like suspend/resume
or the xen-hptool operations?

> --- a/xen/common/timer.c
> +++ b/xen/common/timer.c
> @@ -631,6 +631,10 @@ static int cpu_callback(
>      case CPU_UP_CANCELED:
>      case CPU_DEAD:
>          migrate_timers_from_cpu(cpu);
> +        ASSERT(heap_metadata(ts->heap)->size == 0);
> +        if ( heap_metadata(ts->heap)->limit )
> +            xfree(ts->heap);
> +        ts->heap = dummy_heap;
>          break;

I think it would be worthwhile to add a comment to clarify that
the updating of per-CPU data here is not entirely pointless,
despite the zeroing done when bringing a CPU back up.

Additionally - is this really the right thing to do uniformly during
CPU_UP_CANCELED / CPU_DEAD? Shouldn't this be done
conditionally upon park_offline_cpus there, and get done for
CPU_REMOVE in the opposite case (like we do elsewhere, in
particular in cpu_percpu_callback() itself)?

Jan
Andrew Cooper April 1, 2019, 4:59 p.m. UTC | #2
On 01/04/2019 09:27, Jan Beulich wrote:
>>>> On 29.03.19 at 17:57, <andrew.cooper3@citrix.com> wrote:
>> timer_softirq_action() realloc's itself a larger timer heap whenever
>> necessary, which includes bootstrapping from the empty dummy_heap.  Nothing
>> ever freed this allocation.
>>
>> CPU hot unplug and plug has the side effect of zeroing the percpu data area,
>> which clears ts->heap.  This in turn causes new timers to be put on the list
>> rather than the heap, and for timer_softirq_action() to bootstrap itself
>> again.
>>
>> This in practice leaks ts->heap every time a CPU is hot unplugged and
>> replugged.  In the cpu notifier, free the heap after migrating all other
>> timers away.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Could I talk you into using online/offline instead of plug/unplug,
> as the latter is pretty tied to the physical operation of adding /
> removing a CPU to / from a system, whereas the issue here
> also surfaces for pure software actions like suspend/resume
> or the xen-hptool operations?

Ok.

>
>> --- a/xen/common/timer.c
>> +++ b/xen/common/timer.c
>> @@ -631,6 +631,10 @@ static int cpu_callback(
>>      case CPU_UP_CANCELED:
>>      case CPU_DEAD:
>>          migrate_timers_from_cpu(cpu);
>> +        ASSERT(heap_metadata(ts->heap)->size == 0);
>> +        if ( heap_metadata(ts->heap)->limit )
>> +            xfree(ts->heap);
>> +        ts->heap = dummy_heap;
>>          break;
> I think it would be worthwhile to add a comment to clarify that
> the updating of per-CPU data here is not entirely pointless,
> despite the zeroing done when bringing a CPU back up.

What kind of comment do you think would be useful here? 

ts->heap obviously shouldn’t be left as a wild pointer, and I don't see
why this point is worthy of comment.

>
> Additionally - is this really the right thing to do uniformly during
> CPU_UP_CANCELED / CPU_DEAD? Shouldn't this be done
> conditionally upon park_offline_cpus there, and get done for
> CPU_REMOVE in the opposite case (like we do elsewhere, in
> particular in cpu_percpu_callback() itself)?

Its certainly the safest course of action, and obviously needs to follow
migrate_timers_from_cpu()

Do parked CPUs actually get entered into the online map?  It appears so.

Either way, the actual semantics of park_offline_cpus are undocumented,
and if the intended semantics are to use that bifurcated logic
everywhere, then I think the semantics want rethinking.

~Andrew
Jan Beulich April 2, 2019, 10:32 a.m. UTC | #3
>>> On 01.04.19 at 18:59, <andrew.cooper3@citrix.com> wrote:
> On 01/04/2019 09:27, Jan Beulich wrote:
>>>>> On 29.03.19 at 17:57, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/common/timer.c
>>> +++ b/xen/common/timer.c
>>> @@ -631,6 +631,10 @@ static int cpu_callback(
>>>      case CPU_UP_CANCELED:
>>>      case CPU_DEAD:
>>>          migrate_timers_from_cpu(cpu);
>>> +        ASSERT(heap_metadata(ts->heap)->size == 0);
>>> +        if ( heap_metadata(ts->heap)->limit )
>>> +            xfree(ts->heap);
>>> +        ts->heap = dummy_heap;
>>>          break;
>> I think it would be worthwhile to add a comment to clarify that
>> the updating of per-CPU data here is not entirely pointless,
>> despite the zeroing done when bringing a CPU back up.
> 
> What kind of comment do you think would be useful here? 
> 
> ts->heap obviously shouldn’t be left as a wild pointer, and I don't see
> why this point is worthy of comment.

Oh, I wasn't suggesting to comment the freeing. It's the storing
of dummy_heap's address which might look unnecessary when
being aware of the clearing of per-CPU data.

>> Additionally - is this really the right thing to do uniformly during
>> CPU_UP_CANCELED / CPU_DEAD? Shouldn't this be done
>> conditionally upon park_offline_cpus there, and get done for
>> CPU_REMOVE in the opposite case (like we do elsewhere, in
>> particular in cpu_percpu_callback() itself)?
> 
> Its certainly the safest course of action, and obviously needs to follow
> migrate_timers_from_cpu()
> 
> Do parked CPUs actually get entered into the online map?  It appears so.

No, they don't: It's a full cpu_down() they go through. That's
why the new CPU_REMOVE notification was introduced, for
handlers to be able to free per-CPU data at the appropriate
point in time.

> Either way, the actual semantics of park_offline_cpus are undocumented,
> and if the intended semantics are to use that bifurcated logic
> everywhere, then I think the semantics want rethinking.

Well, I don't mind finding a different, perhaps easier to use
model. Back then, when introducing parking, I couldn't think
of one.

Jan
diff mbox series

Patch

diff --git a/xen/common/timer.c b/xen/common/timer.c
index 98f2c48..afcb1b0 100644
--- a/xen/common/timer.c
+++ b/xen/common/timer.c
@@ -631,6 +631,10 @@  static int cpu_callback(
     case CPU_UP_CANCELED:
     case CPU_DEAD:
         migrate_timers_from_cpu(cpu);
+        ASSERT(heap_metadata(ts->heap)->size == 0);
+        if ( heap_metadata(ts->heap)->limit )
+            xfree(ts->heap);
+        ts->heap = dummy_heap;
         break;
     default:
         break;