Message ID | 1553878635-11959-1-git-send-email-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | xen/timers: Fix memory leak with cpu hot unplug | expand |
>>> On 29.03.19 at 17:57, <andrew.cooper3@citrix.com> wrote: > timer_softirq_action() realloc's itself a larger timer heap whenever > necessary, which includes bootstrapping from the empty dummy_heap. Nothing > ever freed this allocation. > > CPU hot unplug and plug has the side effect of zeroing the percpu data area, > which clears ts->heap. This in turn causes new timers to be put on the list > rather than the heap, and for timer_softirq_action() to bootstrap itself > again. > > This in practice leaks ts->heap every time a CPU is hot unplugged and > replugged. In the cpu notifier, free the heap after migrating all other > timers away. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Could I talk you into using online/offline instead of plug/unplug, as the latter is pretty tied to the physical operation of adding / removing a CPU to / from a system, whereas the issue here also surfaces for pure software actions like suspend/resume or the xen-hptool operations? > --- a/xen/common/timer.c > +++ b/xen/common/timer.c > @@ -631,6 +631,10 @@ static int cpu_callback( > case CPU_UP_CANCELED: > case CPU_DEAD: > migrate_timers_from_cpu(cpu); > + ASSERT(heap_metadata(ts->heap)->size == 0); > + if ( heap_metadata(ts->heap)->limit ) > + xfree(ts->heap); > + ts->heap = dummy_heap; > break; I think it would be worthwhile to add a comment to clarify that the updating of per-CPU data here is not entirely pointless, despite the zeroing done when bringing a CPU back up. Additionally - is this really the right thing to do uniformly during CPU_UP_CANCELED / CPU_DEAD? Shouldn't this be done conditionally upon park_offline_cpus there, and get done for CPU_REMOVE in the opposite case (like we do elsewhere, in particular in cpu_percpu_callback() itself)? Jan
On 01/04/2019 09:27, Jan Beulich wrote: >>>> On 29.03.19 at 17:57, <andrew.cooper3@citrix.com> wrote: >> timer_softirq_action() realloc's itself a larger timer heap whenever >> necessary, which includes bootstrapping from the empty dummy_heap. Nothing >> ever freed this allocation. >> >> CPU hot unplug and plug has the side effect of zeroing the percpu data area, >> which clears ts->heap. This in turn causes new timers to be put on the list >> rather than the heap, and for timer_softirq_action() to bootstrap itself >> again. >> >> This in practice leaks ts->heap every time a CPU is hot unplugged and >> replugged. In the cpu notifier, free the heap after migrating all other >> timers away. >> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > Could I talk you into using online/offline instead of plug/unplug, > as the latter is pretty tied to the physical operation of adding / > removing a CPU to / from a system, whereas the issue here > also surfaces for pure software actions like suspend/resume > or the xen-hptool operations? Ok. > >> --- a/xen/common/timer.c >> +++ b/xen/common/timer.c >> @@ -631,6 +631,10 @@ static int cpu_callback( >> case CPU_UP_CANCELED: >> case CPU_DEAD: >> migrate_timers_from_cpu(cpu); >> + ASSERT(heap_metadata(ts->heap)->size == 0); >> + if ( heap_metadata(ts->heap)->limit ) >> + xfree(ts->heap); >> + ts->heap = dummy_heap; >> break; > I think it would be worthwhile to add a comment to clarify that > the updating of per-CPU data here is not entirely pointless, > despite the zeroing done when bringing a CPU back up. What kind of comment do you think would be useful here? ts->heap obviously shouldn’t be left as a wild pointer, and I don't see why this point is worthy of comment. > > Additionally - is this really the right thing to do uniformly during > CPU_UP_CANCELED / CPU_DEAD? Shouldn't this be done > conditionally upon park_offline_cpus there, and get done for > CPU_REMOVE in the opposite case (like we do elsewhere, in > particular in cpu_percpu_callback() itself)? Its certainly the safest course of action, and obviously needs to follow migrate_timers_from_cpu() Do parked CPUs actually get entered into the online map? It appears so. Either way, the actual semantics of park_offline_cpus are undocumented, and if the intended semantics are to use that bifurcated logic everywhere, then I think the semantics want rethinking. ~Andrew
>>> On 01.04.19 at 18:59, <andrew.cooper3@citrix.com> wrote: > On 01/04/2019 09:27, Jan Beulich wrote: >>>>> On 29.03.19 at 17:57, <andrew.cooper3@citrix.com> wrote: >>> --- a/xen/common/timer.c >>> +++ b/xen/common/timer.c >>> @@ -631,6 +631,10 @@ static int cpu_callback( >>> case CPU_UP_CANCELED: >>> case CPU_DEAD: >>> migrate_timers_from_cpu(cpu); >>> + ASSERT(heap_metadata(ts->heap)->size == 0); >>> + if ( heap_metadata(ts->heap)->limit ) >>> + xfree(ts->heap); >>> + ts->heap = dummy_heap; >>> break; >> I think it would be worthwhile to add a comment to clarify that >> the updating of per-CPU data here is not entirely pointless, >> despite the zeroing done when bringing a CPU back up. > > What kind of comment do you think would be useful here? > > ts->heap obviously shouldn’t be left as a wild pointer, and I don't see > why this point is worthy of comment. Oh, I wasn't suggesting to comment the freeing. It's the storing of dummy_heap's address which might look unnecessary when being aware of the clearing of per-CPU data. >> Additionally - is this really the right thing to do uniformly during >> CPU_UP_CANCELED / CPU_DEAD? Shouldn't this be done >> conditionally upon park_offline_cpus there, and get done for >> CPU_REMOVE in the opposite case (like we do elsewhere, in >> particular in cpu_percpu_callback() itself)? > > Its certainly the safest course of action, and obviously needs to follow > migrate_timers_from_cpu() > > Do parked CPUs actually get entered into the online map? It appears so. No, they don't: It's a full cpu_down() they go through. That's why the new CPU_REMOVE notification was introduced, for handlers to be able to free per-CPU data at the appropriate point in time. > Either way, the actual semantics of park_offline_cpus are undocumented, > and if the intended semantics are to use that bifurcated logic > everywhere, then I think the semantics want rethinking. Well, I don't mind finding a different, perhaps easier to use model. Back then, when introducing parking, I couldn't think of one. Jan
diff --git a/xen/common/timer.c b/xen/common/timer.c index 98f2c48..afcb1b0 100644 --- a/xen/common/timer.c +++ b/xen/common/timer.c @@ -631,6 +631,10 @@ static int cpu_callback( case CPU_UP_CANCELED: case CPU_DEAD: migrate_timers_from_cpu(cpu); + ASSERT(heap_metadata(ts->heap)->size == 0); + if ( heap_metadata(ts->heap)->limit ) + xfree(ts->heap); + ts->heap = dummy_heap; break; default: break;
timer_softirq_action() realloc's itself a larger timer heap whenever necessary, which includes bootstrapping from the empty dummy_heap. Nothing ever freed this allocation. CPU hot unplug and plug has the side effect of zeroing the percpu data area, which clears ts->heap. This in turn causes new timers to be put on the list rather than the heap, and for timer_softirq_action() to bootstrap itself again. This in practice leaks ts->heap every time a CPU is hot unplugged and replugged. In the cpu notifier, free the heap after migrating all other timers away. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: George Dunlap <George.Dunlap@eu.citrix.com> CC: Ian Jackson <ian.jackson@citrix.com> CC: Jan Beulich <JBeulich@suse.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: Stefano Stabellini <sstabellini@kernel.org> CC: Tim Deegan <tim@xen.org> CC: Wei Liu <wei.liu2@citrix.com> CC: Julien Grall <julien.grall@arm.com> This texturally depends on "xen/timers: Document and improve the representation of the timer heap metadata" which was necessary to understand the problem well enough to fix it, but isn't backporting over this change isn't too complicated (should the cleanup patch not want to be backported). --- xen/common/timer.c | 4 ++++ 1 file changed, 4 insertions(+)