Message ID | 152901304242.252222.9947658955703347553.stgit@bahia.lan (mailing list archive) |
---|---|
State | New, archived |
On Thu, Jun 14, 2018 at 11:50:42PM +0200, Greg Kurz wrote:
> The spapr_realize_vcpu() function doesn't roll back in case of error.
> This isn't a problem with coldplugged CPUs because the machine won't
> start and QEMU will exit. Hotplug is a different story though: the
> CPU thread is started under object_property_set_bool() and it assumes
> it can access the CPU object.
>
> If icp_create() fails, we return an error without unregistering the
> reset handler for this CPU, and we leave the underlying QEMU thread for
> this CPU alive. Since spapr_cpu_core_realize() doesn't care to unrealize
> already realized CPUs either, but happily frees all of them anyway, the
> CPU thread crashes instantly:
>
> (qemu) device_add host-spapr-cpu-core,core-id=1,id=gku
> GKU: failing icp_create (cpu 0x11497fd0)
>                              ^^^^^^^^^^
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffee3feaa0 (LWP 24725)]
> 0x00000000104c8374 in object_dynamic_cast_assert (obj=0x11497fd0,
>                                                   ^^^^^^^^^^^^^^
>                                              pointer to the CPU object
> 623         trace_object_dynamic_cast_assert(obj ? obj->class->type->name
> (gdb) p obj->class->type
> $1 = (Type) 0x0
> (gdb) p * obj
> $2 = {class = 0x10ea9c10, free = 0x11244620,
>                                  ^^^^^^^^^^
>                               should be g_free
> (gdb) p g_free
> $3 = {<text variable, no debug info>} 0x7ffff282bef0 <g_free>
>
> obj is a dangling pointer to the CPU that was just destroyed in
> spapr_cpu_core_realize().
>
> This patch adds proper rollback to both spapr_realize_vcpu() and
> spapr_cpu_core_realize().
>
> Signed-off-by: Greg Kurz <groug@kaod.org>

Applied to ppc-for-3.0, since it definitely looks to fix some
problems.

> ---
>  hw/ppc/spapr_cpu_core.c |   12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 003c4c5a79d2..04c818a6ecac 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -159,12 +159,16 @@ static void spapr_realize_vcpu(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>      spapr_cpu->icp = icp_create(OBJECT(cpu), spapr->icp_type,
>                                  XICS_FABRIC(spapr), &local_err);
>      if (local_err) {
> -        goto error;
> +        goto error_unregister;
>      }
>
>      return;
>
> +error_unregister:
> +    qemu_unregister_reset(spapr_cpu_reset, cpu);
> +    cpu_remove_sync(CPU(cpu));

I'm a little unclear on exactly what init the cpu_remove_sync() is
mirroring, though.

>  error:
> +    g_free(spapr_cpu);
>      error_propagate(errp, local_err);
>  }
>
> @@ -222,11 +226,15 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
>      for (j = 0; j < cc->nr_threads; j++) {
>          spapr_realize_vcpu(sc->threads[j], spapr, &local_err);
>          if (local_err) {
> -            goto err;
> +            goto err_unrealize;
>          }
>      }
>      return;
>
> +err_unrealize:
> +    while (--j >= 0) {
> +        spapr_unrealize_vcpu(sc->threads[i]);
> +    }
>  err:
>      while (--i >= 0) {
>          obj = OBJECT(sc->threads[i]);
>
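The rollback scheme in the patch (a later failure jumps to a label that undoes the earlier steps, then falls through to the common error exit) can be sketched outside QEMU. This is only an illustration under stated assumptions: the counters and the names start_thread(), register_handler(), held_resources() and realize_vcpu_sketch() are hypothetical stand-ins, not QEMU functions, and the step order is simplified.

```c
/* Hypothetical counters standing in for the reset handler and vCPU thread. */
static int handlers_registered;
static int threads_running;

static void start_thread(void)       { threads_running++; }
static void stop_thread(void)        { threads_running--; }
static void register_handler(void)   { handlers_registered++; }
static void unregister_handler(void) { handlers_registered--; }

/* Number of resources currently held; 0 means a failed call left nothing behind. */
int held_resources(void)
{
    return handlers_registered + threads_running;
}

/*
 * fail_step: 0 = succeed, 1 = fail before anything is set up,
 * 2 = fail after the thread and handler exist (the icp_create() case).
 * Returns 0 on success, -1 on failure.
 */
int realize_vcpu_sketch(int fail_step)
{
    if (fail_step == 1) {
        goto error;               /* nothing to undo yet */
    }

    start_thread();
    register_handler();

    if (fail_step == 2) {
        goto error_unregister;    /* undo both steps, newest first */
    }

    return 0;

error_unregister:
    unregister_handler();
    stop_thread();
error:
    return -1;
}
```

Each label undoes only what was set up before the jump, so the early exit and the late exit can share the tail of the function.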
On Fri, Jun 15, 2018 at 10:02:25AM +1000, David Gibson wrote:
> On Thu, Jun 14, 2018 at 11:50:42PM +0200, Greg Kurz wrote:
[...]
> > This patch adds proper rollback to both spapr_realize_vcpu() and
> > spapr_cpu_core_realize().
> >
> > Signed-off-by: Greg Kurz <groug@kaod.org>
>
> Applied to ppc-for-3.0, since it definitely looks to fix some
> problems.

Uh.. actually it has a definite bug - the first exit point will call
g_free() on an uninitialized spapr_cpu. I fixed it up with a NULL
initialization in my tree.

[...]
> >  error:
> > +    g_free(spapr_cpu);
> >      error_propagate(errp, local_err);
> >  }
[...]
On Fri, 15 Jun 2018 10:02:25 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Jun 14, 2018 at 11:50:42PM +0200, Greg Kurz wrote:
[...]
> > +error_unregister:
> > +    qemu_unregister_reset(spapr_cpu_reset, cpu);
> > +    cpu_remove_sync(CPU(cpu));
>
> I'm a little unclear on exactly what init the cpu_remove_sync() is
> mirroring, though.
>

We have the same call in spapr_unrealize_vcpu(). IIUC it is mirroring
object_property_set_bool(OBJECT(cpu), true, "realized", &local_err).

[...]
On Fri, 15 Jun 2018 10:14:31 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Jun 15, 2018 at 10:02:25AM +1000, David Gibson wrote:
[...]
> Uh.. actually it has a definite bug - the first exit point will call
> g_free() on an uninitialized spapr_cpu. I fixed it up with a NULL
> initialization in my tree.
>

Ah... as said in the cover letter, all the series is based on machine_data
being set before the call to object_property_set_bool()... Maybe I should
have made that explicit with a preparatory patch... Sorry.

[...]
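The uninitialized-pointer problem discussed here is easy to reproduce in miniature: when an early exit and a late exit share one error label that frees a pointer, the pointer has to hold a defined value on every path. The sketch below uses plain free() rather than GLib, and struct vcpu_data and realize_sketch() are hypothetical names introduced for illustration. Like g_free(), free() does nothing when passed NULL, which is what makes the NULL initialization sufficient.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-CPU machine data; not QEMU's type. */
struct vcpu_data {
    int icp;
};

/*
 * fail_early simulates an error before the allocation (the "first exit
 * point"), fail_late an error after it. Returns 0 on success, -1 on failure.
 */
int realize_sketch(bool fail_early, bool fail_late, struct vcpu_data **out)
{
    struct vcpu_data *data = NULL;   /* defined value for every error path */

    if (fail_early) {
        goto error;                  /* data is still NULL here */
    }

    data = calloc(1, sizeof(*data));
    if (!data) {
        goto error;
    }

    if (fail_late) {
        goto error;                  /* data points at a live allocation */
    }

    *out = data;
    return 0;

error:
    free(data);                      /* free(NULL) is a no-op */
    *out = NULL;
    return -1;
}
```

Without the `= NULL`, the early `goto error` would hand an indeterminate pointer to free(), which is undefined behaviour.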
On Fri, Jun 15, 2018 at 07:53:37AM +0200, Greg Kurz wrote:
> On Fri, 15 Jun 2018 10:02:25 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
[...]
> > I'm a little unclear on exactly what init the cpu_remove_sync() is
> > mirroring, though.
> >
>
> We have the same call in spapr_unrealize_vcpu(). IIUC it is mirroring
> object_property_set_bool(OBJECT(cpu), true, "realized", &local_err).

Ok.

[...]
On Fri, Jun 15, 2018 at 07:58:05AM +0200, Greg Kurz wrote:
> On Fri, 15 Jun 2018 10:14:31 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
[...]
> > Uh.. actually it has a definite bug - the first exit point will call
> > g_free() on an uninitialized spapr_cpu. I fixed it up with a NULL
> > initialization in my tree.
>
> Ah... as said in the cover letter, all the series is based on machine_data
> being set before the call to object_property_set_bool()... Maybe I should
> have made that explicit with a preparatory patch... Sorry.

Ah, that makes sense.

So, I ended up having to rework a little differently, after I yanked
my intc -> machine_data patch because it broke things for clg. I
think I've fixed it up correctly now - if you can check the latest
ppc-for-3.0 I pushed out, that would be great.
On Fri, 15 Jun 2018 16:29:15 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Jun 15, 2018 at 07:58:05AM +0200, Greg Kurz wrote:
[...]
> Ah, that makes sense.
>
> So, I ended up having to rework a little differently, after I yanked
> my intc -> machine_data patch because it broke things for clg. I
> think I've fixed it up correctly now - if you can check the latest
> ppc-for-3.0 I pushed out, that would be great.
>

I'll do this ASAP.
On Fri, 15 Jun 2018 09:07:24 +0200
Greg Kurz <groug@kaod.org> wrote:

> On Fri, 15 Jun 2018 16:29:15 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
[...]
> > think I've fixed it up correctly now - if you can check the latest
> > ppc-for-3.0 I pushed out, that would be great.
> >
>
> I'll do this ASAP.

Oops, I've just spotted a nit in my original patch, that causes
QEMU to crash if threads > 1... but I had only tested with single
threaded cores :)

> +err_unrealize:
> +    while (--j >= 0) {
> +        spapr_unrealize_vcpu(sc->threads[i]);
                                           ^^^
                                       should be j

Apart from that, it looks good.
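The index mix-up is easier to see in a stripped-down version of the loop. The sketch below is an assumption-laden illustration, not QEMU code: NR_THREADS, the realized[] array and the helper names are hypothetical. The rollback loop walks back over the entries that were successfully realized, using the same counter `j` that the forward loop used.

```c
#include <stdbool.h>

#define NR_THREADS 4

/* Hypothetical per-thread state standing in for sc->threads[]. */
static bool realized[NR_THREADS];

/* Realize one entry; fail_at selects the index that fails (-1 = none). */
static int realize_one(int idx, int fail_at)
{
    if (idx == fail_at) {
        return -1;
    }
    realized[idx] = true;
    return 0;
}

static void unrealize_one(int idx)
{
    realized[idx] = false;
}

/* How many entries are currently realized. */
int count_realized(void)
{
    int n = 0;
    for (int k = 0; k < NR_THREADS; k++) {
        n += realized[k] ? 1 : 0;
    }
    return n;
}

/* Returns 0 on success, -1 on failure with every realized entry undone. */
int realize_all(int fail_at)
{
    int j;

    for (j = 0; j < NR_THREADS; j++) {
        if (realize_one(j, fail_at) < 0) {
            goto err_unrealize;
        }
    }
    return 0;

err_unrealize:
    /* j is the index that failed; entries j-1 .. 0 were realized. */
    while (--j >= 0) {
        unrealize_one(j);         /* index with j, the loop's own counter */
    }
    return -1;
}
```

With a single thread the failure can only happen at index 0, so the rollback loop body never runs, which would explain why testing with single-threaded cores did not expose the wrong index.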
On Fri, Jun 15, 2018 at 10:01:47AM +0200, Greg Kurz wrote:
> On Fri, 15 Jun 2018 09:07:24 +0200
> Greg Kurz <groug@kaod.org> wrote:
[...]
> Oops, I've just spotted a nit in my original patch, that causes
> QEMU to crash if threads > 1... but I had only tested with single
> threaded cores :)
>
> > +err_unrealize:
> > +    while (--j >= 0) {
> > +        spapr_unrealize_vcpu(sc->threads[i]);
>                                            ^^^
>                                        should be j

Ah, yes. I've fixed that up in my tree.

>
> Apart from that, it looks good.
On Fri, 15 Jun 2018 22:32:44 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Jun 15, 2018 at 10:01:47AM +0200, Greg Kurz wrote:
> > On Fri, 15 Jun 2018 09:07:24 +0200
> > Greg Kurz <groug@kaod.org> wrote:
> >
> > > On Fri, 15 Jun 2018 16:29:15 +1000
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >
> > > > On Fri, Jun 15, 2018 at 07:58:05AM +0200, Greg Kurz wrote:
> > > > > On Fri, 15 Jun 2018 10:14:31 +1000
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >
> > > > > > On Fri, Jun 15, 2018 at 10:02:25AM +1000, David Gibson wrote:
> > > > > > > On Thu, Jun 14, 2018 at 11:50:42PM +0200, Greg Kurz wrote:
> > > > > > > > The spapr_realize_vcpu() function doesn't rollback in case of error.
> > > > > > > > This isn't a problem with coldplugged CPUs because the machine won't
> > > > > > > > start and QEMU will exit. Hotplug is a different story though: the
> > > > > > > > CPU thread is started under object_property_set_bool() and it assumes
> > > > > > > > it can access the CPU object.
> > > > > > > >
> > > > > > > > If icp_create() fails, we return an error without unregistering the
> > > > > > > > reset handler for this CPU, and we let the underlying QEMU thread for
> > > > > > > > this CPU alive. Since spapr_cpu_core_realize() doesn't care to unrealize
> > > > > > > > already realized CPUs either, but happily frees all of them anyway, the
> > > > > > > > CPU thread crashes instantly:
> > > > > > > >
> > > > > > > > (qemu) device_add host-spapr-cpu-core,core-id=1,id=gku
> > > > > > > > GKU: failing icp_create (cpu 0x11497fd0)
> > > > > > > >                              ^^^^^^^^^^
> > > > > > > > Program received signal SIGSEGV, Segmentation fault.
> > > > > > > > [Switching to Thread 0x7fffee3feaa0 (LWP 24725)]
> > > > > > > > 0x00000000104c8374 in object_dynamic_cast_assert (obj=0x11497fd0,
> > > > > > > >                                                   ^^^^^^^^^^^^^^
> > > > > > > >                                            pointer to the CPU object
> > > > > > > > 623         trace_object_dynamic_cast_assert(obj ? obj->class->type->name
> > > > > > > > (gdb) p obj->class->type
> > > > > > > > $1 = (Type) 0x0
> > > > > > > > (gdb) p * obj
> > > > > > > > $2 = {class = 0x10ea9c10, free = 0x11244620,
> > > > > > > >                                  ^^^^^^^^^^
> > > > > > > >                               should be g_free
> > > > > > > > (gdb) p g_free
> > > > > > > > $3 = {<text variable, no debug info>} 0x7ffff282bef0 <g_free>
> > > > > > > >
> > > > > > > > obj is a dangling pointer to the CPU that was just destroyed in
> > > > > > > > spapr_cpu_core_realize().
> > > > > > > >
> > > > > > > > This patch adds proper rollback to both spapr_realize_vcpu() and
> > > > > > > > spapr_cpu_core_realize().
> > > > > > > >
> > > > > > > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > > > > > >
> > > > > > > Applied to ppc-for-3.0, since it definitely looks to fix some
> > > > > > > problems.
> > > > > >
> > > > > > Uh.. actually it has a definite bug - the first exit point will call
> > > > > > g_free() on an uninitialized spapr_cpu.  I fixed it up with a NULL
> > > > > > initialization in my tree.
> > > > >
> > > > > Ah... as said in the cover letter, all the series is based on machine_data
> > > > > being set before the call to object_property_set_bool()... Maybe I should
> > > > > have made that explicit with a preparatory patch... Sorry.
> > > >
> > > > Ah, that makes sense.
> > > >
> > > > So, I ended up having to rework a little differently, after I yanked
> > > > by intc -> machine_data patch because it broke things for clg.  I
> > > > think I've fixed it up correctly now - if you can check the latest
> > > > ppc-for-3.0 I pushed out, that would be great.
> > > >
> > >
> > > I'll do this ASAP.
> >
> > Oops, I've just spotted a nit in my original patch, that causes
> > QEMU to crash if threads > 1... but I had only tested with single
> > threaded cores :)
> >
> > > +err_unrealize:
> > > +    while (--j >= 0) {
> > > +        spapr_unrealize_vcpu(sc->threads[i]);
> >                                            ^^^
> >                                         should be j
>
> Ah, yes.  I've fixed that up in my tree.

> +        spapr_unrealize_vcpu(sc->threads[j);

Almost fixed ;)

> >
> > > Appart from that, it looks good.
> > >
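For readers following the g_free() remark quoted above: the hazard is that an error label shared by several exit points frees a pointer that the earliest exit never assigned. Below is a minimal standalone sketch of the NULL-initialization fix, using plain free() instead of g_free() (both treat NULL as a no-op). The struct, the function name and the fail_early/fail_late switches are hypothetical and exist only for illustration; this is not the code in David's tree.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-CPU machine data; not the QEMU type. */
struct vcpu_data {
    int icp_ready;
};

/*
 * Returns 0 on success, -1 on failure.
 * fail_early simulates an error raised before the allocation,
 * fail_late simulates an error raised after it.
 */
int realize_vcpu_sketch(int fail_early, int fail_late, struct vcpu_data **out)
{
    /* NULL so that the shared error path can free it unconditionally. */
    struct vcpu_data *data = NULL;

    if (fail_early) {
        goto error;              /* data was never allocated */
    }

    data = calloc(1, sizeof(*data));
    if (!data || fail_late) {
        goto error;
    }

    data->icp_ready = 1;
    *out = data;
    return 0;

error:
    free(data);                  /* free(NULL) does nothing */
    *out = NULL;
    return -1;
}
```

Without the `= NULL`, the early exit would hand an indeterminate pointer to free(), which is the same class of problem as the uninitialized spapr_cpu mentioned in the thread.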
On Fri, Jun 15, 2018 at 03:24:18PM +0200, Greg Kurz wrote:
> On Fri, 15 Jun 2018 22:32:44 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > On Fri, Jun 15, 2018 at 10:01:47AM +0200, Greg Kurz wrote:
> > > On Fri, 15 Jun 2018 09:07:24 +0200
> > > Greg Kurz <groug@kaod.org> wrote:
> > >
> > > > On Fri, 15 Jun 2018 16:29:15 +1000
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >
> > > > > On Fri, Jun 15, 2018 at 07:58:05AM +0200, Greg Kurz wrote:
> > > > > > On Fri, 15 Jun 2018 10:14:31 +1000
> > > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > > >
> > > > > > > On Fri, Jun 15, 2018 at 10:02:25AM +1000, David Gibson wrote:
> > > > > > > > On Thu, Jun 14, 2018 at 11:50:42PM +0200, Greg Kurz wrote:
> > > > > > > > > The spapr_realize_vcpu() function doesn't rollback in case of error.
> > > > > > > > > This isn't a problem with coldplugged CPUs because the machine won't
> > > > > > > > > start and QEMU will exit. Hotplug is a different story though: the
> > > > > > > > > CPU thread is started under object_property_set_bool() and it assumes
> > > > > > > > > it can access the CPU object.
> > > > > > > > >
> > > > > > > > > If icp_create() fails, we return an error without unregistering the
> > > > > > > > > reset handler for this CPU, and we let the underlying QEMU thread for
> > > > > > > > > this CPU alive. Since spapr_cpu_core_realize() doesn't care to unrealize
> > > > > > > > > already realized CPUs either, but happily frees all of them anyway, the
> > > > > > > > > CPU thread crashes instantly:
> > > > > > > > >
> > > > > > > > > (qemu) device_add host-spapr-cpu-core,core-id=1,id=gku
> > > > > > > > > GKU: failing icp_create (cpu 0x11497fd0)
> > > > > > > > >                              ^^^^^^^^^^
> > > > > > > > > Program received signal SIGSEGV, Segmentation fault.
> > > > > > > > > [Switching to Thread 0x7fffee3feaa0 (LWP 24725)]
> > > > > > > > > 0x00000000104c8374 in object_dynamic_cast_assert (obj=0x11497fd0,
> > > > > > > > >                                                   ^^^^^^^^^^^^^^
> > > > > > > > >                                            pointer to the CPU object
> > > > > > > > > 623         trace_object_dynamic_cast_assert(obj ? obj->class->type->name
> > > > > > > > > (gdb) p obj->class->type
> > > > > > > > > $1 = (Type) 0x0
> > > > > > > > > (gdb) p * obj
> > > > > > > > > $2 = {class = 0x10ea9c10, free = 0x11244620,
> > > > > > > > >                                  ^^^^^^^^^^
> > > > > > > > >                               should be g_free
> > > > > > > > > (gdb) p g_free
> > > > > > > > > $3 = {<text variable, no debug info>} 0x7ffff282bef0 <g_free>
> > > > > > > > >
> > > > > > > > > obj is a dangling pointer to the CPU that was just destroyed in
> > > > > > > > > spapr_cpu_core_realize().
> > > > > > > > >
> > > > > > > > > This patch adds proper rollback to both spapr_realize_vcpu() and
> > > > > > > > > spapr_cpu_core_realize().
> > > > > > > > >
> > > > > > > > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > > > > > > >
> > > > > > > > Applied to ppc-for-3.0, since it definitely looks to fix some
> > > > > > > > problems.
> > > > > > >
> > > > > > > Uh.. actually it has a definite bug - the first exit point will call
> > > > > > > g_free() on an uninitialized spapr_cpu.  I fixed it up with a NULL
> > > > > > > initialization in my tree.
> > > > > >
> > > > > > Ah... as said in the cover letter, all the series is based on machine_data
> > > > > > being set before the call to object_property_set_bool()... Maybe I should
> > > > > > have made that explicit with a preparatory patch... Sorry.
> > > > >
> > > > > Ah, that makes sense.
> > > > >
> > > > > So, I ended up having to rework a little differently, after I yanked
> > > > > by intc -> machine_data patch because it broke things for clg.  I
> > > > > think I've fixed it up correctly now - if you can check the latest
> > > > > ppc-for-3.0 I pushed out, that would be great.
> > > > >
> > > >
> > > > I'll do this ASAP.
> > >
> > > Oops, I've just spotted a nit in my original patch, that causes
> > > QEMU to crash if threads > 1... but I had only tested with single
> > > threaded cores :)
> > >
> > > > +err_unrealize:
> > > > +    while (--j >= 0) {
> > > > +        spapr_unrealize_vcpu(sc->threads[i]);
> > >                                            ^^^
> > >                                         should be j
> >
> > Ah, yes.  I've fixed that up in my tree.
> >
> > +        spapr_unrealize_vcpu(sc->threads[j);
>
> Almost fixed ;)

Oops, fixed now.

>
> > >
> > > > Appart from that, it looks good.
> > > >
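The index mix-up discussed above can be shown in isolation. The sketch below is a standalone toy, not the QEMU code: the struct, the function name and the fail_at parameter are hypothetical. It realizes threads in order and, on failure, walks back over only the threads that were already realized, indexing with the same variable that the unwind loop decrements.

```c
#define NR_THREADS_MAX 8

/* Hypothetical stand-in for a core with several threads; not the QEMU type. */
struct core_sketch {
    int realized[NR_THREADS_MAX];
};

/*
 * Realize threads 0..nr_threads-1. fail_at is the index whose realize step
 * fails (-1 for none). On failure, threads realized so far are unrealized
 * in reverse order. Returns 0 on success, -1 on failure.
 */
int core_realize_sketch(struct core_sketch *core, int nr_threads, int fail_at)
{
    int j;

    if (nr_threads < 0 || nr_threads > NR_THREADS_MAX) {
        return -1;
    }

    for (j = 0; j < nr_threads; j++) {
        if (j == fail_at) {
            goto err_unrealize;
        }
        core->realized[j] = 1;
    }
    return 0;

err_unrealize:
    /* j is the index that failed; --j steps back over the realized ones. */
    while (--j >= 0) {
        core->realized[j] = 0;   /* must be j, the variable being unwound */
    }
    return -1;
}
```

With a single thread the unwind loop body never runs, which would be consistent with the bug going unnoticed in single-threaded tests. In the original patch, indexing with i instead presumably reads past the end of the threads array, since i has already run to nr_threads by the time the realize loop starts; that reading is an inference from the diff context, not something stated in the thread.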
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 003c4c5a79d2..04c818a6ecac 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -159,12 +159,16 @@ static void spapr_realize_vcpu(PowerPCCPU *cpu, sPAPRMachineState *spapr,
     spapr_cpu->icp = icp_create(OBJECT(cpu), spapr->icp_type,
                                 XICS_FABRIC(spapr), &local_err);
     if (local_err) {
-        goto error;
+        goto error_unregister;
     }
 
     return;
 
+error_unregister:
+    qemu_unregister_reset(spapr_cpu_reset, cpu);
+    cpu_remove_sync(CPU(cpu));
 error:
+    g_free(spapr_cpu);
     error_propagate(errp, local_err);
 }
 
@@ -222,11 +226,15 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
     for (j = 0; j < cc->nr_threads; j++) {
         spapr_realize_vcpu(sc->threads[j], spapr, &local_err);
         if (local_err) {
-            goto err;
+            goto err_unrealize;
         }
     }
     return;
 
+err_unrealize:
+    while (--j >= 0) {
+        spapr_unrealize_vcpu(sc->threads[i]);
+    }
 err:
     while (--i >= 0) {
         obj = OBJECT(sc->threads[i]);
The spapr_realize_vcpu() function doesn't roll back in case of error.
This isn't a problem with coldplugged CPUs because the machine won't
start and QEMU will exit. Hotplug is a different story though: the
CPU thread is started under object_property_set_bool() and it assumes
it can access the CPU object.

If icp_create() fails, we return an error without unregistering the
reset handler for this CPU, and we leave the underlying QEMU thread for
this CPU alive. Since spapr_cpu_core_realize() doesn't unrealize
already realized CPUs either, but frees all of them anyway, the
CPU thread crashes instantly:

(qemu) device_add host-spapr-cpu-core,core-id=1,id=gku
GKU: failing icp_create (cpu 0x11497fd0)
                             ^^^^^^^^^^
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffee3feaa0 (LWP 24725)]
0x00000000104c8374 in object_dynamic_cast_assert (obj=0x11497fd0,
                                                  ^^^^^^^^^^^^^^
                                           pointer to the CPU object
623         trace_object_dynamic_cast_assert(obj ? obj->class->type->name
(gdb) p obj->class->type
$1 = (Type) 0x0
(gdb) p * obj
$2 = {class = 0x10ea9c10, free = 0x11244620,
                                 ^^^^^^^^^^
                              should be g_free
(gdb) p g_free
$3 = {<text variable, no debug info>} 0x7ffff282bef0 <g_free>

obj is a dangling pointer to the CPU that was just destroyed in
spapr_cpu_core_realize().

This patch adds proper rollback to both spapr_realize_vcpu() and
spapr_cpu_core_realize().

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/ppc/spapr_cpu_core.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
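The commit message's point about the reset handler can be illustrated with a toy registry. In the diff, qemu_unregister_reset() is called with the same (function, opaque) pair that identifies the handler; the sketch below mimics that shape with a small fixed-size table. Every name here (register_reset_sketch, unregister_reset_sketch, realize_with_handler, and so on) is hypothetical, and the table is an assumption made for illustration, not how QEMU stores its handlers.

```c
#include <stddef.h>

#define MAX_HANDLERS 16

typedef void (*reset_fn)(void *opaque);

/* Hypothetical miniature registry keyed by (function, opaque) pairs. */
static struct {
    reset_fn fn;
    void *opaque;
} handlers[MAX_HANDLERS];
static int nr_handlers;

int register_reset_sketch(reset_fn fn, void *opaque)
{
    if (nr_handlers == MAX_HANDLERS) {
        return -1;
    }
    handlers[nr_handlers].fn = fn;
    handlers[nr_handlers].opaque = opaque;
    nr_handlers++;
    return 0;
}

void unregister_reset_sketch(reset_fn fn, void *opaque)
{
    int i;

    for (i = 0; i < nr_handlers; i++) {
        if (handlers[i].fn == fn && handlers[i].opaque == opaque) {
            /* Fill the hole with the last entry; order is not preserved. */
            handlers[i] = handlers[nr_handlers - 1];
            nr_handlers--;
            return;
        }
    }
}

int registered_count(void)
{
    return nr_handlers;
}

static void cpu_reset_sketch(void *opaque)
{
    *(int *)opaque = 0;
}

/* Registers a handler for cpu_state, then fails if asked to. */
int realize_with_handler(int *cpu_state, int fail)
{
    if (register_reset_sketch(cpu_reset_sketch, cpu_state) < 0) {
        return -1;
    }
    if (fail) {
        goto error_unregister;
    }
    return 0;

error_unregister:
    /* Undo the registration so no entry keeps pointing at cpu_state. */
    unregister_reset_sketch(cpu_reset_sketch, cpu_state);
    return -1;
}
```

If the error path skipped the unregister step, the table would keep an entry whose opaque pointer outlives the object it refers to, and a later reset would call into freed memory.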