diff mbox

[v4,3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined

Message ID 1397488277-14865-4-git-send-email-imammedo@redhat.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Igor Mammedov April 14, 2014, 3:11 p.m. UTC
acpi_processor_add() assumes that present at boot CPUs
are always onlined, it is not so if a CPU failed to become
onlined. As result acpi_processor_add() will mark such CPU
device as onlined in sysfs and following attempts to
online/offline it using /sys/device/system/cpu/cpuX/online
attribute will fail.

Do not poke into device internals in acpi_processor_add()
and touch "struct device { .offline }" attribute, since
for CPUs onlined at boot it's set by:
  topology_init() -> arch_register_cpu() -> register_cpu()
before ACPI device tree is parsed, and for hotplugged
CPUs it's set when userspace onlines CPU via sysfs.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v2:
 - fix regression in v1 leading to NULL pointer dereference
   on CPU unplug, do not remove "pr->dev = dev;"
---
 drivers/acpi/acpi_processor.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

Comments

Rafael J. Wysocki April 15, 2014, 5:48 a.m. UTC | #1
On Monday, April 14, 2014 05:11:15 PM Igor Mammedov wrote:
> acpi_processor_add() assumes that present at boot CPUs
> are always onlined, it is not so if a CPU failed to become
> onlined. As result acpi_processor_add() will mark such CPU
> device as onlined in sysfs and following attempts to
> online/offline it using /sys/device/system/cpu/cpuX/online
> attribute will fail.
> 
> Do not poke into device internals in acpi_processor_add()
> and touch "struct device { .offline }" attribute, since
> for CPUs onlined at boot it's set by:
>   topology_init() -> arch_register_cpu() -> register_cpu()
> before ACPI device tree is parsed, and for hotplugged
> CPUs it's set when userspace onlines CPU via sysfs.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> v2:
>  - fix regression in v1 leading to NULL pointer dereference
>    on CPU unplug, do not remove "pr->dev = dev;"

Yeah.

Does this patch depend on any other patches in the series?

I don't think so, but just asking.

If it doesn't, why is it part of this series at all?

> ---
>  drivers/acpi/acpi_processor.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index c29c2c3..42d66f8 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -404,7 +404,6 @@ static int acpi_processor_add(struct acpi_device *device,
>  		goto err;
>  
>  	pr->dev = dev;
> -	dev->offline = pr->flags.need_hotplug_init;
>  
>  	/* Trigger the processor driver's .probe() if present. */
>  	if (device_attach(dev) >= 0)
>
Rafael J. Wysocki April 15, 2014, 5:53 a.m. UTC | #2
On Monday, April 14, 2014 05:11:15 PM Igor Mammedov wrote:
> acpi_processor_add() assumes that present at boot CPUs
> are always onlined, it is not so if a CPU failed to become
> onlined. As result acpi_processor_add() will mark such CPU
> device as onlined in sysfs and following attempts to
> online/offline it using /sys/device/system/cpu/cpuX/online
> attribute will fail.
> 
> Do not poke into device internals in acpi_processor_add()
> and touch "struct device { .offline }" attribute, since
> for CPUs onlined at boot it's set by:
>   topology_init() -> arch_register_cpu() -> register_cpu()
> before ACPI device tree is parsed, and for hotplugged
> CPUs it's set when userspace onlines CPU via sysfs.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> v2:
>  - fix regression in v1 leading to NULL pointer dereference
>    on CPU unplug, do not remove "pr->dev = dev;"
> ---
>  drivers/acpi/acpi_processor.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index c29c2c3..42d66f8 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -404,7 +404,6 @@ static int acpi_processor_add(struct acpi_device *device,
>  		goto err;
>  
>  	pr->dev = dev;
> -	dev->offline = pr->flags.need_hotplug_init;

This line is to ensure that dev->offline and pr->flags.need_hotplug_init are
consistent with each other.  If you remove it, you need to ensure that they
will be consistent in some other way.

>  
>  	/* Trigger the processor driver's .probe() if present. */
>  	if (device_attach(dev) >= 0)
>
Igor Mammedov April 15, 2014, 6 a.m. UTC | #3
On Tue, 15 Apr 2014 07:48:30 +0200
"Rafael J. Wysocki" <rjw@rjwysocki.net> wrote:

> On Monday, April 14, 2014 05:11:15 PM Igor Mammedov wrote:
> > acpi_processor_add() assumes that present at boot CPUs
> > are always onlined, it is not so if a CPU failed to become
> > onlined. As result acpi_processor_add() will mark such CPU
> > device as onlined in sysfs and following attempts to
> > online/offline it using /sys/device/system/cpu/cpuX/online
> > attribute will fail.
> > 
> > Do not poke into device internals in acpi_processor_add()
> > and touch "struct device { .offline }" attribute, since
> > for CPUs onlined at boot it's set by:
> >   topology_init() -> arch_register_cpu() -> register_cpu()
> > before ACPI device tree is parsed, and for hotplugged
> > CPUs it's set when userspace onlines CPU via sysfs.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > v2:
> >  - fix regression in v1 leading to NULL pointer dereference
> >    on CPU unplug, do not remove "pr->dev = dev;"
> 
> Yeah.
> 
> Does this patch depend on any other patches in the series?
> 
> I don't think so, but just asking.
> 
> If it doesn't, why is it part of this series at all?
It's doesn't depend on any other patches in here, it was just
convenient to post it as a part of fixes found in CPU hotplug
code and nothing more.

> 
> > ---
> >  drivers/acpi/acpi_processor.c |    1 -
> >  1 files changed, 0 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index c29c2c3..42d66f8 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -404,7 +404,6 @@ static int acpi_processor_add(struct acpi_device *device,
> >  		goto err;
> >  
> >  	pr->dev = dev;
> > -	dev->offline = pr->flags.need_hotplug_init;
> >  
> >  	/* Trigger the processor driver's .probe() if present. */
> >  	if (device_attach(dev) >= 0)
> > 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ingo Molnar April 15, 2014, 6:04 a.m. UTC | #4
* Rafael J. Wysocki <rjw@rjwysocki.net> wrote:

> On Monday, April 14, 2014 05:11:15 PM Igor Mammedov wrote:
> > acpi_processor_add() assumes that present at boot CPUs
> > are always onlined, it is not so if a CPU failed to become
> > onlined. As result acpi_processor_add() will mark such CPU
> > device as onlined in sysfs and following attempts to
> > online/offline it using /sys/device/system/cpu/cpuX/online
> > attribute will fail.
> > 
> > Do not poke into device internals in acpi_processor_add()
> > and touch "struct device { .offline }" attribute, since
> > for CPUs onlined at boot it's set by:
> >   topology_init() -> arch_register_cpu() -> register_cpu()
> > before ACPI device tree is parsed, and for hotplugged
> > CPUs it's set when userspace onlines CPU via sysfs.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > v2:
> >  - fix regression in v1 leading to NULL pointer dereference
> >    on CPU unplug, do not remove "pr->dev = dev;"
> 
> Yeah.
> 
> Does this patch depend on any other patches in the series?
> 
> I don't think so, but just asking.
> 
> If it doesn't, why is it part of this series at all?

I suspect because Igor was rigorously stress-testing CPU hotplug, and 
was fixing all the bugs he saw, before adding the one feature he is 
interested in.

The feature cannot be guaranteed to be correct, without having a 
stable base to work on.

As such this series makes sense, as long as the fixes precede the 
feature, and as long as the fixes are correct.

Consider it work in progress, with you being one of the reviewers who 
makes sure the fixes are correct.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki April 15, 2014, 3:48 p.m. UTC | #5
On Tuesday, April 15, 2014 08:04:11 AM Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> 
> > On Monday, April 14, 2014 05:11:15 PM Igor Mammedov wrote:
> > > acpi_processor_add() assumes that present at boot CPUs
> > > are always onlined, it is not so if a CPU failed to become
> > > onlined. As result acpi_processor_add() will mark such CPU
> > > device as onlined in sysfs and following attempts to
> > > online/offline it using /sys/device/system/cpu/cpuX/online
> > > attribute will fail.
> > > 
> > > Do not poke into device internals in acpi_processor_add()
> > > and touch "struct device { .offline }" attribute, since
> > > for CPUs onlined at boot it's set by:
> > >   topology_init() -> arch_register_cpu() -> register_cpu()
> > > before ACPI device tree is parsed, and for hotplugged
> > > CPUs it's set when userspace onlines CPU via sysfs.
> > > 
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > ---
> > > v2:
> > >  - fix regression in v1 leading to NULL pointer dereference
> > >    on CPU unplug, do not remove "pr->dev = dev;"
> > 
> > Yeah.
> > 
> > Does this patch depend on any other patches in the series?
> > 
> > I don't think so, but just asking.
> > 
> > If it doesn't, why is it part of this series at all?
> 
> I suspect because Igor was rigorously stress-testing CPU hotplug, and 
> was fixing all the bugs he saw, before adding the one feature he is 
> interested in.
> 
> The feature cannot be guaranteed to be correct, without having a 
> stable base to work on.
> 
> As such this series makes sense, as long as the fixes precede the 
> feature, and as long as the fixes are correct.
> 
> Consider it work in progress, with you being one of the reviewers who 
> makes sure the fixes are correct.

Fair enough.

So perhaps the subject of the whole series should be changed, because x86 is
not the only architecture affected by this particular patch?

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Toshi Kani April 30, 2014, 9:25 p.m. UTC | #6
On Mon, 2014-04-14 at 17:11 +0200, Igor Mammedov wrote:
> acpi_processor_add() assumes that present at boot CPUs
> are always onlined, it is not so if a CPU failed to become
> onlined. As result acpi_processor_add() will mark such CPU
> device as onlined in sysfs and following attempts to
> online/offline it using /sys/device/system/cpu/cpuX/online
> attribute will fail.
> 
> Do not poke into device internals in acpi_processor_add()
> and touch "struct device { .offline }" attribute, since
> for CPUs onlined at boot it's set by:
>   topology_init() -> arch_register_cpu() -> register_cpu()
> before ACPI device tree is parsed, and for hotplugged
> CPUs it's set when userspace onlines CPU via sysfs.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> v2:
>  - fix regression in v1 leading to NULL pointer dereference
>    on CPU unplug, do not remove "pr->dev = dev;"
> ---
>  drivers/acpi/acpi_processor.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index c29c2c3..42d66f8 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -404,7 +404,6 @@ static int acpi_processor_add(struct acpi_device *device,
>  		goto err;
>  
>  	pr->dev = dev;
> -	dev->offline = pr->flags.need_hotplug_init;

IIRC, this change was necessary to handle the case when maxcpus=X is
specified at boot.  In this case, excessive CPU's dev->offline needs to
be set to 1.  Can you verify this?

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Igor Mammedov May 2, 2014, 11:32 a.m. UTC | #7
On Wed, 30 Apr 2014 15:25:51 -0600
Toshi Kani <toshi.kani@hp.com> wrote:

> On Mon, 2014-04-14 at 17:11 +0200, Igor Mammedov wrote:
> > acpi_processor_add() assumes that present at boot CPUs
> > are always onlined, it is not so if a CPU failed to become
> > onlined. As result acpi_processor_add() will mark such CPU
> > device as onlined in sysfs and following attempts to
> > online/offline it using /sys/device/system/cpu/cpuX/online
> > attribute will fail.
> > 
> > Do not poke into device internals in acpi_processor_add()
> > and touch "struct device { .offline }" attribute, since
> > for CPUs onlined at boot it's set by:
> >   topology_init() -> arch_register_cpu() -> register_cpu()
> > before ACPI device tree is parsed, and for hotplugged
> > CPUs it's set when userspace onlines CPU via sysfs.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > v2:
> >  - fix regression in v1 leading to NULL pointer dereference
> >    on CPU unplug, do not remove "pr->dev = dev;"
> > ---
> >  drivers/acpi/acpi_processor.c |    1 -
> >  1 files changed, 0 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index c29c2c3..42d66f8 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c:q
> > @@ -404,7 +404,6 @@ static int acpi_processor_add(struct acpi_device *device,
> >  		goto err;
> >  
> >  	pr->dev = dev;
> > -	dev->offline = pr->flags.need_hotplug_init;
> 
> IIRC, this change was necessary to handle the case when maxcpus=X is
> specified at boot.  In this case, excessive CPU's dev->offline needs to
> be set to 1.  Can you verify this?
Option 'maxcpus' works just fine without and with this patch since a bit
earlier in acpi_processor_add() it exits in case of extra present CPUs:

#ifdef CONFIG_SMP
        if (pr->id >= setup_max_cpus && pr->id != 0)
                return 0;
#endif

and execution doesn't get to the point the patch touches.

The point is that acpi_processor_add() shouldn't touch
dev->offline at all and allow register_cpu() handle it.

> 
> Thanks,
> -Toshi
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Toshi Kani May 2, 2014, 5:23 p.m. UTC | #8
On Fri, 2014-05-02 at 13:32 +0200, Igor Mammedov wrote:
> On Wed, 30 Apr 2014 15:25:51 -0600
> Toshi Kani <toshi.kani@hp.com> wrote:
> 
> > On Mon, 2014-04-14 at 17:11 +0200, Igor Mammedov wrote:
> > > acpi_processor_add() assumes that present at boot CPUs
> > > are always onlined, it is not so if a CPU failed to become
> > > onlined. As result acpi_processor_add() will mark such CPU
> > > device as onlined in sysfs and following attempts to
> > > online/offline it using /sys/device/system/cpu/cpuX/online
> > > attribute will fail.
> > > 
> > > Do not poke into device internals in acpi_processor_add()
> > > and touch "struct device { .offline }" attribute, since
> > > for CPUs onlined at boot it's set by:
> > >   topology_init() -> arch_register_cpu() -> register_cpu()
> > > before ACPI device tree is parsed, and for hotplugged
> > > CPUs it's set when userspace onlines CPU via sysfs.
> > > 
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > ---
> > > v2:
> > >  - fix regression in v1 leading to NULL pointer dereference
> > >    on CPU unplug, do not remove "pr->dev = dev;"
> > > ---
> > >  drivers/acpi/acpi_processor.c |    1 -
> > >  1 files changed, 0 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > index c29c2c3..42d66f8 100644
> > > --- a/drivers/acpi/acpi_processor.c
> > > +++ b/drivers/acpi/acpi_processor.c:q
> > > @@ -404,7 +404,6 @@ static int acpi_processor_add(struct acpi_device *device,
> > >  		goto err;
> > >  
> > >  	pr->dev = dev;
> > > -	dev->offline = pr->flags.need_hotplug_init;
> > 
> > IIRC, this change was necessary to handle the case when maxcpus=X is
> > specified at boot.  In this case, excessive CPU's dev->offline needs to
> > be set to 1.  Can you verify this?
> Option 'maxcpus' works just fine without and with this patch since a bit
> earlier in acpi_processor_add() it exits in case of extra present CPUs:
> 
> #ifdef CONFIG_SMP
>         if (pr->id >= setup_max_cpus && pr->id != 0)
>                 return 0;
> #endif
> 
> and execution doesn't get to the point the patch touches.

This is a separate topic, but I feel that the above code should not be
necessary...

> The point is that acpi_processor_add() shouldn't touch
> dev->offline at all and allow register_cpu() handle it.

Sorry, I had confused with cpu->dev.offline in register_cpu() in my
recollection.  Yes, I agree that we should let register_cpu() to handle
it.

Acked-by: Toshi Kani <toshi.kani@hp.com>

Thanks,
-Toshi





--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index c29c2c3..42d66f8 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -404,7 +404,6 @@  static int acpi_processor_add(struct acpi_device *device,
 		goto err;
 
 	pr->dev = dev;
-	dev->offline = pr->flags.need_hotplug_init;
 
 	/* Trigger the processor driver's .probe() if present. */
 	if (device_attach(dev) >= 0)