diff mbox series

[RFC,v3,17/21] ACPI: add support to register CPUs based on the _STA enabled bit

Message ID E1rDOhC-00DvlI-Pp@rmk-PC.armlinux.org.uk (mailing list archive)
State New, archived
Headers show
Series ACPI/arm64: add support for virtual cpu hotplug | expand

Commit Message

Russell King (Oracle) Dec. 13, 2023, 12:50 p.m. UTC
From: James Morse <james.morse@arm.com>

acpi_processor_get_info() registers all present CPUs. Registering a
CPU is what creates the sysfs entries and triggers the udev
notifications.

arm64 virtual machines that support 'virtual cpu hotplug' use the
enabled bit to indicate whether the CPU can be brought online, as
the existing ACPI tables require all hardware to be described and
present.

If firmware describes a CPU as present, but disabled, skip the
registration. Such CPUs are present, but can't be brought online for
whatever reason. (e.g. firmware/hypervisor policy).

Once firmware sets the enabled bit, the CPU can be registered and
brought online by user-space. Online CPUs, or CPUs that are missing
an _STA method must always be registered.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 drivers/acpi/acpi_processor.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

Comments

Russell King (Oracle) Dec. 18, 2023, 1:03 p.m. UTC | #1
On Wed, Dec 13, 2023 at 12:50:38PM +0000, Russell King wrote:
> From: James Morse <james.morse@arm.com>
> 
> acpi_processor_get_info() registers all present CPUs. Registering a
> CPU is what creates the sysfs entries and triggers the udev
> notifications.
> 
> arm64 virtual machines that support 'virtual cpu hotplug' use the
> enabled bit to indicate whether the CPU can be brought online, as
> the existing ACPI tables require all hardware to be described and
> present.
> 
> If firmware describes a CPU as present, but disabled, skip the
> registration. Such CPUs are present, but can't be brought online for
> whatever reason. (e.g. firmware/hypervisor policy).
> 
> Once firmware sets the enabled bit, the CPU can be registered and
> brought online by user-space. Online CPUs, or CPUs that are missing
> an _STA method must always be registered.

...

> @@ -526,6 +552,9 @@ static void acpi_processor_post_eject(struct acpi_device *device)
>  		acpi_processor_make_not_present(device);
>  		return;
>  	}
> +
> +	if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED))
> +		arch_unregister_cpu(pr->id);

This change isn't described in the commit log, but seems to be the cause
of the build error identified by the kernel build bot that is fixed
later in this series. I'm wondering whether this should be in a
different patch, maybe "ACPI: Check _STA present bit before making CPUs
not present" ?

Or maybe my brain isn't working properly (due to being Covid positive.)
Any thoughts, Jonathan?
Jonathan Cameron Jan. 2, 2024, 2:53 p.m. UTC | #2
On Mon, 18 Dec 2023 13:03:32 +0000
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Wed, Dec 13, 2023 at 12:50:38PM +0000, Russell King wrote:
> > From: James Morse <james.morse@arm.com>
> > 
> > acpi_processor_get_info() registers all present CPUs. Registering a
> > CPU is what creates the sysfs entries and triggers the udev
> > notifications.
> > 
> > arm64 virtual machines that support 'virtual cpu hotplug' use the
> > enabled bit to indicate whether the CPU can be brought online, as
> > the existing ACPI tables require all hardware to be described and
> > present.
> > 
> > If firmware describes a CPU as present, but disabled, skip the
> > registration. Such CPUs are present, but can't be brought online for
> > whatever reason. (e.g. firmware/hypervisor policy).
> > 
> > Once firmware sets the enabled bit, the CPU can be registered and
> > brought online by user-space. Online CPUs, or CPUs that are missing
> > an _STA method must always be registered.  
> 
> ...
> 
> > @@ -526,6 +552,9 @@ static void acpi_processor_post_eject(struct acpi_device *device)
> >  		acpi_processor_make_not_present(device);
> >  		return;
> >  	}
> > +
> > +	if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED))
> > +		arch_unregister_cpu(pr->id);  
> 
> This change isn't described in the commit log, but seems to be the cause
> of the build error identified by the kernel build bot that is fixed
> later in this series. I'm wondering whether this should be in a
> different patch, maybe "ACPI: Check _STA present bit before making CPUs
> not present" ?

Would seem a bit odd to call arch_unregister_cpu() way before the code
is added to call the matching arch_registers_cpu()

Mind you this eject doesn't just apply to those CPUs that are registered
later I think, but instead to all.  So we run into the spec hole that
there is no way to identify initially 'enabled' CPUs that might be disabled
later.

> 
> Or maybe my brain isn't working properly (due to being Covid positive.)
> Any thoughts, Jonathan?

I'll go with a resounding 'not sure' on where this change belongs.
I blame my non existent start of the year hangover.
Hope you have recovered!

Jonathan
Jonathan Cameron Jan. 23, 2024, 10:26 a.m. UTC | #3
On Tue, 2 Jan 2024 14:53:20 +0000
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Mon, 18 Dec 2023 13:03:32 +0000
> "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> 
> > On Wed, Dec 13, 2023 at 12:50:38PM +0000, Russell King wrote:  
> > > From: James Morse <james.morse@arm.com>
> > > 
> > > acpi_processor_get_info() registers all present CPUs. Registering a
> > > CPU is what creates the sysfs entries and triggers the udev
> > > notifications.
> > > 
> > > arm64 virtual machines that support 'virtual cpu hotplug' use the
> > > enabled bit to indicate whether the CPU can be brought online, as
> > > the existing ACPI tables require all hardware to be described and
> > > present.
> > > 
> > > If firmware describes a CPU as present, but disabled, skip the
> > > registration. Such CPUs are present, but can't be brought online for
> > > whatever reason. (e.g. firmware/hypervisor policy).
> > > 
> > > Once firmware sets the enabled bit, the CPU can be registered and
> > > brought online by user-space. Online CPUs, or CPUs that are missing
> > > an _STA method must always be registered.    
> > 
> > ...
> >   
> > > @@ -526,6 +552,9 @@ static void acpi_processor_post_eject(struct acpi_device *device)
> > >  		acpi_processor_make_not_present(device);
> > >  		return;
> > >  	}
> > > +
> > > +	if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED))
> > > +		arch_unregister_cpu(pr->id);    
> > 
> > This change isn't described in the commit log, but seems to be the cause
> > of the build error identified by the kernel build bot that is fixed
> > later in this series. I'm wondering whether this should be in a
> > different patch, maybe "ACPI: Check _STA present bit before making CPUs
> > not present" ?  
> 
> Would seem a bit odd to call arch_unregister_cpu() way before the code
> is added to call the matching arch_registers_cpu()
> 
> Mind you this eject doesn't just apply to those CPUs that are registered
> later I think, but instead to all.  So we run into the spec hole that
> there is no way to identify initially 'enabled' CPUs that might be disabled
> later.
> 
> > 
> > Or maybe my brain isn't working properly (due to being Covid positive.)
> > Any thoughts, Jonathan?  
> 
> I'll go with a resounding 'not sure' on where this change belongs.
> I blame my non existent start of the year hangover.
> Hope you have recovered!

Looking again, I think you were right, move it to that earlier patch.

J
> 
> Jonathan
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Russell King (Oracle) Jan. 23, 2024, 1:10 p.m. UTC | #4
On Tue, Jan 23, 2024 at 10:26:03AM +0000, Jonathan Cameron wrote:
> On Tue, 2 Jan 2024 14:53:20 +0000
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> 
> > On Mon, 18 Dec 2023 13:03:32 +0000
> > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> > 
> > > On Wed, Dec 13, 2023 at 12:50:38PM +0000, Russell King wrote:  
> > > > From: James Morse <james.morse@arm.com>
> > > > 
> > > > acpi_processor_get_info() registers all present CPUs. Registering a
> > > > CPU is what creates the sysfs entries and triggers the udev
> > > > notifications.
> > > > 
> > > > arm64 virtual machines that support 'virtual cpu hotplug' use the
> > > > enabled bit to indicate whether the CPU can be brought online, as
> > > > the existing ACPI tables require all hardware to be described and
> > > > present.
> > > > 
> > > > If firmware describes a CPU as present, but disabled, skip the
> > > > registration. Such CPUs are present, but can't be brought online for
> > > > whatever reason. (e.g. firmware/hypervisor policy).
> > > > 
> > > > Once firmware sets the enabled bit, the CPU can be registered and
> > > > brought online by user-space. Online CPUs, or CPUs that are missing
> > > > an _STA method must always be registered.    
> > > 
> > > ...
> > >   
> > > > @@ -526,6 +552,9 @@ static void acpi_processor_post_eject(struct acpi_device *device)
> > > >  		acpi_processor_make_not_present(device);
> > > >  		return;
> > > >  	}
> > > > +
> > > > +	if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED))
> > > > +		arch_unregister_cpu(pr->id);    
> > > 
> > > This change isn't described in the commit log, but seems to be the cause
> > > of the build error identified by the kernel build bot that is fixed
> > > later in this series. I'm wondering whether this should be in a
> > > different patch, maybe "ACPI: Check _STA present bit before making CPUs
> > > not present" ?  
> > 
> > Would seem a bit odd to call arch_unregister_cpu() way before the code
> > is added to call the matching arch_registers_cpu()
> > 
> > Mind you this eject doesn't just apply to those CPUs that are registered
> > later I think, but instead to all.  So we run into the spec hole that
> > there is no way to identify initially 'enabled' CPUs that might be disabled
> > later.
> > 
> > > 
> > > Or maybe my brain isn't working properly (due to being Covid positive.)
> > > Any thoughts, Jonathan?  
> > 
> > I'll go with a resounding 'not sure' on where this change belongs.
> > I blame my non existent start of the year hangover.
> > Hope you have recovered!
> 
> Looking again, I think you were right, move it to that earlier patch.

I'm having second thoughts - because this patch introduces the
arch_register_cpu() into the acpi_processor_add() path (via
acpi_processor_get_info() and acpi_processor_make_enabled(), so isn't
it also correct to add arch_unregister_cpu() to the detach/post_eject
path as well? If we add one without the other, doesn't stuff become
a bit asymetric?

Looking more deeply at these changes, I'm finding it isn't easy to
keep track of everything that's going on here.

We had attach()/detach() callbacks that were nice and symetrical.
How we have this post_eject() callback that makes things asymetrical.

We have the attach() method that registers the CPU, but no detach
method, instead having the post_eject() method. On the face of it,
arch_unregister_cpu() doesn't look symetric unless one goes digging
more in the code - by that, I mean arch_register_cpu() only gets
called of present=1 _and_ enabled=1. However, arch_unregister_cpu()
gets called buried in acpi_processor_make_not_present(), called when
present=0, and then we have this new one to handle the case where
enabled=0. It is not obvious that arch_unregister_cpu() is the reverse
of what happens with arch_register_cpu() here.

Then we have the add() method allocating pr->throttling.shared_cpu_map,
and acpi_processor_make_not_present() freeing it. From what I read in
ACPI v6.5, enabled is not allowed to be set without present. So, if
_STA reports that a CPU that had present=1 enabled=1, but then is
later reported to be enabled=0 (which we handle by calling only
arch_unregister_cpu()) then what happens when _STA changes to
enabled=1 later? Does add() get called? If it does, this would cause
a new acpi_processor structure to be allocated and the old one to be
leaked... I hope I'm wrong about add() being called - but if it isn't,
how does enabled going from 0->1 get handled... and if we are handling
its 1->0 transition separately from present, then surely we should be
handling that.

Maybe I'm just getting confused, but I've spent much of this morning
trying to unravel all this... and I'm of the opinion that this isn't a
sign of a good approach.
Jonathan Cameron Jan. 23, 2024, 2:22 p.m. UTC | #5
On Tue, 23 Jan 2024 13:10:44 +0000
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Tue, Jan 23, 2024 at 10:26:03AM +0000, Jonathan Cameron wrote:
> > On Tue, 2 Jan 2024 14:53:20 +0000
> > Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> >   
> > > On Mon, 18 Dec 2023 13:03:32 +0000
> > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> > >   
> > > > On Wed, Dec 13, 2023 at 12:50:38PM +0000, Russell King wrote:    
> > > > > From: James Morse <james.morse@arm.com>
> > > > > 
> > > > > acpi_processor_get_info() registers all present CPUs. Registering a
> > > > > CPU is what creates the sysfs entries and triggers the udev
> > > > > notifications.
> > > > > 
> > > > > arm64 virtual machines that support 'virtual cpu hotplug' use the
> > > > > enabled bit to indicate whether the CPU can be brought online, as
> > > > > the existing ACPI tables require all hardware to be described and
> > > > > present.
> > > > > 
> > > > > If firmware describes a CPU as present, but disabled, skip the
> > > > > registration. Such CPUs are present, but can't be brought online for
> > > > > whatever reason. (e.g. firmware/hypervisor policy).
> > > > > 
> > > > > Once firmware sets the enabled bit, the CPU can be registered and
> > > > > brought online by user-space. Online CPUs, or CPUs that are missing
> > > > > an _STA method must always be registered.      
> > > > 
> > > > ...
> > > >     
> > > > > @@ -526,6 +552,9 @@ static void acpi_processor_post_eject(struct acpi_device *device)
> > > > >  		acpi_processor_make_not_present(device);
> > > > >  		return;
> > > > >  	}
> > > > > +
> > > > > +	if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED))
> > > > > +		arch_unregister_cpu(pr->id);      
> > > > 
> > > > This change isn't described in the commit log, but seems to be the cause
> > > > of the build error identified by the kernel build bot that is fixed
> > > > later in this series. I'm wondering whether this should be in a
> > > > different patch, maybe "ACPI: Check _STA present bit before making CPUs
> > > > not present" ?    
> > > 
> > > Would seem a bit odd to call arch_unregister_cpu() way before the code
> > > is added to call the matching arch_registers_cpu()
> > > 
> > > Mind you this eject doesn't just apply to those CPUs that are registered
> > > later I think, but instead to all.  So we run into the spec hole that
> > > there is no way to identify initially 'enabled' CPUs that might be disabled
> > > later.
> > >   
> > > > 
> > > > Or maybe my brain isn't working properly (due to being Covid positive.)
> > > > Any thoughts, Jonathan?    
> > > 
> > > I'll go with a resounding 'not sure' on where this change belongs.
> > > I blame my non existent start of the year hangover.
> > > Hope you have recovered!  
> > 
> > Looking again, I think you were right, move it to that earlier patch.  
> 
> I'm having second thoughts - because this patch introduces the
> arch_register_cpu() into the acpi_processor_add() path (via
> acpi_processor_get_info() and acpi_processor_make_enabled(), so isn't
> it also correct to add arch_unregister_cpu() to the detach/post_eject
> path as well? If we add one without the other, doesn't stuff become
> a bit asymetric?
> 
> Looking more deeply at these changes, I'm finding it isn't easy to
> keep track of everything that's going on here.

I can sympathize.

> 
> We had attach()/detach() callbacks that were nice and symetrical.
> How we have this post_eject() callback that makes things asymetrical.
> 
> We have the attach() method that registers the CPU, but no detach
> method, instead having the post_eject() method. On the face of it,
> arch_unregister_cpu() doesn't look symetric unless one goes digging
> more in the code - by that, I mean arch_register_cpu() only gets
> called of present=1 _and_ enabled=1. However, arch_unregister_cpu()
> gets called buried in acpi_processor_make_not_present(), called when
> present=0, and then we have this new one to handle the case where
> enabled=0. It is not obvious that arch_unregister_cpu() is the reverse
> of what happens with arch_register_cpu() here.

One option would be to pull the arch_unregister_cpu() out so it
happens in one place in both the present = 0 and enabled = 0 cases but
I'm not sure if it's safe to reorder the contents of 
acpi_processor_not_present() as it's followed by a bunch of things.

Would looks something like

if (cpu_present(pr->id)) {
	if (!(sta & ACPI_STA_DEVICE_PRESENT)) {
		acpi_processor_make_not_present(device); /* Remove arch_cpu_unregister() */
	} else if (!(sta & ACPI_STA_DEVICE_ENABLED)) {
		/* Nothing to do in this case */
	} else {
		return; /* Firmware did something silly - probably racing */
	}
	arch_unregister_cpu(pr->id);

	return;
}

> 
> Then we have the add() method allocating pr->throttling.shared_cpu_map,
> and acpi_processor_make_not_present() freeing it. From what I read in
> ACPI v6.5, enabled is not allowed to be set without present. So, if
> _STA reports that a CPU that had present=1 enabled=1, but then is
> later reported to be enabled=0 (which we handle by calling only
> arch_unregister_cpu()) then what happens when _STA changes to
> enabled=1 later? Does add() get called? 

yes it does (I poked it to see) which indeed isn't good (unless I've
broken my setup in some obscure way).

Seems we need a few more things than arch_unregister_cpu() pulled out
in the above code.


> If it does, this would cause
> a new acpi_processor structure to be allocated and the old one to be
> leaked... I hope I'm wrong about add() being called - but if it isn't,
> how does enabled going from 0->1 get handled... and if we are handling
> its 1->0 transition separately from present, then surely we should be
> handling that.
> 
> Maybe I'm just getting confused, but I've spent much of this morning
> trying to unravel all this... and I'm of the opinion that this isn't a
> sign of a good approach.

It's all annoyingly messy at the root of things, but indeed you've found
some issues in current implementation.  Feels like just ripping out
a bunch of stuff from acpi_processor_make_not_present() and calling it
for both paths will probably work, but I've not tested that yet.

Jonathan
>
Russell King (Oracle) Jan. 23, 2024, 2:59 p.m. UTC | #6
On Tue, Jan 23, 2024 at 02:22:18PM +0000, Jonathan Cameron wrote:
> On Tue, 23 Jan 2024 13:10:44 +0000
> "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> 
> > On Tue, Jan 23, 2024 at 10:26:03AM +0000, Jonathan Cameron wrote:
> > > On Tue, 2 Jan 2024 14:53:20 +0000
> > > Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> > >   
> > > > On Mon, 18 Dec 2023 13:03:32 +0000
> > > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> > > >   
> > > > > On Wed, Dec 13, 2023 at 12:50:38PM +0000, Russell King wrote:    
> > > > > > From: James Morse <james.morse@arm.com>
> > > > > > 
> > > > > > acpi_processor_get_info() registers all present CPUs. Registering a
> > > > > > CPU is what creates the sysfs entries and triggers the udev
> > > > > > notifications.
> > > > > > 
> > > > > > arm64 virtual machines that support 'virtual cpu hotplug' use the
> > > > > > enabled bit to indicate whether the CPU can be brought online, as
> > > > > > the existing ACPI tables require all hardware to be described and
> > > > > > present.
> > > > > > 
> > > > > > If firmware describes a CPU as present, but disabled, skip the
> > > > > > registration. Such CPUs are present, but can't be brought online for
> > > > > > whatever reason. (e.g. firmware/hypervisor policy).
> > > > > > 
> > > > > > Once firmware sets the enabled bit, the CPU can be registered and
> > > > > > brought online by user-space. Online CPUs, or CPUs that are missing
> > > > > > an _STA method must always be registered.      
> > > > > 
> > > > > ...
> > > > >     
> > > > > > @@ -526,6 +552,9 @@ static void acpi_processor_post_eject(struct acpi_device *device)
> > > > > >  		acpi_processor_make_not_present(device);
> > > > > >  		return;
> > > > > >  	}
> > > > > > +
> > > > > > +	if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED))
> > > > > > +		arch_unregister_cpu(pr->id);      
> > > > > 
> > > > > This change isn't described in the commit log, but seems to be the cause
> > > > > of the build error identified by the kernel build bot that is fixed
> > > > > later in this series. I'm wondering whether this should be in a
> > > > > different patch, maybe "ACPI: Check _STA present bit before making CPUs
> > > > > not present" ?    
> > > > 
> > > > Would seem a bit odd to call arch_unregister_cpu() way before the code
> > > > is added to call the matching arch_registers_cpu()
> > > > 
> > > > Mind you this eject doesn't just apply to those CPUs that are registered
> > > > later I think, but instead to all.  So we run into the spec hole that
> > > > there is no way to identify initially 'enabled' CPUs that might be disabled
> > > > later.
> > > >   
> > > > > 
> > > > > Or maybe my brain isn't working properly (due to being Covid positive.)
> > > > > Any thoughts, Jonathan?    
> > > > 
> > > > I'll go with a resounding 'not sure' on where this change belongs.
> > > > I blame my non existent start of the year hangover.
> > > > Hope you have recovered!  
> > > 
> > > Looking again, I think you were right, move it to that earlier patch.  
> > 
> > I'm having second thoughts - because this patch introduces the
> > arch_register_cpu() into the acpi_processor_add() path (via
> > acpi_processor_get_info() and acpi_processor_make_enabled(), so isn't
> > it also correct to add arch_unregister_cpu() to the detach/post_eject
> > path as well? If we add one without the other, doesn't stuff become
> > a bit asymetric?
> > 
> > Looking more deeply at these changes, I'm finding it isn't easy to
> > keep track of everything that's going on here.
> 
> I can sympathize.
> 
> > 
> > We had attach()/detach() callbacks that were nice and symetrical.
> > How we have this post_eject() callback that makes things asymetrical.
> > 
> > We have the attach() method that registers the CPU, but no detach
> > method, instead having the post_eject() method. On the face of it,
> > arch_unregister_cpu() doesn't look symetric unless one goes digging
> > more in the code - by that, I mean arch_register_cpu() only gets
> > called of present=1 _and_ enabled=1. However, arch_unregister_cpu()
> > gets called buried in acpi_processor_make_not_present(), called when
> > present=0, and then we have this new one to handle the case where
> > enabled=0. It is not obvious that arch_unregister_cpu() is the reverse
> > of what happens with arch_register_cpu() here.
> 
> One option would be to pull the arch_unregister_cpu() out so it
> happens in one place in both the present = 0 and enabled = 0 cases but
> I'm not sure if it's safe to reorder the contents of 
> acpi_processor_not_present() as it's followed by a bunch of things.
> 
> Would looks something like
> 
> if (cpu_present(pr->id)) {
> 	if (!(sta & ACPI_STA_DEVICE_PRESENT)) {
> 		acpi_processor_make_not_present(device); /* Remove arch_cpu_unregister() */
> 	} else if (!(sta & ACPI_STA_DEVICE_ENABLED)) {
> 		/* Nothing to do in this case */
> 	} else {
> 		return; /* Firmware did something silly - probably racing */
> 	}
> 	arch_unregister_cpu(pr->id);
> 
> 	return;
> }
> 
> > 
> > Then we have the add() method allocating pr->throttling.shared_cpu_map,
> > and acpi_processor_make_not_present() freeing it. From what I read in
> > ACPI v6.5, enabled is not allowed to be set without present. So, if
> > _STA reports that a CPU that had present=1 enabled=1, but then is
> > later reported to be enabled=0 (which we handle by calling only
> > arch_unregister_cpu()) then what happens when _STA changes to
> > enabled=1 later? Does add() get called? 
> 
> yes it does (I poked it to see) which indeed isn't good (unless I've
> broken my setup in some obscure way).

Thanks for confirming - I haven't had a chance to do any testing (late
lunch because of spending so long looking at this...)

> Seems we need a few more things than arch_unregister_cpu() pulled out
> in the above code.

Yes, and I also wonder whether we should be doing any of that
unconditionally...

> > If it does, this would cause
> > a new acpi_processor structure to be allocated and the old one to be
> > leaked... I hope I'm wrong about add() being called - but if it isn't,
> > how does enabled going from 0->1 get handled... and if we are handling
> > its 1->0 transition separately from present, then surely we should be
> > handling that.
> > 
> > Maybe I'm just getting confused, but I've spent much of this morning
> > trying to unravel all this... and I'm of the opinion that this isn't a
> > sign of a good approach.
> 
> It's all annoyingly messy at the root of things, but indeed you've found
> some issues in current implementation.  Feels like just ripping out
> a bunch of stuff from acpi_processor_make_not_present() and calling it
> for both paths will probably work, but I've not tested that yet.

... since surely if we've already got to the point of issuing a
post_eject() callback, the device has already been ejected
and thus has gone - and if it is ever "replaced" we will get an
attach() callback.

Moreover, looking at acpi_scan_hot_remove(), if we are the device
being ejected, then after ej0 is evaluated, _STA is checked, and
acpi_bus_post_eject() called only if enabled=0. (This will also
end up calling post_eject() for any children as well which won't
have their _STA evaluated.)

So this has got me wondering whether acpi_processor_post_eject()
should be doing all the cleanup in acpi_processor_make_not_present()
except if we believe the call is in error (e.g.
!ACPI_HOTPLUG_PRESENT_CPU and present=0) - thus preparing the way
for a future attach() callback.

Hmm. I wonder if Rafael has any input on this.
diff mbox series

Patch

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index b7a94c1348b0..5dabb426481f 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -228,6 +228,32 @@  static int acpi_processor_make_present(struct acpi_processor *pr)
 	return ret;
 }
 
+static int acpi_processor_make_enabled(struct acpi_processor *pr)
+{
+	unsigned long long sta;
+	acpi_status status;
+	bool present, enabled;
+
+	if (!acpi_has_method(pr->handle, "_STA"))
+		return arch_register_cpu(pr->id);
+
+	status = acpi_evaluate_integer(pr->handle, "_STA", NULL, &sta);
+	if (ACPI_FAILURE(status))
+		return -ENODEV;
+
+	present = sta & ACPI_STA_DEVICE_PRESENT;
+	enabled = sta & ACPI_STA_DEVICE_ENABLED;
+
+	if (cpu_online(pr->id) && (!present || !enabled)) {
+		pr_err_once(FW_BUG "CPU %u is online, but described as not present or disabled!\n", pr->id);
+		add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
+	} else if (!present || !enabled) {
+		return -ENODEV;
+	}
+
+	return arch_register_cpu(pr->id);
+}
+
 static int acpi_processor_get_info(struct acpi_device *device)
 {
 	union acpi_object object = { 0 };
@@ -318,7 +344,7 @@  static int acpi_processor_get_info(struct acpi_device *device)
 	 */
 	if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
 	    !get_cpu_device(pr->id)) {
-		int ret = arch_register_cpu(pr->id);
+		int ret = acpi_processor_make_enabled(pr);
 
 		if (ret)
 			return ret;
@@ -526,6 +552,9 @@  static void acpi_processor_post_eject(struct acpi_device *device)
 		acpi_processor_make_not_present(device);
 		return;
 	}
+
+	if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED))
+		arch_unregister_cpu(pr->id);
 }
 
 #ifdef CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC