ACPI: bind workqueues to CPU 0 to avoid SMI corruption

Message ID 20090729215425.23674.80263.stgit@bob.kio (mailing list archive)
State Accepted

Commit Message

Bjorn Helgaas July 29, 2009, 9:54 p.m. UTC
On some machines, a software-initiated SMI causes corruption unless the
SMI runs on CPU 0.  An SMI can be initiated by any AML, but typically it's
done in GPE-related methods that are run via workqueues, so we can avoid
the known corruption cases by binding the workqueues to CPU 0.

References:
    http://bugzilla.kernel.org/show_bug.cgi?id=13751
    https://bugs.launchpad.net/bugs/157171
    https://bugs.launchpad.net/bugs/157691

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
---
 drivers/acpi/osl.c |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Zhang, Rui July 30, 2009, 12:59 a.m. UTC | #1
On Thu, 2009-07-30 at 05:54 +0800, Bjorn Helgaas wrote:
> On some machines, a software-initiated SMI causes corruption unless the
> SMI runs on CPU 0.  An SMI can be initiated by any AML, but typically it's
> done in GPE-related methods that are run via workqueues, so we can avoid
> the known corruption cases by binding the workqueues to CPU 0.
> 
> References:
>     http://bugzilla.kernel.org/show_bug.cgi?id=13751
>     https://bugs.launchpad.net/bugs/157171
>     https://bugs.launchpad.net/bugs/157691
> 
> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>

Acked-by: Zhang Rui <rui.zhang@intel.com>

> ---
>  drivers/acpi/osl.c |   25 +++++++++++++++++++++++++
>  1 files changed, 25 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 7167071..5691f16 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -189,11 +189,36 @@ acpi_status __init acpi_os_initialize(void)
>  	return AE_OK;
>  }
>  
> +static void bind_to_cpu0(struct work_struct *work)
> +{
> +	set_cpus_allowed(current, cpumask_of_cpu(0));
> +	kfree(work);
> +}
> +
> +static void bind_workqueue(struct workqueue_struct *wq)
> +{
> +	struct work_struct *work;
> +
> +	work = kzalloc(sizeof(struct work_struct), GFP_KERNEL);
> +	INIT_WORK(work, bind_to_cpu0);
> +	queue_work(wq, work);
> +}
> +
>  acpi_status acpi_os_initialize1(void)
>  {
> +	/*
> +	 * On some machines, a software-initiated SMI causes corruption unless
> +	 * the SMI runs on CPU 0.  An SMI can be initiated by any AML, but
> +	 * typically it's done in GPE-related methods that are run via
> +	 * workqueues, so we can avoid the known corruption cases by binding
> +	 * the workqueues to CPU 0.
> +	 */
>  	kacpid_wq = create_singlethread_workqueue("kacpid");
> +	bind_workqueue(kacpid_wq);
>  	kacpi_notify_wq = create_singlethread_workqueue("kacpi_notify");
> +	bind_workqueue(kacpi_notify_wq);
>  	kacpi_hotplug_wq = create_singlethread_workqueue("kacpi_hotplug");
> +	bind_workqueue(kacpi_hotplug_wq);
>  	BUG_ON(!kacpid_wq);
>  	BUG_ON(!kacpi_notify_wq);
>  	BUG_ON(!kacpi_hotplug_wq);
Shaohua Li July 30, 2009, 2:43 a.m. UTC | #2
On Thu, Jul 30, 2009 at 05:54:25AM +0800, Bjorn Helgaas wrote:
> On some machines, a software-initiated SMI causes corruption unless the
> SMI runs on CPU 0.  An SMI can be initiated by any AML, but typically it's
> done in GPE-related methods that are run via workqueues, so we can avoid
> the known corruption cases by binding the workqueues to CPU 0.
> 
> References:
>     http://bugzilla.kernel.org/show_bug.cgi?id=13751
>     https://bugs.launchpad.net/bugs/157171
>     https://bugs.launchpad.net/bugs/157691
Good job! Since any AML code can invoke a SMI, I wonder if all ACPICA should be
limited to run on CPU 0?
Matthew Garrett July 30, 2009, 2:55 a.m. UTC | #3
On Thu, Jul 30, 2009 at 10:43:00AM +0800, Shaohua Li wrote:
> On Thu, Jul 30, 2009 at 05:54:25AM +0800, Bjorn Helgaas wrote:
> > On some machines, a software-initiated SMI causes corruption unless the
> > SMI runs on CPU 0.  An SMI can be initiated by any AML, but typically it's
> > done in GPE-related methods that are run via workqueues, so we can avoid
> > the known corruption cases by binding the workqueues to CPU 0.
> > 
> > References:
> >     http://bugzilla.kernel.org/show_bug.cgi?id=13751
> >     https://bugs.launchpad.net/bugs/157171
> >     https://bugs.launchpad.net/bugs/157691
> Good job! Since any AML code can invoke a SMI, I wonder if all ACPICA should be
> limited to run on CPU 0?

If ACPI is a performance bottleneck then we have other problems, so I 
suspect that we could live with that. We'd probably want to be able to 
disable it at runtime for the small number of users who have 
"interesting" performance requirements, but falling on the side of 
safety over slightly reduced latency under some circumstances seems fair 
to me. It'd be interesting to see if this helps with any of the other 
SMI-related hangs we've seen.
Shaohua Li July 30, 2009, 3:13 a.m. UTC | #4
On Thu, Jul 30, 2009 at 10:55:54AM +0800, Matthew Garrett wrote:
> On Thu, Jul 30, 2009 at 10:43:00AM +0800, Shaohua Li wrote:
> > On Thu, Jul 30, 2009 at 05:54:25AM +0800, Bjorn Helgaas wrote:
> > > On some machines, a software-initiated SMI causes corruption unless the
> > > SMI runs on CPU 0.  An SMI can be initiated by any AML, but typically it's
> > > done in GPE-related methods that are run via workqueues, so we can avoid
> > > the known corruption cases by binding the workqueues to CPU 0.
> > > 
> > > References:
> > >     http://bugzilla.kernel.org/show_bug.cgi?id=13751
> > >     https://bugs.launchpad.net/bugs/157171
> > >     https://bugs.launchpad.net/bugs/157691
> > Good job! Since any AML code can invoke a SMI, I wonder if all ACPICA should be
> > limited to run on CPU 0?
> 
> If ACPI is a performance bottleneck then we have other problems, so I 
> suspect that we could live with that. We'd probably want to be able to 
> disable it at runtime for the small number of users who have 
> "interesting" performance requirements, but falling on the side of 
> safety over slightly reduced latency under some circumstances seems fair 
> to me. It'd be interesting to see if this helps with any of the other 
> SMI-related hangs we've seen.
ACPICA isn't designed for performance. If it has performance issue, it should
already have.
Matthew Garrett July 30, 2009, 3:17 a.m. UTC | #5
On Thu, Jul 30, 2009 at 11:13:48AM +0800, Shaohua Li wrote:

> ACPICA isn't designed for performance. If it has performance issue, it should
> already have.

Yeah. My point was just that we have some customers who like tuning 
systems heavily - I suspect they'd prefer to be able to control whether 
or not ACPI is running entirely on cpu 0 or not. As you say, it should 
make little difference in the real world but some people do have very 
specialised requirements.
Bjorn Helgaas July 30, 2009, 5:06 p.m. UTC | #6
On Wednesday 29 July 2009 08:43:00 pm Shaohua Li wrote:
> On Thu, Jul 30, 2009 at 05:54:25AM +0800, Bjorn Helgaas wrote:
> > On some machines, a software-initiated SMI causes corruption unless the
> > SMI runs on CPU 0.  An SMI can be initiated by any AML, but typically it's
> > done in GPE-related methods that are run via workqueues, so we can avoid
> > the known corruption cases by binding the workqueues to CPU 0.
> > 
> > References:
> >     http://bugzilla.kernel.org/show_bug.cgi?id=13751
> >     https://bugs.launchpad.net/bugs/157171
> >     https://bugs.launchpad.net/bugs/157691
> Good job! Since any AML code can invoke a SMI, I wonder if all ACPICA should be
> limited to run on CPU 0?

I did look into doing that, but I didn't see an easy way to do it.

My first thought was that we could do a set_cpus_allowed() in
acpi_ex_enter_interpreter() and restore in acpi_ex_exit_interpreter().
But of course, those are ACPI CA functions, so to do it without an
ACPI CA change would mean some kind of hook in acpi_os_wait_semaphore(),
and there, we don't know *which* semaphore means "enter interpreter".

So I gave up for now.  But if somebody has a smarter idea, I agree
that it would be nice to at least have the option to run all AML on
CPU 0.

Bjorn
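
For illustration only, the save/pin/restore idea described above can be
sketched in user space.  This is not kernel code and not part of the patch:
run_on_cpu0() and record_cpu() are hypothetical names, and the glibc calls
sched_getaffinity(), sched_setaffinity() and sched_getcpu() are assumed to
stand in for what set_cpus_allowed() would do around
acpi_ex_enter_interpreter() and acpi_ex_exit_interpreter().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* needed for the affinity calls and CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/*
 * Hypothetical helper: run fn(arg) with the calling thread pinned to CPU 0,
 * then restore the previous affinity mask.  Returns 0 on success and -1 if
 * the mask could not be read, changed, or restored.  fn is only called
 * after the thread has been pinned successfully.
 */
static int run_on_cpu0(void (*fn)(void *), void *arg)
{
	cpu_set_t saved, cpu0;

	assert(fn != NULL);

	/* A pid of 0 means "the calling thread" for these calls. */
	if (sched_getaffinity(0, sizeof(saved), &saved) != 0)
		return -1;

	CPU_ZERO(&cpu0);
	CPU_SET(0, &cpu0);
	if (sched_setaffinity(0, sizeof(cpu0), &cpu0) != 0)
		return -1;	/* e.g. CPU 0 is not in our allowed set */

	fn(arg);		/* stands in for the interpreter section */

	/* Restore the original mask, as an "exit" hook would. */
	return sched_setaffinity(0, sizeof(saved), &saved) == 0 ? 0 : -1;
}

/* Example callback: record which CPU the section actually ran on. */
static void record_cpu(void *arg)
{
	*(int *)arg = sched_getcpu();
}
```

A caller would pass its own function, for example
run_on_cpu0(record_cpu, &cpu) with an int cpu, and check the return value
before trusting the result.  The sketch does not address the problem raised
above, namely finding the right place to hook this in without changing
ACPI CA.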
Bjorn Helgaas July 31, 2009, 10:47 p.m. UTC | #7
On Wednesday 29 July 2009 06:59:59 pm Zhang Rui wrote:
> On Thu, 2009-07-30 at 05:54 +0800, Bjorn Helgaas wrote:
> > On some machines, a software-initiated SMI causes corruption unless the
> > SMI runs on CPU 0.  An SMI can be initiated by any AML, but typically it's
> > done in GPE-related methods that are run via workqueues, so we can avoid
> > the known corruption cases by binding the workqueues to CPU 0.
> > 
> > References:
> >     http://bugzilla.kernel.org/show_bug.cgi?id=13751
> >     https://bugs.launchpad.net/bugs/157171
> >     https://bugs.launchpad.net/bugs/157691
> > 
> > Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
> 
> Acked-by: Zhang Rui <rui.zhang@intel.com>

In addition to the reports above, I think it's likely this patch
will fix the problems reported below:

  http://bugzilla.kernel.org/show_bug.cgi?id=13412
  http://bugzilla.kernel.org/show_bug.cgi?id=11259
  http://bugzilla.kernel.org/show_bug.cgi?id=12328
  http://bugzilla.kernel.org/show_bug.cgi?id=12106

I think we should consider this patch for 2.6.31.

(Rafael, 13751 is on your "2.6.29 -> 2.6.30" regression list.
I actually think it's been around much longer than that, but
there seem to be many things that affect whether it manifests.)

Bjorn

> > ---
> >  drivers/acpi/osl.c |   25 +++++++++++++++++++++++++
> >  1 files changed, 25 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> > index 7167071..5691f16 100644
> > --- a/drivers/acpi/osl.c
> > +++ b/drivers/acpi/osl.c
> > @@ -189,11 +189,36 @@ acpi_status __init acpi_os_initialize(void)
> >  	return AE_OK;
> >  }
> >  
> > +static void bind_to_cpu0(struct work_struct *work)
> > +{
> > +	set_cpus_allowed(current, cpumask_of_cpu(0));
> > +	kfree(work);
> > +}
> > +
> > +static void bind_workqueue(struct workqueue_struct *wq)
> > +{
> > +	struct work_struct *work;
> > +
> > +	work = kzalloc(sizeof(struct work_struct), GFP_KERNEL);
> > +	INIT_WORK(work, bind_to_cpu0);
> > +	queue_work(wq, work);
> > +}
> > +
> >  acpi_status acpi_os_initialize1(void)
> >  {
> > +	/*
> > +	 * On some machines, a software-initiated SMI causes corruption unless
> > +	 * the SMI runs on CPU 0.  An SMI can be initiated by any AML, but
> > +	 * typically it's done in GPE-related methods that are run via
> > +	 * workqueues, so we can avoid the known corruption cases by binding
> > +	 * the workqueues to CPU 0.
> > +	 */
> >  	kacpid_wq = create_singlethread_workqueue("kacpid");
> > +	bind_workqueue(kacpid_wq);
> >  	kacpi_notify_wq = create_singlethread_workqueue("kacpi_notify");
> > +	bind_workqueue(kacpi_notify_wq);
> >  	kacpi_hotplug_wq = create_singlethread_workqueue("kacpi_hotplug");
> > +	bind_workqueue(kacpi_hotplug_wq);
> >  	BUG_ON(!kacpid_wq);
> >  	BUG_ON(!kacpi_notify_wq);
> >  	BUG_ON(!kacpi_hotplug_wq);
Rafael Wysocki Aug. 1, 2009, 11:01 a.m. UTC | #8
On Saturday 01 August 2009, Bjorn Helgaas wrote:
> On Wednesday 29 July 2009 06:59:59 pm Zhang Rui wrote:
> > On Thu, 2009-07-30 at 05:54 +0800, Bjorn Helgaas wrote:
> > > On some machines, a software-initiated SMI causes corruption unless the
> > > SMI runs on CPU 0.  An SMI can be initiated by any AML, but typically it's
> > > done in GPE-related methods that are run via workqueues, so we can avoid
> > > the known corruption cases by binding the workqueues to CPU 0.
> > > 
> > > References:
> > >     http://bugzilla.kernel.org/show_bug.cgi?id=13751
> > >     https://bugs.launchpad.net/bugs/157171
> > >     https://bugs.launchpad.net/bugs/157691
> > > 
> > > Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
> > 
> > Acked-by: Zhang Rui <rui.zhang@intel.com>
> 
> In addition to the reports above, I think it's likely this patch
> will fix the problems reported below:
> 
>   http://bugzilla.kernel.org/show_bug.cgi?id=13412
>   http://bugzilla.kernel.org/show_bug.cgi?id=11259
>   http://bugzilla.kernel.org/show_bug.cgi?id=12328
>   http://bugzilla.kernel.org/show_bug.cgi?id=12106
> 
> I think we should consider this patch for 2.6.31.
> 
> (Rafael, 13751 is on your "2.6.29 -> 2.6.30" regression list.
> I actually think it's been around much longer than that, but
> there seem to be many things that affect whether it manifests.)

I've dropped it from the list, thanks.

Best,
Rafael

Patch

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 7167071..5691f16 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -189,11 +189,36 @@  acpi_status __init acpi_os_initialize(void)
 	return AE_OK;
 }
 
+static void bind_to_cpu0(struct work_struct *work)
+{
+	set_cpus_allowed(current, cpumask_of_cpu(0));
+	kfree(work);
+}
+
+static void bind_workqueue(struct workqueue_struct *wq)
+{
+	struct work_struct *work;
+
+	work = kzalloc(sizeof(struct work_struct), GFP_KERNEL);
+	INIT_WORK(work, bind_to_cpu0);
+	queue_work(wq, work);
+}
+
 acpi_status acpi_os_initialize1(void)
 {
+	/*
+	 * On some machines, a software-initiated SMI causes corruption unless
+	 * the SMI runs on CPU 0.  An SMI can be initiated by any AML, but
+	 * typically it's done in GPE-related methods that are run via
+	 * workqueues, so we can avoid the known corruption cases by binding
+	 * the workqueues to CPU 0.
+	 */
 	kacpid_wq = create_singlethread_workqueue("kacpid");
+	bind_workqueue(kacpid_wq);
 	kacpi_notify_wq = create_singlethread_workqueue("kacpi_notify");
+	bind_workqueue(kacpi_notify_wq);
 	kacpi_hotplug_wq = create_singlethread_workqueue("kacpi_hotplug");
+	bind_workqueue(kacpi_hotplug_wq);
 	BUG_ON(!kacpid_wq);
 	BUG_ON(!kacpi_notify_wq);
 	BUG_ON(!kacpi_hotplug_wq);
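
The mechanism the patch relies on (a work item runs in the worker thread's
context, so pinning "current" from inside the item pins the worker for
everything queued afterwards) can be sketched in user space as follows.
This is a rough analogy under stated assumptions, not the kernel
implementation: the struct work / run_workqueue() types and helpers are
hypothetical, a POSIX thread stands in for the single-threaded workqueue,
and sched_setaffinity() stands in for set_cpus_allowed().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* needed for the affinity calls and CPU_* macros */
#endif
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-ins for work_struct and a single-threaded workqueue. */
struct work {
	void (*fn)(void *);
	void *arg;
};

struct workqueue {
	const struct work *items;
	size_t count;
};

/* First work item: pin the worker thread itself (pid 0) to CPU 0. */
static void bind_to_cpu0(void *arg)
{
	cpu_set_t cpu0;

	CPU_ZERO(&cpu0);
	CPU_SET(0, &cpu0);
	*(int *)arg = sched_setaffinity(0, sizeof(cpu0), &cpu0);
}

/* Later work item: note which CPU the worker is running on. */
static void note_cpu(void *arg)
{
	*(int *)arg = sched_getcpu();
}

/* The worker runs the queued items strictly in order. */
static void *worker(void *p)
{
	const struct workqueue *wq = p;
	size_t i;

	for (i = 0; i < wq->count; i++)
		wq->items[i].fn(wq->items[i].arg);
	return NULL;
}

/* Start one worker, let it drain the items, and wait for it to finish. */
static int run_workqueue(const struct work *items, size_t count)
{
	struct workqueue wq = { items, count };
	pthread_t tid;

	if (pthread_create(&tid, NULL, worker, &wq) != 0)
		return -1;
	return pthread_join(tid, NULL) == 0 ? 0 : -1;
}
```

Queueing bind_to_cpu0 first and note_cpu second means the second item
observes the affinity set by the first, provided the bind call reported
success.  Depending on the C library version, the program may need to be
built with -pthread.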