[RFC,v3,07/12] genirq: Shutdown irq chips in suspend/resume during hibernation

Message ID e782c510916c8c05dc95ace151aba4eced207b31.1581721799.git.anchalag@amazon.com
State RFC, archived
Series
  • Enable PM hibernation on guest VMs

Commit Message

Anchal Agarwal Feb. 14, 2020, 11:25 p.m. UTC
There are no pm handlers for the legacy devices, so during tear down
stale event channel <> IRQ mapping may still remain in the image and
resume may fail. To avoid adding much code by implementing handlers for
legacy devices, add a new irq_chip flag IRQCHIP_SHUTDOWN_ON_SUSPEND which
when enabled on an irq-chip e.g xen-pirq, it will let core suspend/resume
irq code to shutdown and restart the active irqs. PM suspend/hibernation
code will rely on this.
Without this, in PM hibernation, information about the event channel
remains in hibernation image, but there is no guarantee that the same
event channel numbers are assigned to the devices when restoring the
system. This may cause conflict like the following and prevent some
devices from being restored correctly.

Signed-off-by: Anchal Agarwal <anchalag@amazon.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/xen/events/events_base.c |  1 +
 include/linux/irq.h              |  2 ++
 kernel/irq/chip.c                |  2 +-
 kernel/irq/internals.h           |  1 +
 kernel/irq/pm.c                  | 31 ++++++++++++++++++++++---------
 5 files changed, 27 insertions(+), 10 deletions(-)
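
The failure mode described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the table, the allocator, and every model_* name are hypothetical stand-ins and not Xen or kernel APIs. It assumes the channel allocator lives outside the hibernation image and restarts from the same number after restore.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_IRQS   2
#define MODEL_NO_EVTCHN (-1)

/* Hypothetical IRQ -> event channel table; this part is saved in the image. */
static int model_irq_to_evtchn[MODEL_NR_IRQS];
/* Hypothetical hypervisor-side allocator state; NOT part of the image. */
static int model_next_evtchn;

/* Bind an IRQ to the next free channel (stand-in for starting the IRQ). */
static void model_startup(int irq)
{
	assert(irq >= 0 && irq < MODEL_NR_IRQS);
	model_irq_to_evtchn[irq] = model_next_evtchn++;
}

/* Drop the binding (stand-in for shutting the IRQ down). */
static void model_shutdown(int irq)
{
	assert(irq >= 0 && irq < MODEL_NR_IRQS);
	model_irq_to_evtchn[irq] = MODEL_NO_EVTCHN;
}

/* True if two bound IRQs claim the same event channel. */
static bool model_has_conflict(void)
{
	for (int i = 0; i < MODEL_NR_IRQS; i++)
		for (int j = i + 1; j < MODEL_NR_IRQS; j++)
			if (model_irq_to_evtchn[i] != MODEL_NO_EVTCHN &&
			    model_irq_to_evtchn[i] == model_irq_to_evtchn[j])
				return true;
	return false;
}

/*
 * IRQ 0 belongs to a legacy driver without PM callbacks, IRQ 1 to a
 * PM-aware driver. Returns true if the restored system has a conflict.
 */
static bool model_hibernate_cycle(bool shutdown_on_suspend)
{
	model_next_evtchn = 4;
	model_startup(0);
	model_startup(1);

	/* Suspend: only the PM-aware driver releases its channel itself. */
	model_shutdown(1);
	if (shutdown_on_suspend)
		model_shutdown(0);	/* done by the core IRQ code instead */

	/* Restore: the allocator starts over, the table comes from the image. */
	model_next_evtchn = 4;
	model_startup(1);
	if (shutdown_on_suspend)
		model_startup(0);

	return model_has_conflict();
}
```

In this model, model_hibernate_cycle(false) leaves IRQ 0 holding a stale channel that IRQ 1 is then given again, while model_hibernate_cycle(true) lets both IRQs pick up fresh, distinct channels.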

Comments

Thomas Gleixner March 6, 2020, 11:03 p.m. UTC | #1
Anchal Agarwal <anchalag@amazon.com> writes:

> There are no pm handlers for the legacy devices, so during tear down
> stale event channel <> IRQ mapping may still remain in the image and
> resume may fail. To avoid adding much code by implementing handlers for
> legacy devices, add a new irq_chip flag IRQCHIP_SHUTDOWN_ON_SUSPEND which
> when enabled on an irq-chip e.g xen-pirq, it will let core suspend/resume
> irq code to shutdown and restart the active irqs. PM suspend/hibernation
> code will rely on this.
> Without this, in PM hibernation, information about the event channel
> remains in hibernation image, but there is no guarantee that the same
> event channel numbers are assigned to the devices when restoring the
> system. This may cause conflict like the following and prevent some
> devices from being restored correctly.

The above is just an agglomeration of words and acronyms and some of
these sentences do not even make sense. Anyone who is not aware of event
channels and whatever XENisms you talk about will be entirely
confused. Changelogs really need to be understandable for mere mortals
and there is no space restriction so acronyms can be written out.

Something like this:

  Many legacy device drivers do not implement power management (PM)
  functions which means that interrupts requested by these drivers stay
  in active state when the kernel is hibernated.

  This does not matter on bare metal and on most hypervisors because the
  interrupt is restored on resume without any noticeable side effects as
  it stays connected to the same physical or virtual interrupt line.

  The XEN interrupt mechanism is different as it maintains a mapping
  between the Linux interrupt number and a XEN event channel. If the
  interrupt stays active on hibernation this mapping is preserved but
  there is unfortunately no guarantee that on resume the same event
  channels are reassigned to these devices. This can result in event
  channel conflicts which prevent the affected devices from being
  restored correctly.

  One way to solve this would be to add the necessary power management
  functions to all affected legacy device drivers, but that's a
  questionable effort which does not provide any benefits on non-XEN
  environments.

  The least intrusive and most efficient solution is to provide a
  mechanism which allows the core interrupt code to tear down these
  interrupts on hibernation and bring them back up again on resume. This
  allows the XEN event channel mechanism to assign an arbitrary event
  channel on resume without affecting the functionality of these
  devices.
  
  Fortunately all these device interrupts are handled by a dedicated XEN
  interrupt chip so the chip can be marked that all interrupts connected
  to it are handled this way. This is pretty much in line with the other
  interrupt chip specific quirks, e.g. IRQCHIP_MASK_ON_SUSPEND.

  Add a new quirk flag IRQCHIP_SHUTDOWN_ON_SUSPEND and add support for
  it in the core interrupt suspend/resume paths.

Hmm?

> Signed-off-by: Anchal Agarwal <anchalag@amazon.com>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>

Not that I care much, but now that I've written both the patch and the
changelog you might change that attribution slightly. For completeness
sake:

 Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Thanks,

        tglx
Anchal Agarwal March 9, 2020, 10:37 p.m. UTC | #2
On Sat, Mar 07, 2020 at 12:03:52AM +0100, Thomas Gleixner wrote:
> Anchal Agarwal <anchalag@amazon.com> writes:
> 
> > There are no pm handlers for the legacy devices, so during tear down
> > stale event channel <> IRQ mapping may still remain in the image and
> > resume may fail. To avoid adding much code by implementing handlers for
> > legacy devices, add a new irq_chip flag IRQCHIP_SHUTDOWN_ON_SUSPEND which
> > when enabled on an irq-chip e.g xen-pirq, it will let core suspend/resume
> > irq code to shutdown and restart the active irqs. PM suspend/hibernation
> > code will rely on this.
> > Without this, in PM hibernation, information about the event channel
> > remains in hibernation image, but there is no guarantee that the same
> > event channel numbers are assigned to the devices when restoring the
> > system. This may cause conflict like the following and prevent some
> > devices from being restored correctly.
> 
> The above is just an agglomeration of words and acronyms and some of
> these sentences do not even make sense. Anyone who is not aware of event
> channels and whatever XENisms you talk about will be entirely
> confused. Changelogs really need to be understandable for mere mortals
> and there is no space restriction so acronyms can be written out.
> 
I don't understand what does not make sense here. Of course the one you
described is more elaborate and explanatory, and I agree. I just wrote a
short one from the perspective of PM hibernation related to Xen domU.
All I explained was why teardown is needed, what the solution is, and
what will happen if we do not clear those mappings.
> Something like this:
> 
>   Many legacy device drivers do not implement power management (PM)
>   functions which means that interrupts requested by these drivers stay
>   in active state when the kernel is hibernated.
> 
>   This does not matter on bare metal and on most hypervisors because the
>   interrupt is restored on resume without any noticable side effects as
>   it stays connected to the same physical or virtual interrupt line.
> 
>   The XEN interrupt mechanism is different as it maintains a mapping
>   between the Linux interrupt number and a XEN event channel. If the
>   interrupt stays active on hibernation this mapping is preserved but
>   there is unfortunately no guarantee that on resume the same event
>   channels are reassigned to these devices. This can result in event
>   channel conflicts which prevent the affected devices from being
>   restored correctly.
> 
>   One way to solve this would be to add the necessary power management
>   functions to all affected legacy device drivers, but that's a
>   questionable effort which does not provide any benefits on non-XEN
>   environments.
> 
>   The least intrusive and most efficient solution is to provide a
>   mechanism which allows the core interrupt code to tear down these
>   interrupts on hibernation and bring them back up again on resume. This
>   allows the XEN event channel mechanism to assign an arbitrary event
>   channel on resume without affecting the functionality of these
>   devices.
> 
>   Fortunately all these device interrupts are handled by a dedicated XEN
>   interrupt chip so the chip can be marked that all interrupts connected
>   to it are handled this way. This is pretty much in line with the other
>   interrupt chip specific quirks, e.g. IRQCHIP_MASK_ON_SUSPEND.
> 
>   Add a new quirk flag IRQCHIP_SHUTDOWN_ON_SUSPEND and add support for
>   it in the core interrupt suspend/resume paths.
> 
> Hmm?
> 
Sure.
> > Signed-off-by: Anchal Agarwal <anchalag@amazon.com>
> > Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> 
> Not that I care much, but now that I've written both the patch and the
> changelog you might change that attribution slightly. For completeness
> sake:
> 
Why not. That's mandated now :)
>  Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> 
> Thanks,
> 
>         tglx
Thanks,
Anchal

Patch

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 6c8843968a52..e44f27b45bef 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -1620,6 +1620,7 @@  static struct irq_chip xen_pirq_chip __read_mostly = {
 	.irq_set_affinity	= set_affinity_irq,
 
 	.irq_retrigger		= retrigger_dynirq,
+	.flags                  = IRQCHIP_SHUTDOWN_ON_SUSPEND,
 };
 
 static struct irq_chip xen_percpu_chip __read_mostly = {
diff --git a/include/linux/irq.h b/include/linux/irq.h
index fb301cf29148..2873a579fd9d 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -511,6 +511,7 @@  struct irq_chip {
  * IRQCHIP_EOI_THREADED:	Chip requires eoi() on unmask in threaded mode
  * IRQCHIP_SUPPORTS_LEVEL_MSI	Chip can provide two doorbells for Level MSIs
  * IRQCHIP_SUPPORTS_NMI:	Chip can deliver NMIs, only for root irqchips
+ * IRQCHIP_SHUTDOWN_ON_SUSPEND: Shutdown non wake irqs in the suspend path
  */
 enum {
 	IRQCHIP_SET_TYPE_MASKED		= (1 <<  0),
@@ -522,6 +523,7 @@  enum {
 	IRQCHIP_EOI_THREADED		= (1 <<  6),
 	IRQCHIP_SUPPORTS_LEVEL_MSI	= (1 <<  7),
 	IRQCHIP_SUPPORTS_NMI		= (1 <<  8),
+	IRQCHIP_SHUTDOWN_ON_SUSPEND     = (1 <<  9),
 };
 
 #include <linux/irqdesc.h>
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index b76703b2c0af..a1e8df5193ba 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -233,7 +233,7 @@  __irq_startup_managed(struct irq_desc *desc, struct cpumask *aff, bool force)
 }
 #endif
 
-static int __irq_startup(struct irq_desc *desc)
+int __irq_startup(struct irq_desc *desc)
 {
 	struct irq_data *d = irq_desc_get_irq_data(desc);
 	int ret = 0;
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 3924fbe829d4..11c7c55bda63 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -80,6 +80,7 @@  extern void __enable_irq(struct irq_desc *desc);
 extern int irq_activate(struct irq_desc *desc);
 extern int irq_activate_and_startup(struct irq_desc *desc, bool resend);
 extern int irq_startup(struct irq_desc *desc, bool resend, bool force);
+extern int __irq_startup(struct irq_desc *desc);
 
 extern void irq_shutdown(struct irq_desc *desc);
 extern void irq_shutdown_and_deactivate(struct irq_desc *desc);
diff --git a/kernel/irq/pm.c b/kernel/irq/pm.c
index 8f557fa1f4fe..dc48a25f1756 100644
--- a/kernel/irq/pm.c
+++ b/kernel/irq/pm.c
@@ -85,16 +85,25 @@  static bool suspend_device_irq(struct irq_desc *desc)
 	}
 
 	desc->istate |= IRQS_SUSPENDED;
-	__disable_irq(desc);
-
 	/*
-	 * Hardware which has no wakeup source configuration facility
-	 * requires that the non wakeup interrupts are masked at the
-	 * chip level. The chip implementation indicates that with
-	 * IRQCHIP_MASK_ON_SUSPEND.
+	 * Some irq chips (e.g. XEN PIRQ) require a full shutdown on suspend
+	 * as some of the legacy drivers (e.g. floppy) do nothing during the
+	 * suspend path.
 	 */
-	if (irq_desc_get_chip(desc)->flags & IRQCHIP_MASK_ON_SUSPEND)
-		mask_irq(desc);
+	if (irq_desc_get_chip(desc)->flags & IRQCHIP_SHUTDOWN_ON_SUSPEND) {
+		irq_shutdown(desc);
+	} else {
+		__disable_irq(desc);
+
+		/*
+		 * Hardware which has no wakeup source configuration facility
+		 * requires that the non wakeup interrupts are masked at the
+		 * chip level. The chip implementation indicates that with
+		 * IRQCHIP_MASK_ON_SUSPEND.
+		 */
+		if (irq_desc_get_chip(desc)->flags & IRQCHIP_MASK_ON_SUSPEND)
+			mask_irq(desc);
+	}
 	return true;
 }
 
@@ -152,7 +161,11 @@  static void resume_irq(struct irq_desc *desc)
 	irq_state_set_masked(desc);
 resume:
 	desc->istate &= ~IRQS_SUSPENDED;
-	__enable_irq(desc);
+
+	if (irq_desc_get_chip(desc)->flags & IRQCHIP_SHUTDOWN_ON_SUSPEND)
+		__irq_startup(desc);
+	else
+		__enable_irq(desc);
 }
 
 static void resume_irqs(bool want_early)
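
The branch added to kernel/irq/pm.c can be modelled in userspace roughly as follows. This is a simplified sketch, not the kernel code: the model_* types and functions and the MODEL_* flag values are hypothetical, and the early-exit checks in suspend_device_irq() (no action, wakeup sources and so on) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits for this model only; not the kernel's values. */
enum {
	MODEL_MASK_ON_SUSPEND     = 1 << 0,
	MODEL_SHUTDOWN_ON_SUSPEND = 1 << 1,
};

enum model_state {
	MODEL_STARTED,
	MODEL_DISABLED,
	MODEL_DISABLED_MASKED,
	MODEL_SHUT_DOWN,
};

struct model_irq {
	unsigned int flags;	/* chip quirk flags */
	enum model_state state;
	bool suspended;		/* stand-in for IRQS_SUSPENDED */
	int startups;		/* how often the chip was fully started */
};

static void model_irq_init(struct model_irq *irq, unsigned int flags)
{
	assert(irq != NULL);
	irq->flags = flags;
	irq->state = MODEL_STARTED;
	irq->suspended = false;
	irq->startups = 1;
}

/* Suspend path: full shutdown takes precedence over disable (+ mask). */
static void model_suspend_irq(struct model_irq *irq)
{
	irq->suspended = true;
	if (irq->flags & MODEL_SHUTDOWN_ON_SUSPEND) {
		irq->state = MODEL_SHUT_DOWN;
	} else {
		irq->state = MODEL_DISABLED;
		if (irq->flags & MODEL_MASK_ON_SUSPEND)
			irq->state = MODEL_DISABLED_MASKED;
	}
}

/* Resume path: restart the chip if it was shut down, else just re-enable. */
static void model_resume_irq(struct model_irq *irq)
{
	irq->suspended = false;
	if (irq->flags & MODEL_SHUTDOWN_ON_SUSPEND)
		irq->startups++;
	irq->state = MODEL_STARTED;
}
```

The point of the model is the asymmetry: an interrupt on a chip with the shutdown flag goes through a second full startup on resume, which is what would give the Xen code the chance to bind a new event channel, while all other interrupts keep the existing disable/enable behaviour.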