From patchwork Wed Jul 12 15:30:41 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 9837043
Date: Wed, 12 Jul 2017 17:30:41 +0200 (CEST)
From: Thomas Gleixner
To: Juergen Gross
Cc: Marc Zyngier, jeffy.chen@rock-chips.com, Linux Kernel Mailing List,
 Peter Zijlstra, xen-devel, Boris Ostrovsky
Subject: Re: [Xen-devel] Problem with commit
 bf22ff45bed664aefb5c4e43029057a199b7070c
List-Id: Xen developer discussion

On Mon, 10 Jul 2017, Juergen Gross wrote:
> On 07/07/17 19:11, Thomas Gleixner wrote:
> > On Fri, 7 Jul 2017, Thomas Gleixner wrote:
> >
> >> On Fri, 7 Jul 2017, Juergen Gross wrote:
> >>
> >>> Commit bf22ff45bed664aefb5c4e43029057a199b7070c ("genirq: Avoid
> >>> unnecessary low level irq function calls") breaks Xen guest
> >>> save/restore handling.
> >>>
> >>> The main problem are the PV devices using Xen event channels as
> >>> interrupt sources which are represented as an "irq chip" in the kernel.
> >>> When saving the guest the event channels are masked internally. At
> >>> restore time event channels are re-established and unmasked via
> >>> irq_startup().
> >
> > And how exactly gets irq_startup() invoked on those event channels?
>
> [   30.791879] Call Trace:
> [   30.791883]  ? irq_get_irq_data+0xe/0x20
> [   30.791886]  enable_dynirq+0x23/0x30
> [   30.791888]  unmask_irq.part.33+0x26/0x40
> [   30.791890]  irq_enable+0x65/0x70
> [   30.791891]  irq_startup+0x3c/0x110
> [   30.791893]  __enable_irq+0x37/0x60
> [   30.791895]  resume_irqs+0xbe/0xe0
> [   30.791897]  irq_pm_syscore_resume+0x13/0x20
> [   30.791900]  syscore_resume+0x50/0x1b0
> [   30.791902]  xen_suspend+0x76/0x140
>
> >
> >>> I have a patch repairing the issue, but I'm not sure if this way to do
> >>> it would be accepted. I have exported mask_irq() and I'm doing the
> >>> masking now through this function. Would the attached patch be
> >>> acceptable? Or is there a better way to solve the problem?
> >>
> >> Without looking at the patch (too lazy to fiddle with attachments right
> >> now), this is definitely wrong. I'll have a look later tonight.
> >
> > Not that I'm surprised, but that patch is exactly what I expected. Export a
> > random function, which helps to paper over the real problem and run away.
> > These functions are internal for a reason and we worked hard on making
> > people understand that fiddling with the internals of interrupts is a
> > NONO.
> > If there are special requirements for a good reason, then we create
> > proper interfaces and infrastructure, if there is no good reason, then the
> > problematic code needs to be fixed. There is no exception for XEN.
>
> I'm absolutely on your side here. That was the reason I didn't send
> the patch right away, but asked how to solve my issue in a way which
> isn't "quick and dirty". The patch was just the easiest way to explain
> what should be the result of the proper solution.

Fair enough!

> > Can you please explain how that save/restore stuff works and which
> > functions are involved?
>
> It is based on suspend/resume framework. The main work to be done
> additionally is to disconnect from the pv-backends at save time and
> connect to the pv-backends again at restore time.
>
> The main function triggering all that is xen_suspend() (as seen in
> above backtrace).

The untested patch below should give you hooks to do what you need to do.

Add the irq_suspend/resume callbacks and set the IRQCHIP_GENERIC_SUSPEND
flag on your xen irqchip, so it actually gets invoked.

I have to make that opt in right now because the callbacks are used in the
generic irqchip implementation already. We can revisit that when you can
confirm that this is actually solving the problem.

Thanks,

	tglx

8<----------------------
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -476,6 +476,8 @@ struct irq_chip {
  * IRQCHIP_SKIP_SET_WAKE: Skip chip.irq_set_wake(), for this irq chip
  * IRQCHIP_ONESHOT_SAFE: One shot does not require mask/unmask
  * IRQCHIP_EOI_THREADED: Chip requires eoi() on unmask in threaded mode
+ * IRQCHIP_GENERIC_SUSPEND: Use the suspend/resume callbacks in
+ * device_irq_suspend/resume
  */
 enum {
 	IRQCHIP_SET_TYPE_MASKED = (1 << 0),
@@ -485,6 +487,7 @@ enum {
 	IRQCHIP_SKIP_SET_WAKE = (1 << 4),
 	IRQCHIP_ONESHOT_SAFE = (1 << 5),
 	IRQCHIP_EOI_THREADED = (1 << 6),
+	IRQCHIP_GENERIC_SUSPEND = (1 << 7),
 };
 
 #include
--- a/kernel/irq/pm.c
+++ b/kernel/irq/pm.c
@@ -70,6 +70,8 @@ void irq_pm_remove_action(struct irq_des
 
 static bool suspend_device_irq(struct irq_desc *desc)
 {
+	struct irq_chip *chip;
+
 	if (!desc->action || irq_desc_is_chained(desc) ||
 	    desc->no_suspend_depth)
 		return false;
@@ -94,8 +96,13 @@ static bool suspend_device_irq(struct ir
 	 * chip level. The chip implementation indicates that with
 	 * IRQCHIP_MASK_ON_SUSPEND.
 	 */
-	if (irq_desc_get_chip(desc)->flags & IRQCHIP_MASK_ON_SUSPEND)
+	chip = irq_desc_get_chip(desc);
+	if (chip->flags & IRQCHIP_MASK_ON_SUSPEND)
 		mask_irq(desc);
+
+	if ((chip->flags & IRQCHIP_GENERIC_SUSPEND) && chip->irq_suspend)
+		chip->irq_suspend(&desc->irq_data);
+
 	return true;
 }
 
@@ -138,6 +145,8 @@ EXPORT_SYMBOL_GPL(suspend_device_irqs);
 
 static void resume_irq(struct irq_desc *desc)
 {
+	struct irq_chip *chip;
+
 	irqd_clear(&desc->irq_data, IRQD_WAKEUP_ARMED);
 
 	if (desc->istate & IRQS_SUSPENDED)
@@ -150,6 +159,10 @@ static void resume_irq(struct irq_desc *
 	/* Pretend that it got disabled ! */
 	desc->depth++;
 resume:
+	chip = irq_desc_get_chip(desc);
+	if ((chip->flags & IRQCHIP_GENERIC_SUSPEND) && chip->irq_resume)
+		chip->irq_resume(&desc->irq_data);
+
 	desc->istate &= ~IRQS_SUSPENDED;
 	__enable_irq(desc);
 }
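
For readers who want to see the opt-in behaviour in isolation, here is a
minimal userspace sketch of the check the patch adds. It is not kernel code
and not part of the patch: the structs are trimmed stand-ins for the kernel
ones, and every demo_* and model_* name is hypothetical. It only models the
condition "flag set and callback present"; what a real Xen callback would do
to the event channel is left out.

```c
/*
 * Userspace model of the opt-in check from the patch above.
 * Not kernel code: the structs are trimmed stand-ins, and every
 * demo_* / model_* name is hypothetical.
 */
#include <assert.h>

struct irq_data {
	unsigned int irq;
};

struct irq_chip {
	const char *name;
	void (*irq_suspend)(struct irq_data *data);
	void (*irq_resume)(struct irq_data *data);
	unsigned long flags;
};

enum {
	IRQCHIP_GENERIC_SUSPEND = (1 << 7),	/* value taken from the patch */
};

struct irq_desc {
	struct irq_data irq_data;
	struct irq_chip *chip;
};

/* Counters standing in for real per-interrupt suspend/resume work. */
static int demo_suspend_calls;
static int demo_resume_calls;

static void demo_suspend(struct irq_data *data)
{
	(void)data;
	demo_suspend_calls++;
}

static void demo_resume(struct irq_data *data)
{
	(void)data;
	demo_resume_calls++;
}

/* Opts in: callbacks plus the new flag. */
static struct irq_chip demo_optin_chip = {
	.name		= "demo-optin",
	.irq_suspend	= demo_suspend,
	.irq_resume	= demo_resume,
	.flags		= IRQCHIP_GENERIC_SUSPEND,
};

/* Has callbacks but no flag, so the new hooks must leave it alone. */
static struct irq_chip demo_legacy_chip = {
	.name		= "demo-legacy",
	.irq_suspend	= demo_suspend,
	.irq_resume	= demo_resume,
	.flags		= 0,
};

/* Same condition the patch adds to suspend_device_irq(). */
static void model_suspend_irq(struct irq_desc *desc)
{
	struct irq_chip *chip = desc->chip;

	if ((chip->flags & IRQCHIP_GENERIC_SUSPEND) && chip->irq_suspend)
		chip->irq_suspend(&desc->irq_data);
}

/* Same condition the patch adds to resume_irq(). */
static void model_resume_irq(struct irq_desc *desc)
{
	struct irq_chip *chip = desc->chip;

	if ((chip->flags & IRQCHIP_GENERIC_SUSPEND) && chip->irq_resume)
		chip->irq_resume(&desc->irq_data);
}

/* One suspend/resume cycle over both chips; returns total callback calls. */
int demo_cycle(void)
{
	struct irq_desc optin = { .irq_data = { .irq = 1 }, .chip = &demo_optin_chip };
	struct irq_desc legacy = { .irq_data = { .irq = 2 }, .chip = &demo_legacy_chip };

	assert(optin.chip->flags & IRQCHIP_GENERIC_SUSPEND);

	demo_suspend_calls = 0;
	demo_resume_calls = 0;
	model_suspend_irq(&optin);
	model_suspend_irq(&legacy);
	model_resume_irq(&optin);
	model_resume_irq(&legacy);
	return demo_suspend_calls + demo_resume_calls;
}
```

In this model only the chip that sets IRQCHIP_GENERIC_SUSPEND has its
callbacks invoked, which is the reason for making the flag opt-in: chips that
already populate irq_suspend/irq_resume for other purposes are not affected.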