From patchwork Wed Dec 13 14:06:55 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 10109995
Date: Wed, 13 Dec 2017 15:06:55 +0100 (CET)
From: Thomas Gleixner
To: Takashi Iwai
References: <20171208093323.2212-1-augustine.chen@intel.com>
 <20171208114404.GN10981@intel.com>
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
Cc: Juergen Gross, Dou Liyang, "alsa-devel@alsa-project.org",
 "intel-gfx@lists.freedesktop.org", "linux-kernel@vger.kernel.org",
 Ingo Molnar, "H. Peter Anvin", Jiang Liu, "Chen, Augustine",
 "Bossart, Pierre-louis"
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Remove unused IRQ chip data of HDMI LPE audio
List-Id: Intel graphics driver community testing & development

On Wed, 13 Dec 2017, Takashi Iwai wrote:
> On Wed, 13 Dec 2017 12:35:54 +0100,
> Thomas Gleixner wrote:
> >
> > > > On Mon, 11 Dec 2017, Anand, Jerome wrote:
> > > > > > On Fri, 8 Dec 2017, Ville Syrjälä wrote:
> > > > > >
> > > > > > > On Fri, Dec 08, 2017 at 05:33:23PM +0800, Augustine.Chen wrote:
> > > > > > > > The chip data of HDMI LPE audio is set to drm_i915_private which
> > > > > > > > is not consistent with the expectation by x86 APIC driver.
> > > > > > >
> > > > > > > Hmm. Why is the apic code looking at data for an irq chip it
> > > > > > > hasn't created?
> > > > > > >
> > > > >
> > > > > apic code expects an irq domain to be place as generic approach.
> > > >
> > > > APIC code does not even see that interrupt at all. It's completely disconnected.
> > > >
> > >
> > > That's the problem - APIC just converts the chip data to its internal
> > > format and fails.
> >
> > How does APIC code end up to touch that interrupt at all? Call stack please.
>
> It's found in the bugzilla referred in the patch:
>   https://bugs.freedesktop.org/show_bug.cgi?id=103731
>
> [ 87.353072] irq 298 idata->chip->name hdmi_lpe_audio_irqchip
> [ 87.353072] irq 298 apic_chip_data
> [ 87.353073] irq 298 data->domain is NULL
> [ 87.353120] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 87.353132] IP: setup_vector_irq+0x1ba/0x230
> [ 87.353133] PGD 0
>
> If my understanding is correct, it happens only with 4.14 and earlier
> kernels where __setup_vector_irq() loops over the all irqs:
>
> static void __setup_vector_irq(int cpu)
> {
> 	struct apic_chip_data *data;
> 	struct irq_desc *desc;
> 	int irq, vector;
>
> 	/* Mark the inuse vectors */
> 	for_each_irq_desc(irq, desc) {
> 		struct irq_data *idata = irq_desc_get_irq_data(desc);
>
> 		data = apic_chip_data(idata);
> 		if (!data || !cpumask_test_cpu(cpu, data->domain))
> 			continue;
> 	....
>
> And since we have assigned a non-APIC chip data in the driver, the
> code above refers to a wrong object, leading to Oops.

Bah crap. This information should have been provided earlier instead of
handwavy 'doesn't work with CONFIG_FOO and hotplug'.

> As a further note, the setup_vector_irq() code has been changed in
> 4.15, and such a reference won't happen any longer. So the patch
> isn't necessary for now, although it's not bad to take as a cleanup.
> And we can eventually put Cc to stable there since it actually works
> around the issue above for the older kernels -- of course, with more
> detailed descriptions about the background.

No, that's just tinkering. The proper fix is to make that code robust.
Something like the completely untested patch below should do the trick.

Thanks,

	tglx

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index f3557a1eb562..02e6a3cc0d74 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -58,6 +58,9 @@ static struct apic_chip_data *apic_chip_data(struct irq_data *irq_data)
 	while (irq_data->parent_data)
 		irq_data = irq_data->parent_data;
 
+	if (irq_data->domain != x86_vector_domain)
+		return NULL;
+
 	return irq_data->chip_data;
 }
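
For illustration only, the effect of the added domain check can be modelled in user space. This is a rough sketch, not kernel code: the structs are cut-down stand-ins for struct irq_domain and struct irq_data, x86_vector_domain here is a local pointer that merely mimics the kernel symbol, and apic_chip_data_unchecked(), apic_chip_data_checked() and model_demo() are hypothetical names made up for this example. The foreign interrupt is assumed to have no irq domain attached.

```c
/*
 * User-space model only. The structs below are cut-down stand-ins for the
 * kernel's struct irq_domain / struct irq_data; none of this is kernel API.
 */
#include <assert.h>
#include <stddef.h>

struct irq_domain { const char *name; };

struct irq_data {
	struct irq_domain *domain;	/* domain that owns this level */
	struct irq_data *parent_data;	/* next level up, NULL at the top */
	void *chip_data;		/* owner-private, type depends on owner */
};

struct apic_chip_data { int vector; };

/* Stand-in for the kernel's x86_vector_domain pointer (assumption). */
struct irq_domain vector_domain_obj = { "VECTOR" };
struct irq_domain *x86_vector_domain = &vector_domain_obj;

/* Pre-patch behaviour: return the outermost chip_data unconditionally. */
struct apic_chip_data *apic_chip_data_unchecked(struct irq_data *irq_data)
{
	if (!irq_data)
		return NULL;
	while (irq_data->parent_data)
		irq_data = irq_data->parent_data;
	return irq_data->chip_data;
}

/* Patched behaviour: only hand out chip_data owned by the vector domain. */
struct apic_chip_data *apic_chip_data_checked(struct irq_data *irq_data)
{
	if (!irq_data)
		return NULL;
	while (irq_data->parent_data)
		irq_data = irq_data->parent_data;
	if (irq_data->domain != x86_vector_domain)
		return NULL;
	return irq_data->chip_data;
}

/* Hypothetical demo: one vector-backed interrupt, one foreign interrupt. */
void model_demo(void)
{
	struct apic_chip_data apic = { 42 };
	int driver_private = 0;	/* plays the role of drm_i915_private */
	struct irq_data top = { x86_vector_domain, NULL, &apic };
	struct irq_data child = { NULL, &top, NULL };
	/* Assumed: the foreign interrupt has no irq domain at all. */
	struct irq_data foreign = { NULL, NULL, &driver_private };

	/* Hierarchy rooted in the vector domain: the APIC data is returned. */
	assert(apic_chip_data_checked(&child) == &apic);
	/* The unchecked walk returns the driver's pointer, mistyped ... */
	assert((void *)apic_chip_data_unchecked(&foreign) ==
	       (void *)&driver_private);
	/* ... while the checked walk rejects it. */
	assert(apic_chip_data_checked(&foreign) == NULL);
}
```

Calling model_demo() from a small main() exercises both paths. In the model, a caller shaped like the quoted __setup_vector_irq() loop would take its existing !data branch and skip the foreign interrupt instead of dereferencing driver-private memory.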