Date: Wed, 21 Aug 2024 16:50:34 +0200
From: Maxime Chevallier
To: Thomas Gleixner, Andrew Lunn, Gregory Clement, Sebastian Hesselbarth, Russell King
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Thomas Petazzoni
Subject: Regression on Macchiatobin from the irqchip driver
Message-ID: <20240821165034.1af97bad@fedora-3.home>
Organization: Bootlin
X-Patchwork-Id: 13771865

Hi everyone,

I've been testing some network series on the Macchiatobin (Armada 8k SoC)
and I stumbled upon a crash at boot that showed up on the latest net-next
branch:

[ 2.755698] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 2.757592] mmcblk0: mmc0:0001 8GME4R 7.28 GiB
[ 2.766033] Mem abort info:
[ 2.766036] ESR = 0x0000000096000004
[ 2.774534] mmcblk0: p1
[ 2.777086] EC = 0x25: DABT (current EL), IL = 32 bits
[ 2.779893] mmcblk0boot0: mmc0:0001 8GME4R 4.00 MiB
[ 2.784965] SET = 0, FnV = 0
[ 2.784969] EA = 0, S1PTW = 0
[ 2.784972] FSC = 0x04: level 0 translation fault
[ 2.784976] Data abort info:
[ 2.790648] mmcblk0boot1: mmc0:0001 8GME4R 4.00 MiB
[ 2.792943] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 2.796867] mmcblk0rpmb: mmc0:0001 8GME4R 512 KiB, chardev (234:0)
[ 2.801002] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 2.801006] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 2.830960] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101a75000
[ 2.837436] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
[ 2.844265] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 2.850560] Modules linked in:
[ 2.853631] CPU: 2 UID: 0 PID: 51 Comm: kworker/u18:2 Not tainted 6.10.0-12649-g25010bfdf8bb #10
[ 2.862457] Hardware name: Marvell 8040 MACCHIATOBin Double-shot (DT)
[ 2.868926] Workqueue: events_unbound deferred_probe_work_func
[ 2.874800] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2.881794] pc : msi_lib_irq_domain_select+0x28/0x58
[ 2.886786] lr : irq_find_matching_fwspec+0xf0/0x120
[ 2.891778] sp : ffff8000871f39e0
[ 2.895107] x29: ffff8000871f39e0 x28: 0000000000000000 x27: ffff00013f7e11e8
[ 2.902281] x26: 0000000000000004 x25: ffff00013f7e11d0 x24: ffff800086764dd0
[ 2.909453] x23: ffff800081633b28 x22: ffff8000871f3a68 x21: 0000000000000001
[ 2.916624] x20: ffff800086764df0 x19: ffff000101171000 x18: ffffffffffffffff
[ 2.923797] x17: ffff000100de8a0c x16: ffff000100de8a00 x15: ffff000100da9cca
[ 2.930969] x14: ffffffffffffffff x13: 0030354072656c6c x12: 6f72746e6f632d74
[ 2.938141] x11: 7075727265746e69 x10: 0000000000036018 x9 : 0000000000000001
[ 2.945314] x8 : ffff8000871f3ab8 x7 : 0000000000000000 x6 : 4d0c1e0cade3e5ec
[ 2.952486] x5 : 6c65632d0c1e0c4d x4 : ffff00013f7e11e8 x3 : ffff000101171000
[ 2.959659] x2 : 0000000000000004 x1 : 0000000000000000 x0 : 0000000000000000
[ 2.966831] Call trace:
[ 2.969288] msi_lib_irq_domain_select+0x28/0x58
[ 2.973928] irq_find_matching_fwnode+0x4c/0x78
[ 2.978484] of_msi_get_domain+0x11c/0x138
[ 2.982602] mvebu_icu_subset_probe+0x5c/0x124
[ 2.987068] platform_probe+0x68/0xdc
[ 2.990748] really_probe+0xbc/0x2a4
[ 2.994343] __driver_probe_device+0x78/0x12c
[ 2.998722] driver_probe_device+0xdc/0x160
[ 3.002926] __device_attach_driver+0xb8/0x134
[ 3.007392] bus_for_each_drv+0x80/0xdc
[ 3.011248] __device_attach+0xa8/0x1b0
[ 3.015103] device_initial_probe+0x14/0x20
[ 3.019307] bus_probe_device+0xa8/0xac
[ 3.023162] deferred_probe_work_func+0x88/0xc0
[ 3.027714] process_one_work+0x150/0x294
[ 3.031743] worker_thread+0x2e4/0x3ec
[ 3.035510] kthread+0x118/0x11c
[ 3.038756] ret_from_fork+0x10/0x20
[ 3.042353] Code: d65f03c0 b9400820 35ffffa0 f9404461 (b9400823)
[ 3.048473] ---[ end trace 0000000000000000 ]---

I bisected the bug and the crash appeared at:

fbdf14e90ce4 ("irqchip/irq-mvebu-sei: Switch to MSI parent")

I've briefly looked at it, and it seems the NULL pointer that's being
dereferenced here is the "ops" pointer in msi_lib_irq_domain_select() [1].

I'm not very familiar with the irqchip subsystem; my best guess is that
this is being called for the ap_domain in the irq-mvebu-sei driver, which
doesn't have any msi_parent_ops set [2].
Looking at the msi_lib_irq_domain_select() implementation, however, I
notice that these ops appear to be allowed to be NULL, judging by the
check in the return line:

	return ops && !!(ops->bus_select_mask & busmask);

However, the lines just above dereference the ops pointer without any
prior check:

	/* Handle pure domain searches */
	if (bus_token == ops->bus_select_token)
		return 1;

As I said, this area of the kernel isn't very familiar to me, but I got
my board to boot with the patch below.

I have zero confidence that this is the correct solution to the issue, so
feel free to ditch it :)

I'll gladly test any patch for that on the MCBIN. Let me know if you want
me to run more tests.

Thanks,

Maxime

[1] : https://elixir.bootlin.com/linux/v6.11-rc4/source/drivers/irqchip/irq-msi-lib.c#L125
[2] : https://elixir.bootlin.com/linux/v6.11-rc4/source/drivers/irqchip/irq-mvebu-sei.c#L423

--- a/drivers/irqchip/irq-msi-lib.c
+++ b/drivers/irqchip/irq-msi-lib.c
@@ -128,6 +128,9 @@ int msi_lib_irq_domain_select(struct irq_domain *d, struct irq_fwspec *fwspec,
 	const struct msi_parent_ops *ops = d->msi_parent_ops;
 	u32 busmask = BIT(bus_token);
 
+	if (!ops)
+		return 0;
+
 	if (fwspec->fwnode != d->fwnode || fwspec->param_count != 0)
 		return 0;
 
@@ -135,6 +138,6 @@ int msi_lib_irq_domain_select(struct irq_domain *d, struct irq_fwspec *fwspec,
 	if (bus_token == ops->bus_select_token)
 		return 1;
 
-	return ops && !!(ops->bus_select_mask & busmask);
+	return !!(ops->bus_select_mask & busmask);