From patchwork Mon Oct 7 08:23:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthias Schiffer X-Patchwork-Id: 13824312 X-Patchwork-Delegate: kuba@kernel.org Received: from mx1.tq-group.com (mx1.tq-group.com [93.104.207.81]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8D93F17C203; Mon, 7 Oct 2024 08:24:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=93.104.207.81 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728289478; cv=none; b=bkDtcGHA1/jYlFammap4py1bPECnips4CNnA/a/UlLBmUMo2R+PsrA2LTXy+xTSdT/ml6OeojaMo8on4xStm10PNe9s91hdY0mwuECp43WTPhnS3wsNis7ua4BYE+9AHkXJ4tPRXWJ+55tM9le8jSeIWQn0Xsc1FD3BJ1rlW4lY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728289478; c=relaxed/simple; bh=2P/3SYlQzuEDT09aLRhadj45a9SWACw9nyneRFqqVtI=; h=From:To:Cc:Subject:Date:Message-ID:MIME-Version:Content-Type; b=CWSTa29l38EwFScUy/evY4ZyskOCi6RLTa7gCVsPfpP1+okTpy/wL8CxkYnrXZhUm2yfZgI9BwJM86GZ3DVVCFuUa8YmV6qkCGFYxIbC3tUxJbROPa4EZrK7S7rwovHFfIcTUfEsZTz0W+MIlmyy3NVCoqegz0lxA15Rsii56+I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=ew.tq-group.com; spf=pass smtp.mailfrom=ew.tq-group.com; dkim=pass (2048-bit key) header.d=tq-group.com header.i=@tq-group.com header.b=feRvz87w; dkim=fail (0-bit key) header.d=ew.tq-group.com header.i=@ew.tq-group.com header.b=aqsTDOeo reason="key not found in DNS"; arc=none smtp.client-ip=93.104.207.81 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=ew.tq-group.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=ew.tq-group.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tq-group.com header.i=@tq-group.com header.b="feRvz87w"; dkim=fail reason="key not found in DNS" (0-bit key) header.d=ew.tq-group.com header.i=@ew.tq-group.com header.b="aqsTDOeo" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tq-group.com; i=@tq-group.com; q=dns/txt; s=key1; t=1728289474; x=1759825474; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=kZLHKkZcSRfUU6Wcyx/OfRhBUNPMTTzmQTipgLzwHz0=; b=feRvz87wV4upVY2GIBgEwwJjdb3GIhtnKXFCZusiBOlvGNGgZAzlG7NS 866+e93g3uBiSq+IInwLtBDPqKyx3OYULzNqkt9NH7NtRqY53UsUTjshd TWZcZp3yCT3LeLu13EoJb7eE7LDRK0E4uN2tY7x/s8WYWh36r/9DPIPnP 5LazwyD7CRUuGGtU83yCgv2+VvuMuVRS8TeVcYAmoISseeqMOQSH4LgMy qe9DvDE880XYGJXpyfp1XRkxQ52tQIuy3WD0OzVbPRwcEf/ZJ8tUlgrYQ tB3qkIKJI+aYCU9CL8w2LfH5pKk3zg/+U1pwigbB+7meAp9P0M8vn3A/+ g==; X-CSE-ConnectionGUID: rZ70ks/rTpKb0dn1sugSrg== X-CSE-MsgGUID: nPQYMujiT7aNJtDW504mMw== X-IronPort-AV: E=Sophos;i="6.11,183,1725314400"; d="scan'208";a="39298933" Received: from vmailcow01.tq-net.de ([10.150.86.48]) by mx1.tq-group.com with ESMTP; 07 Oct 2024 10:24:31 +0200 X-CheckPoint: {67039ABF-F-69B013CD-E6388B31} X-MAIL-CPID: A63DD35BB5EAE5D33262B5486BBD2C8B_1 X-Control-Analysis: str=0001.0A682F1C.67039ABF.0075,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0 Received: from [127.0.0.1] (localhost [127.0.0.1]) by localhost (Mailerdaemon) with ESMTPSA id 3145516BE23; Mon, 7 Oct 2024 10:24:19 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ew.tq-group.com; s=dkim; t=1728289466; h=from:subject:date:message-id:to:cc:mime-version:content-type: content-transfer-encoding; bh=kZLHKkZcSRfUU6Wcyx/OfRhBUNPMTTzmQTipgLzwHz0=; b=aqsTDOeooiTDdZxttXBMnP31t9Q92LYbn2zIf3FpQYAhjDQ5rbN9M7ITqk8HsPjPr1fhNo Gw19eXB8Sm4yfPEOua5SaYIXq+/2gfvq/S+PGCMOZI2R1hLTA5gzQNd2gdVixQ79mQyXoZ OS+LAcw8R0dj9FHeMCprGOeMHr4WRcnky8o+R4DRPDzzqne+vcjy55rxD9kG9Hz8qm8ubN gZ1I0RpfB6rdB20mFmcS9p3EsGenTY06IMofIuVO3vFsgcMpwEYXMaWMHLLq8RdruL3vbS ZX213a2UHReFujdkvguBY+5a1+gT4CB5JxuZ/un2HDNQdAFt2nUiPYo+VIHcWQ== From: Matthias Schiffer To: Chandrasekar Ramakrishnan , Marc Kleine-Budde , Vincent Mailhol Cc: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , =?utf-8?q?Martin_Hundeb=C3=B8ll?= , Markus Schneider-Pargmann , "Felipe Balbi (Intel)" , Raymond Tan , Jarkko Nikula , linux-can@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux@ew.tq-group.com, Matthias Schiffer Subject: [PATCH v4 1/2] can: m_can: set init flag earlier in probe Date: Mon, 7 Oct 2024 10:23:58 +0200 Message-ID: X-Mailer: git-send-email 2.46.2 Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Last-TLS-Session-Version: TLSv1.3 X-Patchwork-Delegate: kuba@kernel.org While an m_can controller usually already has the init flag from a hardware reset, no such reset happens on the integrated m_can_pci of the Intel Elkhart Lake. If the CAN controller is found in an active state, m_can_dev_setup() would fail because m_can_niso_supported() calls m_can_cccr_update_bits(), which refuses to modify any other configuration bits when CCCR_INIT is not set. To avoid this issue, set CCCR_INIT before attempting to modify any other configuration flags. Fixes: cd5a46ce6fa6 ("can: m_can: don't enable transceiver when probing") Signed-off-by: Matthias Schiffer Reviewed-by: Markus Schneider-Pargmann --- v2: no changes v3: updated comment to mention Elkhart Lake v4: added Reviewed-by drivers/net/can/m_can/m_can.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c index 012c3d22b01dd..c85ac1b15f723 100644 --- a/drivers/net/can/m_can/m_can.c +++ b/drivers/net/can/m_can/m_can.c @@ -1681,6 +1681,14 @@ static int m_can_dev_setup(struct m_can_classdev *cdev) return -EINVAL; } + /* Write the INIT bit, in case no hardware reset has happened before + * the probe (for example, it was observed that the Intel Elkhart Lake + * SoCs do not properly reset the CAN controllers on reboot) + */ + err = m_can_cccr_update_bits(cdev, CCCR_INIT, CCCR_INIT); + if (err) + return err; + if (!cdev->is_peripheral) netif_napi_add(dev, &cdev->napi, m_can_poll); @@ -1732,11 +1740,7 @@ static int m_can_dev_setup(struct m_can_classdev *cdev) return -EINVAL; } - /* Forcing standby mode should be redundant, as the chip should be in - * standby after a reset. Write the INIT bit anyways, should the chip - * be configured by previous stage. - */ - return m_can_cccr_update_bits(cdev, CCCR_INIT, CCCR_INIT); + return 0; } static void m_can_stop(struct net_device *dev) From patchwork Mon Oct 7 08:23:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthias Schiffer X-Patchwork-Id: 13824313 X-Patchwork-Delegate: kuba@kernel.org Received: from mx1.tq-group.com (mx1.tq-group.com [93.104.207.81]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0FB7A188703; Mon, 7 Oct 2024 08:24:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=93.104.207.81 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728289481; cv=none; b=dPdcj+FfZvu/x4315PMHYD/keL9q+OyTaJuw401kVQgDHYwmZ6Yn+w/uK5E0jNrrW2QYFZWay7ds5XTuH43C83+etxtJhCVCCuqru50MLlso4IuBRgn5DY6o+Bwd5h8pCr4dahWU9UWGux+hayxYRD19vSXsH1X1G6YNr85jrdo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728289481; c=relaxed/simple; bh=uAWY5GeBU1LHEXd3tdVJsfzxjfC5n+KBX+GF3q4EYh4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=DESJBSH7uUdzHKwUoFu1htJNxLHAPeeR8O8u2hR5fGRK5bPYtIgFKNo5k+mEJOUGwbTXhXTXDIgf4WIhIg3o3vKtvR7fvOrVyHQJr0lJktlzGEXHlvj9atAoZRFhH1DjcXbquNU8KjhxVuny8r2bCUv7dZQNAm9KmBvgDFQp2g8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=ew.tq-group.com; spf=pass smtp.mailfrom=ew.tq-group.com; dkim=pass (2048-bit key) header.d=tq-group.com header.i=@tq-group.com header.b=Fz6LWzd+; dkim=fail (0-bit key) header.d=ew.tq-group.com header.i=@ew.tq-group.com header.b=Yvtf+tpU reason="key not found in DNS"; arc=none smtp.client-ip=93.104.207.81 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=ew.tq-group.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=ew.tq-group.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tq-group.com header.i=@tq-group.com header.b="Fz6LWzd+"; dkim=fail reason="key not found in DNS" (0-bit key) header.d=ew.tq-group.com header.i=@ew.tq-group.com header.b="Yvtf+tpU" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tq-group.com; i=@tq-group.com; q=dns/txt; s=key1; t=1728289478; x=1759825478; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=PPDCreN8S5FVeojCwGe3k4aj+6euS02xfPX/V6waND4=; b=Fz6LWzd+X6aTp4fJanR+tuiq8JX416ntwUpvCg5ZKT6lKK39E9ZV14kG /fKauZbAQ60fo2iEsjbO1gQhBR7jfG4oIT0FK3LUma/JphAXFNaF8qDKU ONNgHQOU/hA6Z/dnmyqi5l8WmYm356ESibjyVwqIXTxCyC1LeU0svLXvC rGsVTLi3KSgykm7avmJEpJ6LULyV1vSGnBueiMYr5BBAZyDsGY2pf0hmL H44vxBgvRsmM9UM0ISD5wY389LEWHFFbDBak+1Zh6XeVpc4emk9Z6MQJW oPvzfg0kgRh60V+hG4/w5bmcAbNf2SzMAqct6D4gS7w5u33TzQRAFmhJx A==; X-CSE-ConnectionGUID: nys4hkqWSYqSH6CyI/oXiA== X-CSE-MsgGUID: uRnEfY3mT9eqxrnTnlCmvw== X-IronPort-AV: E=Sophos;i="6.11,183,1725314400"; d="scan'208";a="39298937" Received: from vmailcow01.tq-net.de ([10.150.86.48]) by mx1.tq-group.com with ESMTP; 07 Oct 2024 10:24:35 +0200 X-CheckPoint: {67039AC3-27-F5F7BCAD-D99768E6} X-MAIL-CPID: CD52C37E9DB2573E7B755B6C0D9C85D7_3 X-Control-Analysis: str=0001.0A682F26.67039AC4.001C,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0 Received: from [127.0.0.1] (localhost [127.0.0.1]) by localhost (Mailerdaemon) with ESMTPSA id E2DAA16BDE5; Mon, 7 Oct 2024 10:24:30 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ew.tq-group.com; s=dkim; t=1728289471; h=from:subject:date:message-id:to:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:references; bh=PPDCreN8S5FVeojCwGe3k4aj+6euS02xfPX/V6waND4=; b=Yvtf+tpU2ex4B8yuF+29woyM81KNY6dg0S847anb9qxZhPUUcTx5vaomi21ol+jrkBG3+3 bXA5eRrXqeBTF1SAUjbt1F1ivkE63WqTItRKjfnpJ6wZZwfm+rwb8ftTQLPGeswkp+0HrM j1FikjW5m3uFWacG0bn2lBu7NDXNaQ4Zq8weEdVu7DPJ3xT9qyFI8ny+VgI7qV/oHIwrrl bvaVxs3HFekhgJLe/ecHU2qQHDuNBS0p8miFOHj7g+0yCTXYMSTJHbwbKc8++6lQlg2Sk8 7bdY51OwmnTgQr+Kuzv+we6coqjr+Y3Vnx+AGOsWEugu1k17cfEfQ0aYnKyO9Q== From: Matthias Schiffer To: Chandrasekar Ramakrishnan , Marc Kleine-Budde , Vincent Mailhol Cc: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , =?utf-8?q?Martin_Hundeb=C3=B8ll?= , Markus Schneider-Pargmann , "Felipe Balbi (Intel)" , Raymond Tan , Jarkko Nikula , linux-can@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux@ew.tq-group.com, Matthias Schiffer Subject: [PATCH v4 2/2] can: m_can: fix missed interrupts with m_can_pci Date: Mon, 7 Oct 2024 10:23:59 +0200 Message-ID: X-Mailer: git-send-email 2.46.2 In-Reply-To: References: Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Last-TLS-Session-Version: TLSv1.3 X-Patchwork-Delegate: kuba@kernel.org The interrupt line of PCI devices is interpreted as edge-triggered, however the interrupt signal of the m_can controller integrated in Intel Elkhart Lake CPUs appears to be generated level-triggered. Consider the following sequence of events: - IR register is read, interrupt X is set - A new interrupt Y is triggered in the m_can controller - IR register is written to acknowledge interrupt X. Y remains set in IR As at no point in this sequence no interrupt flag is set in IR, the m_can interrupt line will never become deasserted, and no edge will ever be observed to trigger another run of the ISR. This was observed to result in the TX queue of the EHL m_can to get stuck under high load, because frames were queued to the hardware in m_can_start_xmit(), but m_can_finish_tx() was never run to account for their successful transmission. On an Elkhart Lake based board with the two CAN interfaces connected to each other, the following script can reproduce the issue: ip link set can0 up type can bitrate 1000000 ip link set can1 up type can bitrate 1000000 cangen can0 -g 2 -I 000 -L 8 & cangen can0 -g 2 -I 001 -L 8 & cangen can0 -g 2 -I 002 -L 8 & cangen can0 -g 2 -I 003 -L 8 & cangen can0 -g 2 -I 004 -L 8 & cangen can0 -g 2 -I 005 -L 8 & cangen can0 -g 2 -I 006 -L 8 & cangen can0 -g 2 -I 007 -L 8 & cangen can1 -g 2 -I 100 -L 8 & cangen can1 -g 2 -I 101 -L 8 & cangen can1 -g 2 -I 102 -L 8 & cangen can1 -g 2 -I 103 -L 8 & cangen can1 -g 2 -I 104 -L 8 & cangen can1 -g 2 -I 105 -L 8 & cangen can1 -g 2 -I 106 -L 8 & cangen can1 -g 2 -I 107 -L 8 & stress-ng --matrix 0 & To fix the issue, repeatedly read and acknowledge interrupts at the start of the ISR until no interrupt flags are set, so the next incoming interrupt will also result in an edge on the interrupt line. While we have received a report that even with this patch, the TX queue can become stuck under certain (currently unknown) circumstances on the Elkhart Lake, this patch completely fixes the issue with the above reproducer, and it is unclear whether the remaining issue has a similar cause at all. Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake") Signed-off-by: Matthias Schiffer Reviewed-by: Markus Schneider-Pargmann --- As noted in the updated commit description, this patch either doesn't fix the issue completely, or there is a different issue with the same symptoms lurking in the driver. I think it would still be a good idea to merge this patch, as it does significantly improve the situation with m_can_pci (completely fixing the reproducer.) Our customer who reported the issue to us seems to have found a way to work around the remaining issue by combining this patch with some change in their userspace application (using one socket per destination address instead of sharing a single socket for the whole application), but I have no idea how that would even affect the behavior except for subtle timing differences. Having a workaround unfortunately also means that we currently can't afford to put more effort into the analysis of the remaining issue. v2: introduce flag is_edge_triggered, so we can avoid the loop on !m_can_pci v3: - rename flag to irq_edge_triggered - update comment to describe the issue more generically as one of systems with edge-triggered interrupt line. m_can_pci is mentioned as an example, as it is the only m_can variant that currently sets the irq_edge_triggered flag. v4: - update comment as suggested by Markus - add Reviewed-by - extend commit description with reproducer and recent updates drivers/net/can/m_can/m_can.c | 22 +++++++++++++++++----- drivers/net/can/m_can/m_can.h | 1 + drivers/net/can/m_can/m_can_pci.c | 1 + 3 files changed, 19 insertions(+), 5 deletions(-) diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c index c85ac1b15f723..5dfadecc427a3 100644 --- a/drivers/net/can/m_can/m_can.c +++ b/drivers/net/can/m_can/m_can.c @@ -1207,20 +1207,32 @@ static void m_can_coalescing_update(struct m_can_classdev *cdev, u32 ir) static int m_can_interrupt_handler(struct m_can_classdev *cdev) { struct net_device *dev = cdev->net; - u32 ir; + u32 ir = 0, ir_read; int ret; if (pm_runtime_suspended(cdev->dev)) return IRQ_NONE; - ir = m_can_read(cdev, M_CAN_IR); + /* The m_can controller signals its interrupt status as a level, but + * depending in the integration the CPU may interpret the signal as + * edge-triggered (for example with m_can_pci). For these + * edge-triggered integrations, we must observe that IR is 0 at least + * once to be sure that the next interrupt will generate an edge. + */ + while ((ir_read = m_can_read(cdev, M_CAN_IR)) != 0) { + ir |= ir_read; + + /* ACK all irqs */ + m_can_write(cdev, M_CAN_IR, ir); + + if (!cdev->irq_edge_triggered) + break; + } + m_can_coalescing_update(cdev, ir); if (!ir) return IRQ_NONE; - /* ACK all irqs */ - m_can_write(cdev, M_CAN_IR, ir); - if (cdev->ops->clear_interrupts) cdev->ops->clear_interrupts(cdev); diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h index 92b2bd8628e6b..ef39e8e527ab6 100644 --- a/drivers/net/can/m_can/m_can.h +++ b/drivers/net/can/m_can/m_can.h @@ -99,6 +99,7 @@ struct m_can_classdev { int pm_clock_support; int pm_wake_source; int is_peripheral; + bool irq_edge_triggered; // Cached M_CAN_IE register content u32 active_interrupts; diff --git a/drivers/net/can/m_can/m_can_pci.c b/drivers/net/can/m_can/m_can_pci.c index d72fe771dfc7a..9ad7419f88f83 100644 --- a/drivers/net/can/m_can/m_can_pci.c +++ b/drivers/net/can/m_can/m_can_pci.c @@ -127,6 +127,7 @@ static int m_can_pci_probe(struct pci_dev *pci, const struct pci_device_id *id) mcan_class->pm_clock_support = 1; mcan_class->pm_wake_source = 0; mcan_class->can.clock.freq = id->driver_data; + mcan_class->irq_edge_triggered = true; mcan_class->ops = &m_can_pci_ops; pci_set_drvdata(pci, mcan_class);