From patchwork Mon Feb 6 02:52:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Liao, Chang" X-Patchwork-Id: 13129294 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3A8DCC636CC for ; Mon, 6 Feb 2023 02:53:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Subject:CC:To:From:MIME-Version:Date: Message-ID:Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To:References: List-Owner; bh=YiWgex06unAyYpQ62yKAKIvgjIUcuZak5nN8WR5hNhE=; b=3qykOtsS+oUI6B AbLQPL7cfhkKoj8O2MOxhOumbUWdBg4HKA+te15csn+bgisBc9OA42VgZa7q+gOFtsox4dHuCB+Jv FtM0IbQDciC3p12XEpdV1to0ud6kKmqYzkQJeVEZ6b8s2hJDFXt7Gg+Q+LkvEdq/WJJ9KOP555usW N749pUfkHYN+RzDlf5OEDJSnY987JR0LlM8Pxavuy7iF1hG+PyreffbEtouguH50AK+B52PlKhFoY +iBCU3wrm6l1dzg+odPfU2zmxfHTM/lgbdABNHOXfmRZgJ9LNneT4LBsU+W6/pVRKWZFIOBzQZqcr N2+cCfCU5sSoNJHQpXcA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1pOrcY-007Dgz-GH; Mon, 06 Feb 2023 02:52:42 +0000 Received: from szxga02-in.huawei.com ([45.249.212.188]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1pOrcT-007Dfp-Vp for linux-arm-kernel@lists.infradead.org; Mon, 06 Feb 2023 02:52:40 +0000 Received: from kwepemi500012.china.huawei.com (unknown [172.30.72.53]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4P99jH5n9NzRqLn; Mon, 6 Feb 2023 10:50:11 +0800 (CST) Received: from [10.67.110.108] (10.67.110.108) by kwepemi500012.china.huawei.com (7.221.188.12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.34; Mon, 6 Feb 2023 10:52:31 +0800 Message-ID: Date: Mon, 6 Feb 2023 10:52:30 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 From: "liaochang (A)" To: Marc Zyngier , Catalin Marinas CC: Thomas Gleixner , guohanjun 00470146 , Yipeng Zou , "zhangjianhua (E)" , Subject: [RFC] Some issues about LPI migration on ARM SMP platform X-Originating-IP: [10.67.110.108] X-ClientProxiedBy: dggems706-chm.china.huawei.com (10.3.19.183) To kwepemi500012.china.huawei.com (7.221.188.12) X-CFilter-Loop: Reflected X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230205_185238_442892_92EBFE3D X-CRM114-Status: GOOD ( 19.45 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Hi, Marc,Catalin Recently, Yipeng report an issue [1] about LPI migration on SMP and get some valuable feedback, but after further investigation, I beleive it is a fundamental issue of LPI handling mechanism which can happen in those contexts: For example, NIC device generates MSI and sends LPI to CPU0 via ITS, meanwhile irqbalance running on CPU1 set irq affinty of NIC to CPU1, the next interrupt will be sent to CPU1, due to the state of irq is still in progress, kernel does not end up performing irq handler on CPU1, which results in some userland service timeout, the sequence of events is shown as follows: NIC CPU0 CPU1 ========== =================== ================== Generate Interrupt#1 READ_IAR Lock irq_desc Set IRQD_IN_PROGRESS Unlock irq_desc Lock irq_desc Change LPI affinity Unlock irq_desc Call irq_handler Generate Interupt#2 READ_IAR Lock irq_desc Check IRQD_IN_PROGRESS Unlock irq_desc Return from interrupt#2 Lock irq_desc Clear IRQD_IN_PROGRESS Unlock irq_desc return from interrupt#1 >From the perspective of NIC, if one IRQ acked is not serviced by kernel, it needs to wait for a long time to generate the next event. This problem is reproduced on several chips from different Arm vendors using something like the following commands, please replace the IRQ 56 with the ITS-MSI irq on your machine if you want to test on your own. # while true; do echo trigger > /sys/kernel/debug/irq/irqs/56; done # while true; do echo 0 > /proc/irq/56/smp_affinity_list; echo 1 > /proc/irq/56/smp_affinity_list; done In [1], Thomas suggests to enable CONFIG_GENERIC_PENDING_IRQ to delay the affinity change of LPI, so that the interrupt will not be migrated to the new CPU until all interrupt context exits, it ensures all IRQ acked will be serviced, and i develop a not-so-perfect patch [2] to delay the affinity change of LPI until the next interrupt acknowledge, let me use the sequence below explains what this patch has done: NIC CPU0 CPU1 ========== =================== ================== Generate Interrupt#1 READ_IAR Lock irq_desc Set IRQD_IN_PROGRESS Unlock irq_desc Lock irq_desc 1) its_set_affinity returns -EBUSY 2) Call irqd_set_move_pending Unlock irq_desc Call irq_handler Generate Interupt#2 Lock irq_desc Clear IRQD_IN_PROGRESS Unlock irq_desc Return from interrupt#1 3) Read_IAR and clear the corresponding bit in CPU0 LPI pending table 4) Call irq_move_irq 5) Call irq_retrigger to set corresponding bit in CPU1 LPI pending table. 6) Return from interrupt#2 Read_IAR Lock irq_desc Set IRQD_IN_PROGRESS Unlock irq_desc Call irq_handler Generate Interrupt#3 Lock irq_desc Clear IRQD_IN_PROGRESS Unlock irq_desc Return from interrupt#2 After that, the missing LPI caused userland timeout never occur again so far, but I don't think this solution can ensure all interrupt generated by Device could be serviced on CPU1 totally, it depends on the order of the issuing of device interrupt and the reading of IAR on CPU1, if device generates interrupt#3 before CPU1 starts to service the interrupt#2 that is retriggered on CPU0, the interrupt#3 is drop actually because LPI pending table use single bit to record each LPI, this scenario is shown as follows: NIC CPU0 CPU1 ========== =================== ================== Generate Interupt#2 1) Read IAR and clear the corresponding bit in CPU0 LPI pending table 2) Call irq_move_irq 3) Call irq_retrigger to set correspondng bit in CPU1 LPI pending table Generate Interrupt#3 4) Return from interrupt#2 5) Read_IAR and clear the corresponding bit in CPU1 LPI pending table ... Call irq_handler ... Return from interrupt#2 So I wonder why GIC does not support Active state for LPI just like SPI /PPI/SGI, if so, this problem will be much easier to deal, does ARM has any plan to support Active state for LPI, or something else to fix this issue? Thanks. [1] https://lore.kernel.org/all/20230106082136.68501-1-zouyipeng@huawei.com/ [2] On CONFIG_GENERIC_PENDING_IRQ=y, the hotfix patch supports delay irq migration for LPI: Liao, Chang --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -1682,6 +1682,17 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val, return -EINVAL; } +void handle_misroute_irq(struct irq_desc *desc) +{ + struct irq_data *data = irq_desc_get_irq_data(desc); + + if (irq_move_pending(data)) { + irq_move_irq(data); + its_irq_retrigger(data); + } else + handle_fasteoi_irq(desc); +} + static u64 its_irq_get_msi_base(struct its_device *its_dev) { struct its_node *its = its_dev->its; diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 34d58567b78d..a563454ace20 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -1466,7 +1466,7 @@ static int gic_irq_domain_map(struct irq_domain *d, unsigned int irq, if (!gic_dist_supports_lpis()) return -EPERM; irq_domain_set_info(d, irq, hw, chip, d->host_data, - handle_fasteoi_irq, NULL, NULL); + handle_misroute_irq, NULL, NULL); break; -- BR,