From patchwork Thu Jul 16 23:07:09 2020
X-Patchwork-Submitter: "Song Bao Hua (Barry Song)"
X-Patchwork-Id: 11668543
From: Barry Song
Subject: [PATCH] iommu/arm-smmu-v3: remove the approach of MSI polling for CMD SYNC
Date: Fri, 17 Jul 2020 11:07:09 +1200
Message-ID: <20200716230709.32820-1-song.bao.hua@hisilicon.com>
X-Mailer: git-send-email 2.21.0.windows.1
Cc: Barry Song, linux-kernel@vger.kernel.org, linuxarm@huawei.com,
 iommu@lists.linux-foundation.org, Prime Zeng,
 linux-arm-kernel@lists.infradead.org

Before commit 587e6c10a7ce ("iommu/arm-smmu-v3: Reduce contention during
command-queue insertion"), MSI polling probably performed better, since
it could run outside the spin_lock_irqsave() section while polling the
cons register had to run inside the lock. After that reorganization of
the SMMU queue code, however, neither polling method runs under a
spinlock, and real tests show that polling the cons register via SEV
gives lower latency. This is most likely because MSI polling requires
the hardware to write back to memory, whereas SEV polling depends only
on a register update.

Running netperf with 16 threads on a hns3 100G NIC with a UDP packet
size of 32768 bytes and the IOMMU in strict mode, this patch improves
TX throughput from 25227.74 Mbps to 27145.59 Mbps. In this scenario the
SMMU is extremely busy, as hns3 issues map/unmap requests at a very
high rate.

Cc: Prime Zeng
Signed-off-by: Barry Song
---
 drivers/iommu/arm-smmu-v3.c | 46 +------------------------------------
 1 file changed, 1 insertion(+), 45 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index f578677a5c41..e55282a636c8 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -964,12 +964,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 		cmd[1] |= FIELD_PREP(CMDQ_PRI_1_RESP, ent->pri.resp);
 		break;
 	case CMDQ_OP_CMD_SYNC:
-		if (ent->sync.msiaddr) {
-			cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_IRQ);
-			cmd[1] |= ent->sync.msiaddr & CMDQ_SYNC_1_MSIADDR_MASK;
-		} else {
-			cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_SEV);
-		}
+		cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_SEV);
 		cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSH, ARM_SMMU_SH_ISH);
 		cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIATTR, ARM_SMMU_MEMATTR_OIWB);
 		break;
@@ -983,21 +978,10 @@
 static void arm_smmu_cmdq_build_sync_cmd(u64 *cmd, struct arm_smmu_device *smmu,
 					 u32 prod)
 {
-	struct arm_smmu_queue *q = &smmu->cmdq.q;
 	struct arm_smmu_cmdq_ent ent = {
 		.opcode = CMDQ_OP_CMD_SYNC,
 	};
 
-	/*
-	 * Beware that Hi16xx adds an extra 32 bits of goodness to its MSI
-	 * payload, so the write will zero the entire command on that platform.
-	 */
-	if (smmu->features & ARM_SMMU_FEAT_MSI &&
-	    smmu->features & ARM_SMMU_FEAT_COHERENCY) {
-		ent.sync.msiaddr = q->base_dma + Q_IDX(&q->llq, prod) *
-				   q->ent_dwords * 8;
-	}
-
 	arm_smmu_cmdq_build_cmd(cmd, &ent);
 }
 
@@ -1251,30 +1235,6 @@ static int arm_smmu_cmdq_poll_until_not_full(struct arm_smmu_device *smmu,
 	return ret;
 }
 
-/*
- * Wait until the SMMU signals a CMD_SYNC completion MSI.
- * Must be called with the cmdq lock held in some capacity.
- */
-static int __arm_smmu_cmdq_poll_until_msi(struct arm_smmu_device *smmu,
-					  struct arm_smmu_ll_queue *llq)
-{
-	int ret = 0;
-	struct arm_smmu_queue_poll qp;
-	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
-	u32 *cmd = (u32 *)(Q_ENT(&cmdq->q, llq->prod));
-
-	queue_poll_init(smmu, &qp);
-
-	/*
-	 * The MSI won't generate an event, since it's being written back
-	 * into the command queue.
-	 */
-	qp.wfe = false;
-	smp_cond_load_relaxed(cmd, !VAL || (ret = queue_poll(&qp)));
-	llq->cons = ret ? llq->prod : queue_inc_prod_n(llq, 1);
-	return ret;
-}
-
 /*
  * Wait until the SMMU cons index passes llq->prod.
  * Must be called with the cmdq lock held in some capacity.
@@ -1332,10 +1292,6 @@ static int __arm_smmu_cmdq_poll_until_consumed(struct arm_smmu_device *smmu,
 static int arm_smmu_cmdq_poll_until_sync(struct arm_smmu_device *smmu,
 					 struct arm_smmu_ll_queue *llq)
 {
-	if (smmu->features & ARM_SMMU_FEAT_MSI &&
-	    smmu->features & ARM_SMMU_FEAT_COHERENCY)
-		return __arm_smmu_cmdq_poll_until_msi(smmu, llq);
-
 	return __arm_smmu_cmdq_poll_until_consumed(smmu, llq);
 }
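
[Editorial note] For readers comparing the two completion mechanisms above,
the sketch below is a minimal userspace analogue, not kernel code: the
"device" thread is a hypothetical stand-in for the SMMU, and plain busy
loops replace the kernel's smp_cond_load_relaxed() (MSI path) and the
wfe-based queue_poll() (SEV path). It only illustrates the distinction the
commit message argues from: MSI-style completion requires the device to
write back into the queue memory, while SEV-style completion only requires
the waiter to observe an index/register update.

/*
 * Illustrative userspace analogue of the two CMD_SYNC wait styles.
 * Build with: gcc -pthread. All names here are hypothetical.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

static _Atomic uint64_t cmd_slot = 0xdead; /* queue entry holding the CMD_SYNC */
static _Atomic uint32_t cons;              /* models the SMMU cons register    */

/*
 * Stand-in for the SMMU completing a CMD_SYNC: it both writes back into
 * the queue memory (what the MSI payload did) and advances cons (what the
 * SEV-based path watches).
 */
static void *device_thread(void *unused)
{
	(void)unused;
	atomic_store_explicit(&cmd_slot, 0, memory_order_release);
	atomic_fetch_add_explicit(&cons, 1, memory_order_release);
	return NULL;
}

/*
 * MSI-style wait (the path removed by this patch): completion is a memory
 * write into the command queue itself, so poll the slot until it is zeroed.
 */
static void wait_msi_style(void)
{
	while (atomic_load_explicit(&cmd_slot, memory_order_acquire))
		;
}

/*
 * SEV-style wait (the path kept): only the cons index is observed; the
 * device never has to write back into queue memory.
 */
static void wait_sev_style(uint32_t prod)
{
	while (atomic_load_explicit(&cons, memory_order_acquire) <= prod)
		;
}

int main(void)
{
	pthread_t dev;

	pthread_create(&dev, NULL, device_thread, NULL);
	wait_msi_style();
	wait_sev_style(0);
	pthread_join(dev, NULL);
	puts("CMD_SYNC completion observed by both wait styles");
	return 0;
}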