From: Liao Chang
Subject: [PATCH] arm64: uprobes: Optimize cache flushes for xol slot
Date: Thu, 19 Sep 2024 12:17:19 +0000
Message-ID: <20240919121719.2148361-1-liaochang1@huawei.com>
X-Patchwork-Id: 13807686

Profiling the single-threaded selftests bench reveals a bottleneck in
caches_clean_inval_pou() on ARM64. On my local testing machine, this
function takes approximately 34% of CPU cycles for trig-uprobe-nop and
trig-uprobe-push.

Add a check to avoid an unnecessary cache flush when writing an
instruction to the xol slot. If the instruction is the same as the one
already in the slot, there is no need to synchronize the D/I caches.

Since xol slot allocation and updates occur on the hot path of uprobe
handling, testing the upstream kernel on a Kunpeng916 (Hi1616), 4 NUMA
nodes, 64 cores @ 2.4GHz, shows that this optimization yields an
obvious gain for the nop and push test cases.

Before (next-20240918)
----------------------
uprobe-nop      ( 1 cpus):    0.418 ± 0.001M/s  (  0.418M/s/cpu)
uprobe-push     ( 1 cpus):    0.411 ± 0.005M/s  (  0.411M/s/cpu)
uprobe-ret      ( 1 cpus):    2.052 ± 0.002M/s  (  2.052M/s/cpu)
uretprobe-nop   ( 1 cpus):    0.350 ± 0.000M/s  (  0.350M/s/cpu)
uretprobe-push  ( 1 cpus):    0.353 ± 0.000M/s  (  0.353M/s/cpu)
uretprobe-ret   ( 1 cpus):    1.074 ± 0.001M/s  (  1.074M/s/cpu)

After
-----
uprobe-nop      ( 1 cpus):    0.926 ± 0.000M/s  (  0.926M/s/cpu)
uprobe-push     ( 1 cpus):    0.910 ± 0.001M/s  (  0.910M/s/cpu)
uprobe-ret      ( 1 cpus):    2.056 ± 0.001M/s  (  2.056M/s/cpu)
uretprobe-nop   ( 1 cpus):    0.653 ± 0.001M/s  (  0.653M/s/cpu)
uretprobe-push  ( 1 cpus):    0.645 ± 0.000M/s  (  0.645M/s/cpu)
uretprobe-ret   ( 1 cpus):    1.093 ± 0.001M/s  (  1.093M/s/cpu)

Signed-off-by: Liao Chang
---
 arch/arm64/kernel/probes/uprobes.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/kernel/probes/uprobes.c b/arch/arm64/kernel/probes/uprobes.c
index d49aef2657cd..5ee27509d6f6 100644
--- a/arch/arm64/kernel/probes/uprobes.c
+++ b/arch/arm64/kernel/probes/uprobes.c
@@ -17,12 +17,16 @@ void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
 	void *xol_page_kaddr = kmap_atomic(page);
 	void *dst = xol_page_kaddr + (vaddr & ~PAGE_MASK);
 
+	if (!memcmp(dst, src, len))
+		goto done;
+
 	/* Initialize the slot */
 	memcpy(dst, src, len);
 
 	/* flush caches (dcache/icache) */
 	sync_icache_aliases((unsigned long)dst, (unsigned long)dst + len);
 
+done:
 	kunmap_atomic(xol_page_kaddr);
 }
 
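
For readers who want to see the effect of the new check outside the kernel, the following is a minimal userspace sketch of the same compare-before-write pattern. It is not the kernel code: fake_sync_icache(), copy_to_slot() and replay_writes() are hypothetical names introduced here, the cache flush is replaced by a counter, and the slot size of one 4-byte instruction is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 4	/* assumption: one 4-byte AArch64 instruction per slot */

/* AArch64 NOP (0xd503201f) and "stp x29, x30, [sp, #-16]!" (0xa9bf7bfd),
 * stored as little-endian bytes. Chosen only as sample slot contents. */
static const unsigned char NOP_INSN[SLOT_SIZE]  = { 0x1f, 0x20, 0x03, 0xd5 };
static const unsigned char PUSH_INSN[SLOT_SIZE] = { 0xfd, 0x7b, 0xbf, 0xa9 };

static unsigned long flush_count;

/* Hypothetical stand-in for sync_icache_aliases(): only counts calls. */
static void fake_sync_icache(void *start, size_t len)
{
	(void)start;
	(void)len;
	flush_count++;
}

/* Mirrors the patched logic: skip the copy and the flush when the slot
 * already holds the instruction. Returns 1 if the slot was rewritten. */
static int copy_to_slot(void *dst, const void *src, size_t len)
{
	assert(len <= SLOT_SIZE);

	if (!memcmp(dst, src, len))
		return 0;

	memcpy(dst, src, len);
	fake_sync_icache(dst, len);
	return 1;
}

/* Write n instructions into a zeroed slot, alternating between a and b,
 * and return how many simulated flushes were needed. */
static unsigned long replay_writes(const unsigned char *a,
				   const unsigned char *b, size_t n)
{
	unsigned char slot[SLOT_SIZE] = { 0 };
	size_t i;

	flush_count = 0;
	for (i = 0; i < n; i++)
		copy_to_slot(slot, (i % 2) ? b : a, SLOT_SIZE);

	return flush_count;
}
```

In this model, replaying the same instruction repeatedly (replay_writes(NOP_INSN, NOP_INSN, 1000)) needs a single flush, while alternating between two different instructions flushes on every write. This matches the patch's premise that the check only helps when a slot is reused for an identical instruction; the sketch says nothing about the relative cost of memcmp() versus cache maintenance on real hardware.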