From patchwork Mon Jun 17 19:49:59 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Andrew Cooper
X-Patchwork-Id: 11000357
From: Andrew Cooper
To: Xen-devel
Date: Mon, 17 Jun 2019 20:49:59 +0100
Message-ID:
 <1560800999-11592-1-git-send-email-andrew.cooper3@citrix.com>
X-Mailer: git-send-email 2.1.4
Subject: [Xen-devel] [PATCH] x86/clear_page: Update clear_page_sse2() after
 dropping 32bit Xen
List-Id: Xen developer discussion
Cc: Andrew Cooper, Edwin Török, Wei Liu, Jan Beulich, Roger Pau Monné

This code was never updated when the 32bit build of Xen was dropped.

 * Expand the now-redundant ptr_reg macro.
 * The number of iterations in the loop can be halved by using 64bit writes,
   without consuming any extra execution resource in the pipeline.  Adjust all
   numbers/offsets appropriately.
 * Replace dec with sub to avoid an eflags stall, and position it to be
   macro-fused with the related jnz.
 * With no need to preserve eflags across the body of the loop, replace lea
   with add, which has a third of the latency on basically all 64bit hardware.

A quick userspace perf test on my Haswell dev box indicates that the old
version takes ~1385 cycles on average (ignoring outliers), and the new version
takes ~1060 cycles, or about 77% of the time.

Reported-by: Edwin Török
Signed-off-by: Andrew Cooper
Reviewed-by: Jan Beulich
---
CC: Jan Beulich
CC: Wei Liu
CC: Roger Pau Monné
CC: Edwin Török

There is almost certainly further room for improvement, especially now that we
have alternatives, but this is a substantial improvement which is very safe to
backport.
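
As a sanity check of the loop arithmetic (not part of the patch), here is a
minimal C model of the two loop shapes.  It is only a sketch under stated
assumptions: MODEL_PAGE_SIZE is a hypothetical stand-in for PAGE_SIZE and is
assumed to be 4096, the function names are made up for illustration, and plain
memcpy() stores stand in for movnti, so it says nothing about non-temporal
behaviour or timing.  It only shows that PAGE_SIZE/16 iterations of four 4-byte
stores and PAGE_SIZE/32 iterations of four 8-byte stores both cover the page
exactly once.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: a 4 KiB page; hypothetical stand-in for Xen's PAGE_SIZE. */
#define MODEL_PAGE_SIZE 4096u

/* Old loop shape: PAGE_SIZE/16 iterations, four 32-bit stores each. */
unsigned int clear_page_model_32(unsigned char *page)
{
    const uint32_t zero = 0;
    unsigned int iterations = 0;

    for (unsigned int count = MODEL_PAGE_SIZE / 16; count != 0; count--) {
        memcpy(page + 0,  &zero, sizeof(zero));
        memcpy(page + 4,  &zero, sizeof(zero));
        memcpy(page + 8,  &zero, sizeof(zero));
        memcpy(page + 12, &zero, sizeof(zero));
        page += 16;            /* was: lea 16(ptr_reg), ptr_reg */
        iterations++;
    }
    return iterations;
}

/* New loop shape: PAGE_SIZE/32 iterations, four 64-bit stores each. */
unsigned int clear_page_model_64(unsigned char *page)
{
    const uint64_t zero = 0;
    unsigned int iterations = 0;

    for (unsigned int count = MODEL_PAGE_SIZE / 32; count != 0; count--) {
        memcpy(page + 0,  &zero, sizeof(zero));
        memcpy(page + 8,  &zero, sizeof(zero));
        memcpy(page + 16, &zero, sizeof(zero));
        memcpy(page + 24, &zero, sizeof(zero));
        page += 32;            /* now: add $32, %rdi */
        iterations++;
    }
    return iterations;
}

/* Returns 1 if every byte of the page is zero, 0 otherwise. */
int page_is_clear(const unsigned char *page)
{
    for (unsigned int i = 0; i < MODEL_PAGE_SIZE; i++)
        if (page[i] != 0)
            return 0;
    return 1;
}

/* Runs both models on a dirtied buffer and checks counts and coverage. */
void check_models(void)
{
    unsigned char buf[MODEL_PAGE_SIZE];

    memset(buf, 0xff, sizeof(buf));
    assert(clear_page_model_32(buf) == MODEL_PAGE_SIZE / 16);
    assert(page_is_clear(buf));

    memset(buf, 0xff, sizeof(buf));
    assert(clear_page_model_64(buf) == MODEL_PAGE_SIZE / 32);
    assert(page_is_clear(buf));
}
```

Calling check_models() from a small test program exercises both variants; the
64-bit model needs half as many trips round the loop for the same coverage.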
---
 xen/arch/x86/clear_page.S | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/clear_page.S b/xen/arch/x86/clear_page.S
index 243a767..0817610 100644
--- a/xen/arch/x86/clear_page.S
+++ b/xen/arch/x86/clear_page.S
@@ -2,18 +2,16 @@
 
 #include
 
-#define ptr_reg %rdi
-
 ENTRY(clear_page_sse2)
-        mov     $PAGE_SIZE/16, %ecx
+        mov     $PAGE_SIZE/32, %ecx
         xor     %eax,%eax
 
-0:      dec     %ecx
-        movnti  %eax, (ptr_reg)
-        movnti  %eax, 4(ptr_reg)
-        movnti  %eax, 8(ptr_reg)
-        movnti  %eax, 12(ptr_reg)
-        lea     16(ptr_reg), ptr_reg
+0:      movnti  %rax,  0(%rdi)
+        movnti  %rax,  8(%rdi)
+        movnti  %rax, 16(%rdi)
+        movnti  %rax, 24(%rdi)
+        add     $32, %rdi
+        sub     $1, %ecx
         jnz     0b
 
         sfence