From: David Laight
To: Eric Dumazet, Peter Zijlstra
CC: tglx@linutronix.de, mingo@redhat.com, Borislav Petkov,
    dave.hansen@linux.intel.com, X86 ML, hpa@zytor.com,
    alexanderduyck@fb.com, open list, netdev, Noah Goldstein
Subject: [PATCH] x86/lib: Optimise copy loop for long buffers in csum-partial_64.c
Date: Thu, 6 Jan 2022 16:19:54 +0000
Message-ID: <04c41a96f4eb4fe782d10ae2691ad93e@AcuMS.aculab.com>
X-Mailing-List: netdev@vger.kernel.org

gcc converts the loop into one that only increments the pointer, but
makes a mess of calculating the limit, and gcc 9.1+ completely refuses
to use the final value of 'buff' from the last iteration.

Explicitly code a pointer comparison and don't bother changing len.
The tail code that follows tests bits of len (e.g. 'len & 32'), so the
loop does not need to subtract from it.

Signed-off-by: David Laight
---

The asm("" : "+r" (buff)); forces gcc to use the loop-updated
value of 'buff' and removes at least 6 instructions.

The gcc folk really ought to look at why gcc 9.1 onwards is so much
worse than gcc 8.
See https://godbolt.org/z/T39PcnvfE

 arch/x86/lib/csum-partial_64.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index edd3e579c2a7..342de5f24fcb 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -27,21 +27,24 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 	u64 temp64 = (__force u64)sum;
 	unsigned result;
 
-	while (unlikely(len >= 64)) {
-		asm("addq 0*8(%[src]),%[res]\n\t"
-		    "adcq 1*8(%[src]),%[res]\n\t"
-		    "adcq 2*8(%[src]),%[res]\n\t"
-		    "adcq 3*8(%[src]),%[res]\n\t"
-		    "adcq 4*8(%[src]),%[res]\n\t"
-		    "adcq 5*8(%[src]),%[res]\n\t"
-		    "adcq 6*8(%[src]),%[res]\n\t"
-		    "adcq 7*8(%[src]),%[res]\n\t"
-		    "adcq $0,%[res]"
-		    : [res] "+r" (temp64)
-		    : [src] "r" (buff)
-		    : "memory");
-		buff += 64;
-		len -= 64;
+	if (unlikely(len >= 64)) {
+		const void *lim = buff + (len & ~63u);
+		do {
+			asm("addq 0*8(%[src]),%[res]\n\t"
+			    "adcq 1*8(%[src]),%[res]\n\t"
+			    "adcq 2*8(%[src]),%[res]\n\t"
+			    "adcq 3*8(%[src]),%[res]\n\t"
+			    "adcq 4*8(%[src]),%[res]\n\t"
+			    "adcq 5*8(%[src]),%[res]\n\t"
+			    "adcq 6*8(%[src]),%[res]\n\t"
+			    "adcq 7*8(%[src]),%[res]\n\t"
+			    "adcq $0,%[res]"
+			    : [res] "+r" (temp64)
+			    : [src] "r" (buff)
+			    : "memory");
+			asm("" : "+r" (buff));
+			buff += 64;
+		} while (buff < lim);
 	}
 
 	if (len & 32) {
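
For anyone who wants to experiment with the loop shape outside the kernel, here is a rough userspace sketch of the two forms. It is not the kernel code: the helper names (add64_carry, sum_blocks_len, sum_blocks_ptr) are made up for illustration, and the addq/adcq chain is replaced by portable C with an end-around carry, so it only approximates what the compiler sees in csum_partial(). The empty asm with a "+r" constraint is only emitted for GCC-compatible compilers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Add b into a and fold the carry back in (end-around carry). */
static inline uint64_t add64_carry(uint64_t a, uint64_t b)
{
	a += b;
	return a + (a < b);
}

/* Sum one 64-byte block as eight 64-bit words. */
static inline uint64_t sum_block(const unsigned char *p, uint64_t sum)
{
	for (int i = 0; i < 8; i++) {
		uint64_t w;

		memcpy(&w, p + 8 * i, sizeof(w));	/* unaligned-safe load */
		sum = add64_carry(sum, w);
	}
	return sum;
}

/* Old shape (hypothetical helper): count len down by 64 each pass. */
static uint64_t sum_blocks_len(const void *buff, int len, uint64_t sum)
{
	const unsigned char *p = buff;

	assert(len >= 0);
	while (len >= 64) {
		sum = sum_block(p, sum);
		p += 64;
		len -= 64;
	}
	return sum;
}

/*
 * New shape (hypothetical helper): compute the limit once, compare
 * pointers, and leave len untouched. *end receives the pointer just
 * past the last full block.
 */
static uint64_t sum_blocks_ptr(const void *buff, int len, uint64_t sum,
			       const void **end)
{
	const unsigned char *p = buff;

	assert(len >= 0);
	if (len >= 64) {
		const unsigned char *lim = p + (len & ~63u);

		do {
			sum = sum_block(p, sum);
#if defined(__GNUC__)
			/* Make the compiler treat p as modified here. */
			__asm__("" : "+r" (p));
#endif
			p += 64;
		} while (p < lim);
	}
	*end = p;
	return sum;
}
```

Both helpers should return the same sum for any buffer; the interesting part is the generated code, which can be compared by building each with different gcc versions at -O2.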