From patchwork Mon Jul 15 14:39:04 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 11044379
From: Jan Beulich
To: "xen-devel@lists.xenproject.org"
Date: Mon, 15 Jul 2019 14:39:04 +0000
Message-ID: <55a4a24d-7fac-527c-6bcf-8d689136bac2@suse.com>
Subject: [Xen-devel] [PATCH v2] x86: use POPCNT for hweight() when available
Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

This is faster than using the software implementation, and the insn is
available on all half-way recent hardware. Therefore convert
generic_hweight() to out-of-line functions (without affecting Arm) and
use alternatives patching to replace the function calls.

Note that the approach doesn't work for clang, due to it not recognizing
-ffixed-*.

Suggested-by: Andrew Cooper
Signed-off-by: Jan Beulich
---
v2: Also suppress UB sanitizer instrumentation. Reduce macroization in
    hweight.c. Exclude clang builds.
---
Note: Using "g" instead of "X" as the dummy constraint in hweight64()
      and hweight32(), contrary to expectation, produces slightly better
      code with gcc 8.

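For reference, hweight() counts the bits set in its argument. The sketch
below is not Xen's generic_hweight32(); sw_hweight32(), naive_hweight32()
and check_hweight32() are hypothetical names, used only to illustrate the
kind of software bit count that a single POPCNT insn can stand in for.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not Xen code): branch-free parallel bit count. */
static unsigned int sw_hweight32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit partial sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit partial sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit partial sums */
    return (x * 0x01010101u) >> 24;                   /* add the four bytes */
}

/* Hypothetical reference: clear the lowest set bit until none remain. */
static unsigned int naive_hweight32(uint32_t x)
{
    unsigned int n = 0;

    for ( ; x; x &= x - 1 )
        n++;

    return n;
}

/* Cross-check the two implementations on a few sample values. */
static void check_hweight32(void)
{
    static const uint32_t samples[] = {
        0, 1, 0xF0F0u, 0x80000001u, 0xFFFFFFFFu
    };
    unsigned int i;

    for ( i = 0; i < sizeof(samples) / sizeof(samples[0]); i++ )
        assert(sw_hweight32(samples[i]) == naive_hweight32(samples[i]));
}
```

Either variant takes several dependent ALU operations per call, which is
the cost the alternatives patching is meant to avoid on POPCNT-capable
hardware.
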
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -31,6 +31,10 @@ obj-y += emul-i8254.o
 obj-y += extable.o
 obj-y += flushtlb.o
 obj-$(CONFIG_CRASH_DEBUG) += gdbstub.o
+# clang doesn't appear to know of -ffixed-*
+hweight-$(gcc) := hweight.o
+hweight-$(clang) :=
+obj-y += $(hweight-y)
 obj-y += hypercall.o
 obj-y += i387.o
 obj-y += i8259.o
@@ -251,6 +255,10 @@ boot/mkelf32: boot/mkelf32.c
 efi/mkreloc: efi/mkreloc.c
 	$(HOSTCC) $(HOSTCFLAGS) -g -o $@ $<
 
+nocov-y += hweight.o
+noubsan-y += hweight.o
+hweight.o: CFLAGS += $(foreach reg,cx dx si 8 9 10 11,-ffixed-r$(reg))
+
 .PHONY: clean
 clean::
 	rm -f asm-offsets.s *.lds boot/*.o boot/*~ boot/core boot/mkelf32
--- /dev/null
+++ b/xen/arch/x86/hweight.c
@@ -0,0 +1,21 @@
+#define generic_hweight64 _hweight64
+#define generic_hweight32 _hweight32
+#define generic_hweight16 _hweight16
+#define generic_hweight8  _hweight8
+
+#include
+
+#undef inline
+#define inline always_inline
+
+#include
+
+#undef generic_hweight8
+#undef generic_hweight16
+#undef generic_hweight32
+#undef generic_hweight64
+
+unsigned int generic_hweight8 (unsigned int x) { return _hweight8 (x); }
+unsigned int generic_hweight16(unsigned int x) { return _hweight16(x); }
+unsigned int generic_hweight32(unsigned int x) { return _hweight32(x); }
+unsigned int generic_hweight64(uint64_t x) { return _hweight64(x); }
--- a/xen/include/asm-x86/bitops.h
+++ b/xen/include/asm-x86/bitops.h
@@ -475,9 +475,36 @@ static inline int fls(unsigned int x)
  *
  * The Hamming Weight of a number is the total number of bits set in it.
  */
+#ifndef __clang__
+/* POPCNT encodings with %{r,e}di input and %{r,e}ax output: */
+#define POPCNT_64 ".byte 0xF3, 0x48, 0x0F, 0xB8, 0xC7"
+#define POPCNT_32 ".byte 0xF3, 0x0F, 0xB8, 0xC7"
+
+#define hweight_(n, x, insn, setup, cout, cin) ({                         \
+    unsigned int res_;                                                    \
+    /*                                                                    \
+     * For the function call the POPCNT input register needs to be marked \
+     * modified as well. Set up a local variable of appropriate type      \
+     * for this purpose.                                                  \
+     */                                                                   \
+    typeof((uint##n##_t)(x) + 0U) val_ = (x);                             \
+    alternative_io(setup "; call generic_hweight" #n,                     \
+                   insn, X86_FEATURE_POPCNT,                              \
+                   ASM_OUTPUT2([res] "=a" (res_), [val] cout (val_)),     \
+                   [src] cin (val_));                                     \
+    res_;                                                                 \
+})
+#define hweight64(x) hweight_(64, x, POPCNT_64, "", "+D", "g")
+#define hweight32(x) hweight_(32, x, POPCNT_32, "", "+D", "g")
+#define hweight16(x) hweight_(16, x, "movzwl %w[src], %[val]; " POPCNT_32, \
+                              "mov %[src], %[val]", "=&D", "rm")
+#define hweight8(x)  hweight_( 8, x, "movzbl %b[src], %[val]; " POPCNT_32, \
+                              "mov %[src], %[val]", "=&D", "rm")
+#else
 #define hweight64(x) generic_hweight64(x)
 #define hweight32(x) generic_hweight32(x)
 #define hweight16(x) generic_hweight16(x)
 #define hweight8(x) generic_hweight8(x)
+#endif
 
 #endif /* _X86_BITOPS_H */
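
The POPCNT_64 and POPCNT_32 byte strings hard-code their operands in the
trailing ModRM byte 0xC7. As an illustration only (struct modrm,
decode_modrm() and check_popcnt_modrm() are hypothetical helpers, not part
of the patch), splitting that byte into its fields shows register-direct
addressing with register 0 (%rax/%eax) as destination and register 7
(%rdi/%edi) as source:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three ModRM fields. */
struct modrm {
    unsigned int mod; /* bits 7:6, 3 means register-direct operand */
    unsigned int reg; /* bits 5:3, destination register for POPCNT */
    unsigned int rm;  /* bits 2:0, source register when mod == 3 */
};

/* Hypothetical helper: split a ModRM byte into its fields. */
static struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m = {
        .mod = byte >> 6,
        .reg = (byte >> 3) & 7,
        .rm  = byte & 7,
    };

    return m;
}

/* Self-check against the ModRM byte used by POPCNT_64 and POPCNT_32. */
static void check_popcnt_modrm(void)
{
    struct modrm m = decode_modrm(0xC7);

    assert(m.mod == 3 && m.reg == 0 && m.rm == 7);
}
```

The remaining bytes are the F3 prefix, the 0F B8 opcode and, in the 64-bit
form only, the REX.W prefix 0x48. Fixing the registers this way matches
the "=a" and "D" constraints used by hweight_().
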