From patchwork Thu Feb 15 08:14:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 13557621 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 02807C4829E for ; Thu, 15 Feb 2024 08:16:47 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWu9-0007gs-7B; Thu, 15 Feb 2024 03:15:37 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWtU-00071U-B7 for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:14:57 -0500 Received: from mail-pj1-x1036.google.com ([2607:f8b0:4864:20::1036]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWtS-0001N9-QO for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:14:56 -0500 Received: by mail-pj1-x1036.google.com with SMTP id 98e67ed59e1d1-295c8b795e2so460106a91.0 for ; Thu, 15 Feb 2024 00:14:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984893; x=1708589693; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=r70v09Z1jYUdUmudZIEve8pIi40latYcxErAtIALfYA=; b=awE855qcctlZwnzAHPxC0TDCZW+3k7eu29biOnePABpynBxx8W4+IkEmcfczHkP0/f 88UWPFms5GZbajhPIWwi+2anMUEPCgW+j3kjrBun6DebIE3/PHKHDX7Z0UmqpdnPtB2D LS8AB1ysQFKNp/AHEcxQDr1+pRRsY1g36jzsi5ImhAiQHGdUOHVyq2n9A6hRMDbTYSMr q//6QQMHkB2riWQES0YOLEztexu8ewEb+OY8Q4dBx9zIZzewX5lv0/nIWLWCbYSABis/ 4t9r0gtqfIVQ+DkhhDKN3Guj5IP2TFj+SdxlQEQNhqdH5RViX53C2ZtHj2qDG+ecLpfg Vcdg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984893; x=1708589693; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=r70v09Z1jYUdUmudZIEve8pIi40latYcxErAtIALfYA=; b=hFqxsgEpKDueKx7FseqHYOi7l/M54ormWoR/qK9i0meA+pv69yESnfZCmtqDZG+bjA v4mnV2YFThISK2JNXrdFi8LraGqAATHgd37KIfLIFhmzGoncrOf3GPBVM16BLKYXx5De mazMCsV+e3PBZjF4IDhQBLhBYMC90LKbTpkuW+qkg2YDJY3771DyUJXtKYdMhXnkWsyn qlCjl1X9hkM2O/z8ZkqR5kb4/ELgKZr2MKwKNzQ8jArOTrcxoRRvcBvbye2iU20x+jvg 8qYYC1rwg3kKyFHSPWeDfVozD8hEEzks72hrZTKxzx49K7POe8tc4tQmVEmLzMoneFqj F1sQ== X-Gm-Message-State: AOJu0YxUbcvMvTwXLJRWzs2beUOiEblbDmXxdqDWdmR/XeYQLdShaMuM hp24M9B3JrW3JMMVPQKoJ2APZ1NiYgwAOZYCxb06x2NqCWBkY7FmPEHtkecvnQKG2SodH5K2aDX 0 X-Google-Smtp-Source: AGHT+IEjzEpF95iRu04ODMPmpwiAGhUUMEgRkWSdaJ1h07KcMKHTxihRPUyIoG/pSXtgxIoOYveB0w== X-Received: by 2002:a17:90b:4a02:b0:299:17a7:c443 with SMTP id kk2-20020a17090b4a0200b0029917a7c443mr274265pjb.32.1707984893328; Thu, 15 Feb 2024 00:14:53 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.14.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:14:52 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 01/10] util/bufferiszero: Remove SSE4.1 variant Date: Wed, 14 Feb 2024 22:14:40 -1000 Message-Id: <20240215081449.848220-2-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::1036; envelope-from=richard.henderson@linaro.org; helo=mail-pj1-x1036.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Alexander Monakov The SSE4.1 variant is virtually identical to the SSE2 variant, except for using 'PTEST+JNZ' in place of 'PCMPEQB+PMOVMSKB+CMP+JNE' for testing if an SSE register is all zeroes. The PTEST instruction decodes to two uops, so it can be handled only by the complex decoder, and since CMP+JNE are macro-fused, both sequences decode to three uops. The uops comprising the PTEST instruction dispatch to p0 and p5 on Intel CPUs, so PCMPEQB+PMOVMSKB is comparatively more flexible from dispatch standpoint. Hence, the use of PTEST brings no benefit from throughput standpoint. Its latency is not important, since it feeds only a conditional jump, which terminates the dependency chain. I never observed PTEST variants to be faster on real hardware. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-2-amonakov@ispras.ru> --- util/bufferiszero.c | 29 ----------------------------- 1 file changed, 29 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 3e6a5dfd63..f5a3634f9a 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -100,34 +100,6 @@ buffer_zero_sse2(const void *buf, size_t len) } #ifdef CONFIG_AVX2_OPT -static bool __attribute__((target("sse4"))) -buffer_zero_sse4(const void *buf, size_t len) -{ - __m128i t = _mm_loadu_si128(buf); - __m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16); - __m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16); - - /* Loop over 16-byte aligned blocks of 64. */ - while (likely(p <= e)) { - __builtin_prefetch(p); - if (unlikely(!_mm_testz_si128(t, t))) { - return false; - } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } - - /* Finish the aligned tail. */ - t |= e[-3]; - t |= e[-2]; - t |= e[-1]; - - /* Finish the unaligned tail. */ - t |= _mm_loadu_si128(buf + len - 16); - - return _mm_testz_si128(t, t); -} - static bool __attribute__((target("avx2"))) buffer_zero_avx2(const void *buf, size_t len) { @@ -221,7 +193,6 @@ select_accel_cpuinfo(unsigned info) #endif #ifdef CONFIG_AVX2_OPT { CPUINFO_AVX2, 128, buffer_zero_avx2 }, - { CPUINFO_SSE4, 64, buffer_zero_sse4 }, #endif { CPUINFO_SSE2, 64, buffer_zero_sse2 }, { CPUINFO_ALWAYS, 0, buffer_zero_int }, From patchwork Thu Feb 15 08:14:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 13557625 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6EF61C4829E for ; Thu, 15 Feb 2024 08:17:19 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWu8-0007fJ-RD; Thu, 15 Feb 2024 03:15:36 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWtV-00072x-P3 for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:14:58 -0500 Received: from mail-pj1-x102c.google.com ([2607:f8b0:4864:20::102c]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWtU-0001NL-2d for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:14:57 -0500 Received: by mail-pj1-x102c.google.com with SMTP id 98e67ed59e1d1-290ec261a61so409173a91.0 for ; Thu, 15 Feb 2024 00:14:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984894; x=1708589694; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Sc6Au+pExfgVbkEVt3xBcrgjk6DJCZWLxXzjjgNJTcI=; b=WwqMPmy6Kr5IbEMgHOJ3D+901EKZpQdAzh2H/J1P2EwG4jdtru6x6hlWi5hiZa+UAZ KW0M2K8zLkYXAhmUcOrnVl5cg4iVHm5JDmE3ATlju4SN1xoAIVxfLe9WdDYgQ3RaxRMs /yDkOIX6uYgd+1iIsaiV6x1UpwbAbjdifyYrhtNkvkxkxWMMZJpVJKfzZuLQWFI4ni1s jTlTM2Fl47M9eaPVudQISbc/w6DovQYuSe6WxQYFYBG6iXs6s0zs/cP1ZKqZzVFfYurp YgSNbB+LPGiwjOwDutRxX8+tFwVGYdnVmrCBbn6zJcu1l+8IfCZwXv7U/RIIkPaVGllJ GLrw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984894; x=1708589694; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Sc6Au+pExfgVbkEVt3xBcrgjk6DJCZWLxXzjjgNJTcI=; b=B24Prdun6mjGAKNSKOeE9ZjTID6so6IF3thX9XrTBjiv5JDrA89xmowSdNE+zNYYWB nLa2LQOlli6jzl0dSwe3ke96qRCjIUfUyfmyo/+8rMBIoHjthP+QEtPVujltxebfCjsM ZThPMA/hv7Bo6hrlEKTJZDXS2h+4bGwJlCmfC+rPIEJ8oD3xiuZoF5SSl+1WkC4+054C zQYKE+NQUvNlBrUSWnIvw4zeoDRy/SX65CqAC1m40lsZRMn0/QUNpbLxPJhNipFDZhs8 +H7GyM6HkVu6gD1Pkkp9ciqQfeKVJio72ZHOQOdY6eiF7R6mdRR/NyU+jqUwiBbMsFaG E7xw== X-Gm-Message-State: AOJu0YxYdSXfEKnJ9ykRFffNTqtUW1bWsNg7hu8AciIYYFNjRI0CQGxX W4tS8D6LDfrsuPHQyZ5602XaeBMViD+hufAwZ1h10ibIzv7wa6AMin7RJeexW+3S8pbCipWcNNL b X-Google-Smtp-Source: AGHT+IFWlzoGQi8xBe5pHhUEZ8Bm2klpBfKU1YBM5dEXTh/dltyNCile9CFhCEEoEYmWbebD0wqzZQ== X-Received: by 2002:a17:90b:1e01:b0:298:e10b:1776 with SMTP id pg1-20020a17090b1e0100b00298e10b1776mr1136813pjb.8.1707984894651; Thu, 15 Feb 2024 00:14:54 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.14.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:14:54 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 02/10] util/bufferiszero: Remove AVX512 variant Date: Wed, 14 Feb 2024 22:14:41 -1000 Message-Id: <20240215081449.848220-3-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::102c; envelope-from=richard.henderson@linaro.org; helo=mail-pj1-x102c.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Alexander Monakov Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD routines are invoked much more rarely in normal use when most buffers are non-zero. This makes use of AVX512 unprofitable, as it incurs extra frequency and voltage transition periods during which the CPU operates at reduced performance, as described in https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html Signed-off-by: Mikhail Romanov Signed-off-by: Alexander Monakov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-4-amonakov@ispras.ru> Signed-off-by: Richard Henderson --- util/bufferiszero.c | 38 +++----------------------------------- 1 file changed, 3 insertions(+), 35 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index f5a3634f9a..641d5f9b9e 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -64,7 +64,7 @@ buffer_zero_int(const void *buf, size_t len) } } -#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) || defined(__SSE2__) +#if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) #include /* Note that each of these vectorized functions require len >= 64. */ @@ -128,41 +128,12 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ -#ifdef CONFIG_AVX512F_OPT -static bool __attribute__((target("avx512f"))) -buffer_zero_avx512(const void *buf, size_t len) -{ - /* Begin with an unaligned head of 64 bytes. */ - __m512i t = _mm512_loadu_si512(buf); - __m512i *p = (__m512i *)(((uintptr_t)buf + 5 * 64) & -64); - __m512i *e = (__m512i *)(((uintptr_t)buf + len) & -64); - - /* Loop over 64-byte aligned blocks of 256. */ - while (p <= e) { - __builtin_prefetch(p); - if (unlikely(_mm512_test_epi64_mask(t, t))) { - return false; - } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } - - t |= _mm512_loadu_si512(buf + len - 4 * 64); - t |= _mm512_loadu_si512(buf + len - 3 * 64); - t |= _mm512_loadu_si512(buf + len - 2 * 64); - t |= _mm512_loadu_si512(buf + len - 1 * 64); - - return !_mm512_test_epi64_mask(t, t); - -} -#endif /* CONFIG_AVX512F_OPT */ - /* * Make sure that these variables are appropriately initialized when * SSE2 is enabled on the compiler command-line, but the compiler is * too old to support CONFIG_AVX2_OPT. */ -#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) +#if defined(CONFIG_AVX2_OPT) # define INIT_USED 0 # define INIT_LENGTH 0 # define INIT_ACCEL buffer_zero_int @@ -188,9 +159,6 @@ select_accel_cpuinfo(unsigned info) unsigned len; bool (*fn)(const void *, size_t); } all[] = { -#ifdef CONFIG_AVX512F_OPT - { CPUINFO_AVX512F, 256, buffer_zero_avx512 }, -#endif #ifdef CONFIG_AVX2_OPT { CPUINFO_AVX2, 128, buffer_zero_avx2 }, #endif @@ -208,7 +176,7 @@ select_accel_cpuinfo(unsigned info) return 0; } -#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) +#if defined(CONFIG_AVX2_OPT) static void __attribute__((constructor)) init_accel(void) { used_accel = select_accel_cpuinfo(cpuinfo_init()); From patchwork Thu Feb 15 08:14:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 13557620 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 325C4C48BEF for ; Thu, 15 Feb 2024 08:16:35 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWu9-0007iw-U3; Thu, 15 Feb 2024 03:15:37 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWtX-00076g-9n for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:14:59 -0500 Received: from mail-pg1-x52d.google.com ([2607:f8b0:4864:20::52d]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWtV-0001Ne-CU for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:14:59 -0500 Received: by mail-pg1-x52d.google.com with SMTP id 41be03b00d2f7-53fbf2c42bfso488195a12.3 for ; Thu, 15 Feb 2024 00:14:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984896; x=1708589696; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=dXCRLUcCrBeHTOJN8WRLVbCeiDdcG1rvhb6HzJR+Pi4=; b=fN/YWgu4I2uogIlX4DZDGhuCci0c2r3g076Fp/U8f/yy+1HZnu0S6G7JQ7ZxtkY64v ZGsCpbjyRpFrRW8qH50UatrUfjcVGZvyl0iQgb3uYlfTy9KQ2TuwX5W97NhguO9DT31I 6aI3nm//CMxpOkHH69OWK7CvlziPMOliTX/R7RXaa46K65NofHJzjN5BCAMDIKdaIqVh jPuJn7ALwi2rS5rPdmVvhUjay1tSCZtKPY4Gekg04Rz+c9W1maGWiKXSYObyai5aL7Lw HzMa2Km3rAQhMFsVluMeNCSxi4xFpJYcKh0NpgNt/EjXuCYCXbZbxlWB12kBcvyu0lKl tf4Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984896; x=1708589696; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=dXCRLUcCrBeHTOJN8WRLVbCeiDdcG1rvhb6HzJR+Pi4=; b=czfzXTCvINMkJ1BJc7gjJD6g3IyPxrjYcpyjdjN4klvhMcq/DMJyoUqeYXrbLFJjbT 7TrjikHw7cmjeS66OSHOq7mr40lC+a0Q9CK6IOiXqM7EEflzs4ZDktQXdnEVPN11HQjk gpEynIVeOYcZsVHn/KRAz5sZ05seFWLz2FoqG8SzPSuSYSCW+1B0fkZssF4tZwXtUM+N MdaufpL2bfHQ3oO36r+7tTf5gEWrBYHhuszEx2ZSqKYAfi7alGi14+1WZ22DJcoVVgGW nNck4zWIgMI7UWtj2meU2DVgEcc9DSRym09U9Awdvt3S3Yj1Qnf5htVx4dVk1u8dCyRW 0LQw== X-Gm-Message-State: AOJu0Yygc7TxkqkDgufih2J1Us3sUqSvvcRY7KpdR7S0VNXI/31pli+M 1jLkTbrdfu0s7CqOXsAJlowgnoeutTi9c7QzuQBOjIKEhIKBHJ/OWEPludIAjHhKNcy5X4jlVdZ h X-Google-Smtp-Source: AGHT+IEImE1fKquE4e4E5yoRFqJy0JY89vxxKXq4FSlmWjSTHoqavMo25BrqPV3uC2I3liaFg1htRg== X-Received: by 2002:a05:6a20:a195:b0:19e:425e:ec56 with SMTP id r21-20020a056a20a19500b0019e425eec56mr980332pzk.24.1707984895996; Thu, 15 Feb 2024 00:14:55 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.14.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:14:55 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 03/10] util/bufferiszero: Reorganize for early test for acceleration Date: Wed, 14 Feb 2024 22:14:42 -1000 Message-Id: <20240215081449.848220-4-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::52d; envelope-from=richard.henderson@linaro.org; helo=mail-pg1-x52d.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Alexander Monakov Test for length >= 256 inline, where is is often a constant. Before calling into the accelerated routine, sample three bytes from the buffer, which handles most non-zero buffers. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Message-Id: <20240206204809.9859-3-amonakov@ispras.ru> [rth: Use __builtin_constant_p and perform the sample out-of-line.] Signed-off-by: Richard Henderson --- include/qemu/cutils.h | 15 +++++++- util/bufferiszero.c | 89 ++++++++++++++++++------------------------- 2 files changed, 51 insertions(+), 53 deletions(-) diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h index 92c927a6a3..36f8cfa0e9 100644 --- a/include/qemu/cutils.h +++ b/include/qemu/cutils.h @@ -187,9 +187,22 @@ char *freq_to_str(uint64_t freq_hz); /* used to print char* safely */ #define STR_OR_NULL(str) ((str) ? (str) : "null") -bool buffer_is_zero(const void *buf, size_t len); +/* + * Check if a buffer is all zeroes. + */ + +bool buffer_is_zero_ool(const void *vbuf, size_t len); +bool buffer_is_zero_ge256(const void *vbuf, size_t len); bool test_buffer_is_zero_next_accel(void); +#ifdef __OPTIMIZE__ +#define buffer_is_zero(B, L) \ + (__builtin_constant_p(L) && (size_t)(L) >= 256 \ + ? buffer_is_zero_ge256(B, L) : buffer_is_zero_ool(B, L)) +#else +#define buffer_is_zero buffer_is_zero_ool +#endif + /* * Implementation of ULEB128 (http://en.wikipedia.org/wiki/LEB128) * Input is limited to 14-bit numbers diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 641d5f9b9e..38527f2467 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -26,8 +26,9 @@ #include "qemu/bswap.h" #include "host/cpuinfo.h" -static bool -buffer_zero_int(const void *buf, size_t len) +static bool (*buffer_is_zero_accel)(const void *, size_t); + +static bool buffer_is_zero_integer(const void *buf, size_t len) { if (unlikely(len < 8)) { /* For a very small buffer, simply accumulate all the bytes. */ @@ -128,60 +129,38 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ -/* - * Make sure that these variables are appropriately initialized when - * SSE2 is enabled on the compiler command-line, but the compiler is - * too old to support CONFIG_AVX2_OPT. - */ -#if defined(CONFIG_AVX2_OPT) -# define INIT_USED 0 -# define INIT_LENGTH 0 -# define INIT_ACCEL buffer_zero_int -#else -# ifndef __SSE2__ -# error "ISA selection confusion" -# endif -# define INIT_USED CPUINFO_SSE2 -# define INIT_LENGTH 64 -# define INIT_ACCEL buffer_zero_sse2 -#endif - -static unsigned used_accel = INIT_USED; -static unsigned length_to_accel = INIT_LENGTH; -static bool (*buffer_accel)(const void *, size_t) = INIT_ACCEL; - static unsigned __attribute__((noinline)) select_accel_cpuinfo(unsigned info) { /* Array is sorted in order of algorithm preference. */ static const struct { unsigned bit; - unsigned len; bool (*fn)(const void *, size_t); } all[] = { #ifdef CONFIG_AVX2_OPT - { CPUINFO_AVX2, 128, buffer_zero_avx2 }, + { CPUINFO_AVX2, buffer_zero_avx2 }, #endif - { CPUINFO_SSE2, 64, buffer_zero_sse2 }, - { CPUINFO_ALWAYS, 0, buffer_zero_int }, + { CPUINFO_SSE2, buffer_zero_sse2 }, + { CPUINFO_ALWAYS, buffer_is_zero_integer }, }; for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { if (info & all[i].bit) { - length_to_accel = all[i].len; - buffer_accel = all[i].fn; + buffer_is_zero_accel = all[i].fn; return all[i].bit; } } return 0; } -#if defined(CONFIG_AVX2_OPT) +static unsigned used_accel; + static void __attribute__((constructor)) init_accel(void) { used_accel = select_accel_cpuinfo(cpuinfo_init()); } -#endif /* CONFIG_AVX2_OPT */ + +#define INIT_ACCEL NULL bool test_buffer_is_zero_next_accel(void) { @@ -194,36 +173,42 @@ bool test_buffer_is_zero_next_accel(void) used_accel |= used; return used; } - -static bool select_accel_fn(const void *buf, size_t len) -{ - if (likely(len >= length_to_accel)) { - return buffer_accel(buf, len); - } - return buffer_zero_int(buf, len); -} - #else -#define select_accel_fn buffer_zero_int bool test_buffer_is_zero_next_accel(void) { return false; } + +#define INIT_ACCEL buffer_is_zero_integer #endif -/* - * Checks if a buffer is all zeroes - */ -bool buffer_is_zero(const void *buf, size_t len) +static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; + +static inline bool buffer_is_zero_sample3(const char *buf, size_t len) +{ + return (buf[0] | buf[len - 1] | buf[len / 2]) == 0; +} + +bool buffer_is_zero_ool(const void *buf, size_t len) { if (unlikely(len == 0)) { return true; } + if (!buffer_is_zero_sample3(buf, len)) { + return false; + } + /* All bytes are covered for any len <= 3. */ + if (unlikely(len <= 3)) { + return true; + } - /* Fetch the beginning of the buffer while we select the accelerator. */ - __builtin_prefetch(buf); - - /* Use an optimized zero check if possible. Note that this also - includes a check for an unrolled loop over 64-bit integers. */ - return select_accel_fn(buf, len); + if (likely(len >= 256)) { + return buffer_is_zero_accel(buf, len); + } + return buffer_is_zero_integer(buf, len); +} + +bool buffer_is_zero_ge256(const void *buf, size_t len) +{ + return buffer_is_zero_sample3(buf, len) && buffer_is_zero_accel(buf, len); } From patchwork Thu Feb 15 08:14:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 13557622 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id F0DAAC48BEB for ; Thu, 15 Feb 2024 08:16:51 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWu7-0007b3-R9; Thu, 15 Feb 2024 03:15:35 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWtZ-0007BA-0Y for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:02 -0500 Received: from mail-pj1-x102d.google.com ([2607:f8b0:4864:20::102d]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWtX-0001No-El for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:00 -0500 Received: by mail-pj1-x102d.google.com with SMTP id 98e67ed59e1d1-296c562ac70so525664a91.2 for ; Thu, 15 Feb 2024 00:14:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984897; x=1708589697; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=VgkwjOwkXeKSpsaCNpWivSMzmaRHOTSXYP4mzjDgWYo=; b=U3PDUSXX7ASKeyKrx7u2i7lRXYRDOkmyKhP4s9QRep0cTnZL+E636VURS7wO/NX4hh 0ri8EghIHBvrPB1GL2+f2aHgMK6BVmGsuo/4H1AOp7lktXiGmHX7egobiw1IC0/zxVut V2nRCbzKv1zM7eKX6wqukGffnYDI+ch+fnn643/+9UXTN4T6bcbYb45UHrAMqqCMvYN5 H97dcecGvX6wXFsCVLTTsxHEjWPcekit4zGhkXMhsDSCzgNP75rFJ3vKyt7ZVUwTeUCi Eybk4RilIwzFeT1W/s9J63ae26lKfIiaqdceAn3l+aLIYAu51rQUgDquDNZqFu8BOPQq C26A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984897; x=1708589697; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=VgkwjOwkXeKSpsaCNpWivSMzmaRHOTSXYP4mzjDgWYo=; b=YUwPqN1Ass3R1dCkQs3zo7fyeVsx2rA6GqJltHNnJs3zAapSCabl1WF6mHu9RTQszT tNgWIifFZckrmn0jiIGWTPGeWonJ8Z2apfTTyA+R2k23ckTIHDqF/Tjmnc+egwWBXfhG bCvNRTqkYFXpW71Zu6W4O5wWY2ZoujrNl7ut8uXC/CCFJDL4/leAghZJGhTEB+2Ir8+w pXaziebKgltMMzqeTxrhT36y5Tbrf8EIIjM8tQBpftm+PbaJSxzQojPMWLYnyPDTc1vV phGRQle9ORLKOq4gv2XscJYhpt50VCbpAKGlKZpkrMY0GFJhzCBJiq+sHQO5pUoeINMR qnJg== X-Gm-Message-State: AOJu0YzFE/O1K7njPuqEd11dONRXa2Uu8FRcHgZoEmXBKCv5fjvgIs+p Ld6Dnip9oly7O7mevqeZh/XWgVak8t89JUpnSXPEz/tpl9myxZgIq78hlEQJ7K6RoK5gBGZMRwE n X-Google-Smtp-Source: AGHT+IG9zZ4k/AfypOCpWf54IbQa1Q+gp5GW5Kf1yF8LyFxnNpgLF8TnEShBmWh3tfG++Vr7IDAClA== X-Received: by 2002:a17:90a:4381:b0:298:c136:2ffc with SMTP id r1-20020a17090a438100b00298c1362ffcmr860621pjg.45.1707984897218; Thu, 15 Feb 2024 00:14:57 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.14.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:14:56 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 04/10] util/bufferiszero: Remove useless prefetches Date: Wed, 14 Feb 2024 22:14:43 -1000 Message-Id: <20240215081449.848220-5-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::102d; envelope-from=richard.henderson@linaro.org; helo=mail-pj1-x102d.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Alexander Monakov Use of prefetching in bufferiszero.c is quite questionable: - prefetches are issued just a few CPU cycles before the corresponding line would be hit by demand loads; - they are done for simple access patterns, i.e. where hardware prefetchers can perform better; - they compete for load ports in loops that should be limited by load port throughput rather than ALU throughput. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-5-amonakov@ispras.ru> --- util/bufferiszero.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 38527f2467..6ef5f8ec79 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -50,7 +50,6 @@ static bool buffer_is_zero_integer(const void *buf, size_t len) const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8); for (; p + 8 <= e; p += 8) { - __builtin_prefetch(p + 8); if (t) { return false; } @@ -80,7 +79,6 @@ buffer_zero_sse2(const void *buf, size_t len) /* Loop over 16-byte aligned blocks of 64. */ while (likely(p <= e)) { - __builtin_prefetch(p); t = _mm_cmpeq_epi8(t, zero); if (unlikely(_mm_movemask_epi8(t) != 0xFFFF)) { return false; @@ -111,7 +109,6 @@ buffer_zero_avx2(const void *buf, size_t len) /* Loop over 32-byte aligned blocks of 128. */ while (p <= e) { - __builtin_prefetch(p); if (unlikely(!_mm256_testz_si256(t, t))) { return false; } From patchwork Thu Feb 15 08:14:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 13557626 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7DA1CC48BEF for ; Thu, 15 Feb 2024 08:17:19 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWuC-0007mX-Bs; Thu, 15 Feb 2024 03:15:40 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWtZ-0007Bb-GN for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:03 -0500 Received: from mail-pg1-x52a.google.com ([2607:f8b0:4864:20::52a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWtX-0001O5-Or for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:01 -0500 Received: by mail-pg1-x52a.google.com with SMTP id 41be03b00d2f7-5d3912c9a83so497968a12.3 for ; Thu, 15 Feb 2024 00:14:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984898; x=1708589698; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=o1WrtT24LEWJ2UE0G4+JsaaZMny8aXncXNU55PndN9g=; b=auiEAbsUS+da5ViS/dPNXxUTP12mUuVl4Ihtgw9dOzOwPs0p6twZkjSZMJJDKwdk80 Ij/9DI5v3S7ygfdIGecRokFz120KDWZGwIfiypRZVHToXcF6E2NinWzzHNpzrtMA3ADo IYDcYfoqVMfRbpT+CWnYvVSMoZO0pJoafK4iUbYIgIgbAIv+CkHleWJ4FWROcKq53PnP F8Zu8O1QbgzcYaQRkK1AbOnDauxAeMMw/oYzbUtjYoZW5MhInR3sxgBjVlaM3EaG1Rr2 TeK/WLKTGaiCbJP/yJEI/23wYPQHleQibFqGSZqnn6VzbzKwcBWSkjDM5PiUX5tnCM/e pvVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984898; x=1708589698; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=o1WrtT24LEWJ2UE0G4+JsaaZMny8aXncXNU55PndN9g=; b=hsTSQu3TpF5vgKXYQSCxfZZDMvejhvn4VTiHwwfwY3rq1Yp0/SDrMvMM/c2ywSZPwk tTBrdp55FaUJj7+qW5R7qjASI9csaTDiU2CpTePD5IeFaWtEDmw0bT1ybz+lMAgrWdXj 8VqYaxqg0ySrDR+L7/PSO1WSxhM5CEP4ONggiUMFdMyLwePFtNnOKOGbN/8UvChhvpp/ FfYXiVGOyJVKv01oeY7SSAdRTjBFaZ3fYt16ytNBjJ34HIKHrTMMBnOPY2BsiXm/gRpP PNbdbLxzwmquvjxz2t2TF90im2+NC5wnl/ovOi5H3hMn3mcew5oSRvEGr4LIX+Oorp77 QU7A== X-Gm-Message-State: AOJu0YwRNsAaGcgFGkdiQ+udchB7jeQeaQWEq4x4siiUyTli9Xm/SxS4 qij8O3T2n4NxLTfn5EeKSA8jahEr3O4lTVk6fdOidmrDDgtMSeDhzOgIdC76LZg2Mck542pNY5u b X-Google-Smtp-Source: AGHT+IGzO4CI/t4QjIU+2Qo27Ye8SxKsJp3yB8BLqMiz9HOQO23z1TlP+chyVgdOuy3RcyvZ2UrF+g== X-Received: by 2002:a05:6a20:d70f:b0:1a0:686b:afdd with SMTP id iz15-20020a056a20d70f00b001a0686bafddmr1262724pzb.5.1707984898453; Thu, 15 Feb 2024 00:14:58 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.14.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:14:58 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 05/10] util/bufferiszero: Optimize SSE2 and AVX2 variants Date: Wed, 14 Feb 2024 22:14:44 -1000 Message-Id: <20240215081449.848220-6-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::52a; envelope-from=richard.henderson@linaro.org; helo=mail-pg1-x52a.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: Alexander Monakov Increase unroll factor in SIMD loops from 4x to 8x in order to move their bottlenecks from ALU port contention to load issue rate (two loads per cycle on popular x86 implementations). Avoid using out-of-bounds pointers in loop boundary conditions. Follow SSE2 implementation strategy in the AVX2 variant. Avoid use of PTEST, which is not profitable there (like in the removed SSE4 variant). Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-6-amonakov@ispras.ru> --- util/bufferiszero.c | 111 +++++++++++++++++++++++++++++--------------- 1 file changed, 73 insertions(+), 38 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 6ef5f8ec79..2822155c27 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -67,62 +67,97 @@ static bool buffer_is_zero_integer(const void *buf, size_t len) #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) #include -/* Note that each of these vectorized functions require len >= 64. */ +/* Helper for preventing the compiler from reassociating + chains of binary vector operations. */ +#define SSE_REASSOC_BARRIER(vec0, vec1) asm("" : "+x"(vec0), "+x"(vec1)) + +/* Note that these vectorized functions may assume len >= 256. */ static bool __attribute__((target("sse2"))) buffer_zero_sse2(const void *buf, size_t len) { - __m128i t = _mm_loadu_si128(buf); - __m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16); - __m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16); - __m128i zero = _mm_setzero_si128(); + /* Unaligned loads at head/tail. */ + __m128i v = *(__m128i_u *)(buf); + __m128i w = *(__m128i_u *)(buf + len - 16); + /* Align head/tail to 16-byte boundaries. */ + const __m128i *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16); + const __m128i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 16); + __m128i zero = { 0 }; - /* Loop over 16-byte aligned blocks of 64. */ - while (likely(p <= e)) { - t = _mm_cmpeq_epi8(t, zero); - if (unlikely(_mm_movemask_epi8(t) != 0xFFFF)) { + /* Collect a partial block at tail end. */ + v |= e[-1]; w |= e[-2]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-3]; w |= e[-4]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-5]; w |= e[-6]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-7]; v |= w; + + /* + * Loop over complete 128-byte blocks. + * With the head and tail removed, e - p >= 14, so the loop + * must iterate at least once. + */ + do { + v = _mm_cmpeq_epi8(v, zero); + if (unlikely(_mm_movemask_epi8(v) != 0xFFFF)) { return false; } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } + v = p[0]; w = p[1]; + SSE_REASSOC_BARRIER(v, w); + v |= p[2]; w |= p[3]; + SSE_REASSOC_BARRIER(v, w); + v |= p[4]; w |= p[5]; + SSE_REASSOC_BARRIER(v, w); + v |= p[6]; w |= p[7]; + SSE_REASSOC_BARRIER(v, w); + v |= w; + p += 8; + } while (p < e - 7); - /* Finish the aligned tail. */ - t |= e[-3]; - t |= e[-2]; - t |= e[-1]; - - /* Finish the unaligned tail. */ - t |= _mm_loadu_si128(buf + len - 16); - - return _mm_movemask_epi8(_mm_cmpeq_epi8(t, zero)) == 0xFFFF; + return _mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)) == 0xFFFF; } #ifdef CONFIG_AVX2_OPT static bool __attribute__((target("avx2"))) buffer_zero_avx2(const void *buf, size_t len) { - /* Begin with an unaligned head of 32 bytes. */ - __m256i t = _mm256_loadu_si256(buf); - __m256i *p = (__m256i *)(((uintptr_t)buf + 5 * 32) & -32); - __m256i *e = (__m256i *)(((uintptr_t)buf + len) & -32); + /* Unaligned loads at head/tail. */ + __m256i v = *(__m256i_u *)(buf); + __m256i w = *(__m256i_u *)(buf + len - 32); + /* Align head/tail to 32-byte boundaries. */ + const __m256i *p = QEMU_ALIGN_PTR_DOWN(buf + 32, 32); + const __m256i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 32); + __m256i zero = { 0 }; - /* Loop over 32-byte aligned blocks of 128. */ - while (p <= e) { - if (unlikely(!_mm256_testz_si256(t, t))) { + /* Collect a partial block at tail end. */ + v |= e[-1]; w |= e[-2]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-3]; w |= e[-4]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-5]; w |= e[-6]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-7]; v |= w; + + /* Loop over complete 256-byte blocks. */ + for (; p < e - 7; p += 8) { + /* PTEST is not profitable here. */ + v = _mm256_cmpeq_epi8(v, zero); + if (unlikely(_mm256_movemask_epi8(v) != 0xFFFFFFFF)) { return false; } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } ; + v = p[0]; w = p[1]; + SSE_REASSOC_BARRIER(v, w); + v |= p[2]; w |= p[3]; + SSE_REASSOC_BARRIER(v, w); + v |= p[4]; w |= p[5]; + SSE_REASSOC_BARRIER(v, w); + v |= p[6]; w |= p[7]; + SSE_REASSOC_BARRIER(v, w); + v |= w; + } - /* Finish the last block of 128 unaligned. */ - t |= _mm256_loadu_si256(buf + len - 4 * 32); - t |= _mm256_loadu_si256(buf + len - 3 * 32); - t |= _mm256_loadu_si256(buf + len - 2 * 32); - t |= _mm256_loadu_si256(buf + len - 1 * 32); - - return _mm256_testz_si256(t, t); + return _mm256_movemask_epi8(_mm256_cmpeq_epi8(v, zero)) == 0xFFFFFFFF; } #endif /* CONFIG_AVX2_OPT */ From patchwork Thu Feb 15 08:14:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 13557623 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id AB327C4829E for ; Thu, 15 Feb 2024 08:16:57 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWuA-0007jU-49; Thu, 15 Feb 2024 03:15:38 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWtb-0007Cd-3y for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:07 -0500 Received: from mail-oi1-x22c.google.com ([2607:f8b0:4864:20::22c]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWtZ-0001OY-8c for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:02 -0500 Received: by mail-oi1-x22c.google.com with SMTP id 5614622812f47-3c132695f1bso481388b6e.2 for ; Thu, 15 Feb 2024 00:15:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984900; x=1708589700; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=EEYj0I5Vz7Ix2So9xIua9K5UZDr6lKDlAW8K2SObdT4=; b=hklfNXwg1fcyK0do01I1ggFKQiV5W+zkjiAM/cdWkcEFjZ1D2vRtg7+OWYPBUcvrCu H84lYwspOEypdoqqPoF8limVeVwUIv9hpP0RHR+k7FxrGgfKgOE3cLNsyYeSvGOoBChV GwtHW8R/GiaeW4WDogl7naR+aKy+VzgFfFWbFGLRAh6M4+LqYjh/iNziY6v7wmjEUTZ1 0lCG4h2E4HrbuWjbnTZdHaCcl0OukeHv8io2BqGOJDykMnxFcHH0CtEq7wy6ADxLT31J /mQAbc5AKyWSUFQv/X117tvVz9nACNccVqyWEMpRuIrlOCZXDGb+lXmFxKSzaGKrvb6T vQzQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984900; x=1708589700; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=EEYj0I5Vz7Ix2So9xIua9K5UZDr6lKDlAW8K2SObdT4=; b=U+2Wt6ghFS7zcwBOPyDMJfYgiFgL5+wPNmNTzSV5o91ITH/S8XRisNuk8t4GANUJnU eGtlknRCUsyNjLbJsS5hg0vR1dyDaNc3TJ/0FgYDTIfBlYhM2BDflqnS6kYe/aGWVTP7 UFj8Ken0CeFIMOZMr7rNvl+oHIzCreaDsgnN2oNV9Jr1Yhv1ZIToSbdl12iPwet/BSFA 5PIkJ9cxLMMqkDaWvM+DHREctsjitgUsZwT/HWt20s84Acm1w/R9tqnNqkdVtQaGJyQW dpcLNF0RA75woMLGZFpfNwu4S42z6ZllqDFxrnCjus5AEZxPR4KYF1Rp7xKW0uNBccRY FHng== X-Gm-Message-State: AOJu0YzWWuEerkF/QzDPCJCPhPXcBoFypEZ+0UfWP/2CP3bI2Nh1bbJB Dh7+YtzZJY+jdAlmgic/419vcUMOmuC6et6pvEdMIyIPdrV552imjn5iAGkflgGLCSv5ClxBfre 0 X-Google-Smtp-Source: AGHT+IFOLM0FrUNRqqzDSgX28Sd59EqhEkhkYq4B3kVA5R4+ck7WlT+FhoeR6ocoQMzyHGhkdlRVcA== X-Received: by 2002:a05:6358:885:b0:176:5d73:34ef with SMTP id m5-20020a056358088500b001765d7334efmr933588rwj.24.1707984899769; Thu, 15 Feb 2024 00:14:59 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.14.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:14:59 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 06/10] util/bufferiszero: Improve scalar variant Date: Wed, 14 Feb 2024 22:14:45 -1000 Message-Id: <20240215081449.848220-7-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::22c; envelope-from=richard.henderson@linaro.org; helo=mail-oi1-x22c.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Split less-than and greater-than 256 cases. Use unaligned accesses for head and tail. Avoid using out-of-bounds pointers in loop boundary conditions. Signed-off-by: Richard Henderson --- util/bufferiszero.c | 86 +++++++++++++++++++++++++++------------------ 1 file changed, 52 insertions(+), 34 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 2822155c27..ce04642c67 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -28,40 +28,58 @@ static bool (*buffer_is_zero_accel)(const void *, size_t); -static bool buffer_is_zero_integer(const void *buf, size_t len) +static bool buffer_is_zero_int_lt256(const void *buf, size_t len) { - if (unlikely(len < 8)) { - /* For a very small buffer, simply accumulate all the bytes. */ - const unsigned char *p = buf; - const unsigned char *e = buf + len; - unsigned char t = 0; + uint64_t t; + const uint64_t *p, *e; - do { - t |= *p++; - } while (p < e); - - return t == 0; - } else { - /* Otherwise, use the unaligned memory access functions to - handle the beginning and end of the buffer, with a couple - of loops handling the middle aligned section. */ - uint64_t t = ldq_he_p(buf); - const uint64_t *p = (uint64_t *)(((uintptr_t)buf + 8) & -8); - const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8); - - for (; p + 8 <= e; p += 8) { - if (t) { - return false; - } - t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7]; - } - while (p < e) { - t |= *p++; - } - t |= ldq_he_p(buf + len - 8); - - return t == 0; + /* + * Use unaligned memory access functions to handle + * the beginning and end of the buffer, with a couple + * of loops handling the middle aligned section. + */ + if (unlikely(len <= 8)) { + return (ldl_he_p(buf) | ldl_he_p(buf + len - 4)) == 0; } + + t = ldq_he_p(buf) | ldq_he_p(buf + len - 8); + p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8); + e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8); + + while (p < e) { + t |= *p++; + } + return t == 0; +} + +static bool buffer_is_zero_int_ge256(const void *buf, size_t len) +{ + /* + * Use unaligned memory access functions to handle + * the beginning and end of the buffer, with a couple + * of loops handling the middle aligned section. + */ + uint64_t t = ldq_he_p(buf) | ldq_he_p(buf + len - 8); + const uint64_t *p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8); + const uint64_t *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8); + + /* Collect a partial block at the tail end. */ + t |= e[-7] | e[-6] | e[-5] | e[-4] | e[-3] | e[-2] | e[-1]; + + /* + * Loop over 64 byte blocks. + * With the head and tail removed, e - p >= 30, + * so the loop must iterate at least 3 times. + */ + do { + if (t) { + return false; + } + t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7]; + p += 8; + } while (p < e - 7); + + return t == 0; } #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) @@ -173,7 +191,7 @@ select_accel_cpuinfo(unsigned info) { CPUINFO_AVX2, buffer_zero_avx2 }, #endif { CPUINFO_SSE2, buffer_zero_sse2 }, - { CPUINFO_ALWAYS, buffer_is_zero_integer }, + { CPUINFO_ALWAYS, buffer_is_zero_int_ge256 }, }; for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { @@ -211,7 +229,7 @@ bool test_buffer_is_zero_next_accel(void) return false; } -#define INIT_ACCEL buffer_is_zero_integer +#define INIT_ACCEL buffer_is_zero_int_ge256 #endif static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; @@ -237,7 +255,7 @@ bool buffer_is_zero_ool(const void *buf, size_t len) if (likely(len >= 256)) { return buffer_is_zero_accel(buf, len); } - return buffer_is_zero_integer(buf, len); + return buffer_is_zero_int_lt256(buf, len); } bool buffer_is_zero_ge256(const void *buf, size_t len) From patchwork Thu Feb 15 08:14:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 13557618 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id AFB09C48BEB for ; Thu, 15 Feb 2024 08:16:11 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWu9-0007hR-FJ; Thu, 15 Feb 2024 03:15:37 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWtb-0007Cj-PI for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:06 -0500 Received: from mail-oi1-x22b.google.com ([2607:f8b0:4864:20::22b]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWta-0001PV-4M for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:03 -0500 Received: by mail-oi1-x22b.google.com with SMTP id 5614622812f47-3bd72353d9fso450614b6e.3 for ; Thu, 15 Feb 2024 00:15:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984901; x=1708589701; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=+5kI9i1T4g1qlGS3lxQuuu2LwuYBA3jGXjmyocvfHSc=; b=bNRaOfVoxaup6DdFdTBI/FTE4rnKVKD2x3aHfXpQhC27t5Jx2az7s5ItLAPAgIEzX8 6sJqvvCDt5CL4dy2fVO4u07hvcl46scWgIpf/C4EyuQbguJYuFTRfn/qSErW7Q8LQKao +g4M7i3lx4kwgBFu0J84mml2mGOSWuECZ4KOZCU28e1FGPfRH4KB5vvRP/9IGBE2M27L OLYMANsL3K92mW96tqQ7flgZEaa6/595TFE0h8T/r2AuzBW7lwN+0e0KrZwZ/BnVWiWi O/ecElTi+IgIVpb6SO374SWqyzPIbCbXNrj9B+paPiXwAeZaYhJKExq2CORN+vRCLECW n1ZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984901; x=1708589701; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+5kI9i1T4g1qlGS3lxQuuu2LwuYBA3jGXjmyocvfHSc=; b=BFdBjqY3S9MFjoG3RHS/+nHQLXGFRG0ONzh6g3ITFLW4FRkjJWu9X6qn2RSorKJ1Yv Wx/H6PRwezTfWxDeRxebi0u1g6FrdRJQWVzI4ny8nVvGlL9+codVzQ3AFQa+gT1uJtks 8MPgtw3HViFH7/geq23W6AwgG+AbWH47EuYF9yDdTEDFamFK/LmYIOWWRPROWbdemdzA lv4eh2rd9/UhVVc2El7wKAMKlN/BH+6klfG+EH8U/DiRgVOGdPkTP1sQXPKq8eAWxym0 2hQW8NRzSgIpJ8mvridmUt/5yih1nPPfHYxhnEdLQhcnAmGi0VsTjGBSRWvd2FThGTX/ ehtw== X-Gm-Message-State: AOJu0YzCLFSy2M2tmZDrEBNToLX6TPICHE/AQAIWz/LkcgwVe1qJqaas 0tRGXGGmZwe7H+xCtf7y36DsZclWBsttabXW4LkVyHdAlInldPza3Wsf9pVF7w07o9+TfH1Yi2f A X-Google-Smtp-Source: AGHT+IF9spF03LMpTCCog7Hsal4zoowWZXji4owaG+gTb7KneARWWa/QnD0bTLE0Nim9dq/y/BWhIQ== X-Received: by 2002:a05:6358:6f0b:b0:178:688e:fb21 with SMTP id r11-20020a0563586f0b00b00178688efb21mr1068454rwn.7.1707984901019; Thu, 15 Feb 2024 00:15:01 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.14.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:15:00 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 07/10] util/bufferiszero: Introduce biz_accel_fn typedef Date: Wed, 14 Feb 2024 22:14:46 -1000 Message-Id: <20240215081449.848220-8-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::22b; envelope-from=richard.henderson@linaro.org; helo=mail-oi1-x22b.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Signed-off-by: Richard Henderson Reviewed-by: Philippe Mathieu-Daudé --- util/bufferiszero.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index ce04642c67..ce80713071 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -26,7 +26,8 @@ #include "qemu/bswap.h" #include "host/cpuinfo.h" -static bool (*buffer_is_zero_accel)(const void *, size_t); +typedef bool (*biz_accel_fn)(const void *, size_t); +static biz_accel_fn buffer_is_zero_accel; static bool buffer_is_zero_int_lt256(const void *buf, size_t len) { @@ -179,13 +180,15 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ + + static unsigned __attribute__((noinline)) select_accel_cpuinfo(unsigned info) { /* Array is sorted in order of algorithm preference. */ static const struct { unsigned bit; - bool (*fn)(const void *, size_t); + biz_accel_fn fn; } all[] = { #ifdef CONFIG_AVX2_OPT { CPUINFO_AVX2, buffer_zero_avx2 }, @@ -232,7 +235,7 @@ bool test_buffer_is_zero_next_accel(void) #define INIT_ACCEL buffer_is_zero_int_ge256 #endif -static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; +static biz_accel_fn buffer_is_zero_accel = INIT_ACCEL; static inline bool buffer_is_zero_sample3(const char *buf, size_t len) { From patchwork Thu Feb 15 08:14:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 13557624 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 98F3CC48BEB for ; Thu, 15 Feb 2024 08:17:02 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWuA-0007kI-AF; Thu, 15 Feb 2024 03:15:38 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWtd-0007Cy-Dw for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:06 -0500 Received: from mail-pj1-x1032.google.com ([2607:f8b0:4864:20::1032]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWtb-0001QN-MG for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:05 -0500 Received: by mail-pj1-x1032.google.com with SMTP id 98e67ed59e1d1-290da27f597so466759a91.2 for ; Thu, 15 Feb 2024 00:15:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984902; x=1708589702; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=GT2gVjPR6SgaNO/ooCzaomL/QLce0sgZc8b20aaRw0g=; b=H+gbamLFlmM9NNvGzN2OwVWi3rlwwOWLdtjbkxG3rf/w2aJi26BM0tr/uNz6AWpt6U nhKXY79JVXI+li093NB6VSQngNLpJCDxQ+2V4B+4dCsKmzIxevUb8/hR9Ts2h/tF4NXU qCr6ItTWiKxCe4l5qejFJ9Dq+Iz3mmoFuSQwdMD3xRuP9ixzpE82VpLCYnsOCbNA7t2m 0Xy/p1MdS41vH6/v+sqz5mJiZ55/EVmbl0L5tujA+ZWgNcAcfhfcddUa/4Vkal1k1WrZ Ih0v+aBCdEM/erSNBHHT6vKy50lccG9FHNwlE5nkwr7Ay2TNX/M69oBEBkneHS2fEL+P 2cEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984902; x=1708589702; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=GT2gVjPR6SgaNO/ooCzaomL/QLce0sgZc8b20aaRw0g=; b=vB+CrLX21LwR9Kukz8gPN2kL0eJczYxbQessVT2ZUE4uY89yxl3pIIxcQEyvMnZU3H KPp4GcIzcFu+iKBsgDT2kycfKgzSNOj8qfXqH/y9PzWdQ29jQA7I3zX8rVNbuEPs2Ht/ m9hIi2+3Fn52YGt3Uz8heikkEkyHcasaeIZy0jtmr7jIYv2YpJzfsjh+guHt+ZJF5d0y tRLT0mc+QftGvOecp5jL2sV/nLMXUNj3JexULX2Y141waS60R/qCP+GE+lDrmQCRCUdz W4dfhkBWhKbko4wiabPRHBIvkTOXNLQkNsjGassoCVh7ba5Ft8CuAsWsQyKhplvpAlxb GZVQ== X-Gm-Message-State: AOJu0YzromjsVXG/W8EvVtMjBPOJ+pqmE0qZcTa8QR+mZyFvh1Y/8qYW omaxvgeGzegkvGGXRpzyyuY7BL3t9qLBa5JDrvo7VDMOJdmujzIU27kKt7BUuCRYqoW3HISKZNN U X-Google-Smtp-Source: AGHT+IEFgp/lcu2AAGdZFAR7JBo2Wf+vbs81VYylavDSIm699qxyozPRLot3mcOks9CPBJKa7XFjTQ== X-Received: by 2002:a17:90b:1642:b0:299:165:b429 with SMTP id il2-20020a17090b164200b002990165b429mr963156pjb.23.1707984902259; Thu, 15 Feb 2024 00:15:02 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.15.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:15:01 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 08/10] util/bufferiszero: Simplify test_buffer_is_zero_next_accel Date: Wed, 14 Feb 2024 22:14:47 -1000 Message-Id: <20240215081449.848220-9-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::1032; envelope-from=richard.henderson@linaro.org; helo=mail-pj1-x1032.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Because the three alternatives are monotonic, we don't need to keep a couple of bitmasks, just identify the strongest alternative at startup. Signed-off-by: Richard Henderson Reviewed-by: Philippe Mathieu-Daudé --- util/bufferiszero.c | 56 ++++++++++++++++++--------------------------- 1 file changed, 22 insertions(+), 34 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index ce80713071..4eef6d47bc 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -180,51 +180,39 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ - - -static unsigned __attribute__((noinline)) -select_accel_cpuinfo(unsigned info) -{ - /* Array is sorted in order of algorithm preference. */ - static const struct { - unsigned bit; - biz_accel_fn fn; - } all[] = { +static biz_accel_fn const accel_table[] = { + buffer_is_zero_int_ge256, + buffer_zero_sse2, #ifdef CONFIG_AVX2_OPT - { CPUINFO_AVX2, buffer_zero_avx2 }, + buffer_zero_avx2, #endif - { CPUINFO_SSE2, buffer_zero_sse2 }, - { CPUINFO_ALWAYS, buffer_is_zero_int_ge256 }, - }; - - for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { - if (info & all[i].bit) { - buffer_is_zero_accel = all[i].fn; - return all[i].bit; - } - } - return 0; -} - -static unsigned used_accel; +}; +static unsigned accel_index; static void __attribute__((constructor)) init_accel(void) { - used_accel = select_accel_cpuinfo(cpuinfo_init()); + unsigned info = cpuinfo_init(); + unsigned index = (info & CPUINFO_SSE2 ? 1 : 0); + +#ifdef CONFIG_AVX2_OPT + if (info & CPUINFO_AVX2) { + index = 2; + } +#endif + + accel_index = index; + buffer_is_zero_accel = accel_table[index]; } #define INIT_ACCEL NULL bool test_buffer_is_zero_next_accel(void) { - /* - * Accumulate the accelerators that we've already tested, and - * remove them from the set to test this round. We'll get back - * a zero from select_accel_cpuinfo when there are no more. - */ - unsigned used = select_accel_cpuinfo(cpuinfo & ~used_accel); - used_accel |= used; - return used; + if (accel_index != 0) { + buffer_is_zero_accel = accel_table[--accel_index]; + return true; + } + return false; } #else bool test_buffer_is_zero_next_accel(void) From patchwork Thu Feb 15 08:14:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 13557619 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D1302C4829E for ; Thu, 15 Feb 2024 08:16:34 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWu8-0007cZ-5w; Thu, 15 Feb 2024 03:15:36 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWte-0007DT-Rs for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:08 -0500 Received: from mail-pg1-x529.google.com ([2607:f8b0:4864:20::529]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWtc-0001aA-Vn for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:06 -0500 Received: by mail-pg1-x529.google.com with SMTP id 41be03b00d2f7-5d8ddbac4fbso512868a12.0 for ; Thu, 15 Feb 2024 00:15:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984903; x=1708589703; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=LcHS8hoi1egiJUTZk4E4y/djhNgaSWqSASRO2zuVQnQ=; b=nl58qaCR9J7WIXMixV12jPnvQKM3Ubuh2eMGbMGT/qbvYSFqvSmUaW6C6/PsL92jN1 2/nOgMfihxlLLu/0XWjrn4tioz46du1BqfXhV0tfRDkoEtBjD3HJlxQFVcu9sGsHPeAJ Q+/6mbybfb0TiYQUI6tS/MuK2w6sgOxLkwHOct4OcyDHVHOsR23ZBDNk5Nzal2Hw36Jo C2IeuRCXa82RnrHmBxnArO0wQll713nBcu/4iOy3+gKDZ/lXm1KAcurQJHox3fOiSJlM xyPLjSBDuUBIDGxeCy2dcKqOePiGVL7Cjy8sDE47Zx8Rs7aM4DNnjnR+lLHynww7ukhK ZfSg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984903; x=1708589703; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=LcHS8hoi1egiJUTZk4E4y/djhNgaSWqSASRO2zuVQnQ=; b=JdneTIrRGPLb/nkv/nEChhFPJbf6jBdbqt5AFpu98qvrSNq/MuVfesGM0oj3c1Yulf O78a5RXg4MS3B4Zne88CpFSZB9duONmekHtlVuVTcrMXHQ9HAaRZa243moOtckjEX8RL MVZ7hJwu2002HqHop19kJOdLOT22IRDz6e/y+dsfXRRoAAV5RoesPXQLJFQa/TOOEa9M 3w9UwYAVBBtL2agYXWDp6Y+fHRoM5lPd7nlppMrYJStpOZDrBk5TmShmFS5Qs9qk43jG LClWyK2+rT4Y37Q+KJFZUJUAcgJM7BldgrX2BlrXkW9T5pebfGV7WbhHSGeDQ5qpfDi7 sJSQ== X-Gm-Message-State: AOJu0YzueWXD0mN1QoKtG6ezbmBA8OMIWyPlbfilDn59MtQcb58yXlgk wlpPZF0En3kAlmM5IrDrP80wWGI4BuaHPSjqJ9JGq5eF9GZ442dbxZEfJ5J6Z/d4QO8BJsbWpou i X-Google-Smtp-Source: AGHT+IFu6R4mEqQO92Y8u5pn3OeNLkZw1YBCa/57uEnzHK4ZYR1rdiPHxiZQXAeKvZTOWoOmQzrSMw== X-Received: by 2002:a17:90a:c587:b0:298:c2a8:4ade with SMTP id l7-20020a17090ac58700b00298c2a84ademr953288pjt.28.1707984903549; Thu, 15 Feb 2024 00:15:03 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.15.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:15:03 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 09/10] util/bufferiszero: Add simd acceleration for aarch64 Date: Wed, 14 Feb 2024 22:14:48 -1000 Message-Id: <20240215081449.848220-10-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::529; envelope-from=richard.henderson@linaro.org; helo=mail-pg1-x529.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely double-check with the compiler flags for __ARM_NEON and don't bother with a runtime check. Otherwise, model the loop after the x86 SSE2 function, and use VADDV to reduce the four vector comparisons. Signed-off-by: Richard Henderson --- util/bufferiszero.c | 74 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 74 insertions(+) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 4eef6d47bc..2809b09225 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -214,7 +214,81 @@ bool test_buffer_is_zero_next_accel(void) } return false; } + +#elif defined(__aarch64__) && defined(__ARM_NEON) +#include + +#define REASSOC_BARRIER(vec0, vec1) asm("" : "+w"(vec0), "+w"(vec1)) + +static bool buffer_is_zero_simd(const void *buf, size_t len) +{ + uint32x4_t t0, t1, t2, t3; + + /* Align head/tail to 16-byte boundaries. */ + const uint32x4_t *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16); + const uint32x4_t *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 16); + + /* Unaligned loads at head/tail. */ + t0 = vld1q_u32(buf) | vld1q_u32(buf + len - 16); + + /* Collect a partial block at tail end. */ + t1 = e[-7] | e[-6]; + t2 = e[-5] | e[-4]; + t3 = e[-3] | e[-2]; + t0 |= e[-1]; + REASSOC_BARRIER(t0, t1); + REASSOC_BARRIER(t2, t3); + t0 |= t1; + t2 |= t3; + REASSOC_BARRIER(t0, t2); + t0 |= t2; + + /* + * Loop over complete 128-byte blocks. + * With the head and tail removed, e - p >= 14, so the loop + * must iterate at least once. + */ + do { + /* Each comparison is [-1,0], so reduction is in [-4..0]. */ + if (unlikely(vaddvq_u32(vceqzq_u32(t0)) != -4)) { + return false; + } + + t0 = p[0] | p[1]; + t1 = p[2] | p[3]; + t2 = p[4] | p[5]; + t3 = p[6] | p[7]; + REASSOC_BARRIER(t0, t1); + REASSOC_BARRIER(t2, t3); + t0 |= t1; + t2 |= t3; + REASSOC_BARRIER(t0, t2); + t0 |= t2; + p += 8; + } while (p < e - 7); + + return vaddvq_u32(vceqzq_u32(t0)) == -4; +} + +static biz_accel_fn const accel_table[] = { + buffer_is_zero_int_ge256, + buffer_is_zero_simd, +}; + +static unsigned accel_index = 1; +#define INIT_ACCEL buffer_is_zero_simd + +bool test_buffer_is_zero_next_accel(void) +{ + if (accel_index != 0) { + buffer_is_zero_accel = accel_table[--accel_index]; + return true; + } + return false; +} + #else + bool test_buffer_is_zero_next_accel(void) { return false; From patchwork Thu Feb 15 08:14:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 13557617 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E288CC4829E for ; Thu, 15 Feb 2024 08:15:44 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWuB-0007lG-QB; Thu, 15 Feb 2024 03:15:39 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWth-0007Dz-8B for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:10 -0500 Received: from mail-pg1-x52e.google.com ([2607:f8b0:4864:20::52e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWte-0001bl-5n for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:08 -0500 Received: by mail-pg1-x52e.google.com with SMTP id 41be03b00d2f7-5ce942efda5so490502a12.2 for ; Thu, 15 Feb 2024 00:15:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984905; x=1708589705; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=rKwcsoqbJERmdkOS9kUB6SrPLVGyNwfg2Re2n5hH/Bw=; b=tRUrP5e6NdYPo3/srRcWW+J33mNgrgP5pH11LQoIQ7QHX6XqK5T2Q9cMbTPR7Ozyha bJpHHXWshDE3kfxcq0PAdFl6SRlSwxs19PXqUjNocxTaES6sZQQ2Pv2FySCfyi7AouZv 6JDwcncYI2DpmGOdO8/w3U8orXdOLfGS2g63bMNX5vSgVt7vo72SfsPANrwdOAl+Cf20 M3SVJtq4V7LEbdbVLmE+48kx10CJO3H7m5ySvkC/el1+9NizhhFj8Wq02TdUvrc+LIYn jeZsbBZRdJBEjagNc56siuWU1KzhaBc1Pb4szEQ4PCgas6FR7NSlGrndkXrDx3FAC3tX Fhtg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984905; x=1708589705; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rKwcsoqbJERmdkOS9kUB6SrPLVGyNwfg2Re2n5hH/Bw=; b=QJyHzbCWZ9SnN7D2cawUk4qIinJ7cIzfdZ2jVraXf4bMLenclf3Nrawhv/xxJgCP/m bfd//hzOgR+lIBFmrIO3GKbl3U7L6XrYS12McwXvwmh4rnKq6MAVFwsmVzEgtacNdLiW l2STdQoSWjr6b9yhL95iPto4eBQnIYJoS4YnjqPAc86kSM3tONYTxQjkoqpljSIM/rhj tLsE4LZGYpZhFDBYt7eVljRPz33134u5gEdfbbzyI8NJyzOFem+YuWS/z4kxu8sKtKEs HckWPhVu3jQNNzQX6XQMVOWoUEfrl5sQluytEqERFR2MWvL4GTahEKrmxed3H9qsZa2s Q/hg== X-Gm-Message-State: AOJu0YwUkt/W635hKlA1lWQbf6dZ7pJeuZ9y1Y4U7laC3a1WZWNAYXDA lz+/VSdBf7cNgsU6d0xzp00iTb8je2P/j/uBcKVCU44fBuZQPpCrl3aKpm93XC7MDTobAJeeMVV D X-Google-Smtp-Source: AGHT+IFm67fZWp8xIQeGNOkUQuYe3WQy07lgADg/ZqQ4rqGUdTAOiDD9rBMQTZ9RUo2nno/QZPzVxw== X-Received: by 2002:a05:6a20:20c1:b0:19e:b534:1bcb with SMTP id t1-20020a056a2020c100b0019eb5341bcbmr1028640pza.23.1707984904797; Thu, 15 Feb 2024 00:15:04 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.15.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:15:04 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [RFC PATCH v4 10/10] util/bufferiszero: Add sve acceleration for aarch64 Date: Wed, 14 Feb 2024 22:14:49 -1000 Message-Id: <20240215081449.848220-11-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::52e; envelope-from=richard.henderson@linaro.org; helo=mail-pg1-x52e.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Signed-off-by: Richard Henderson --- RFC because I've not benchmarked this on real hw, only run it through qemu for validation. --- host/include/aarch64/host/cpuinfo.h | 1 + util/bufferiszero.c | 49 +++++++++++++++++++++++++++++ util/cpuinfo-aarch64.c | 1 + meson.build | 13 ++++++++ 4 files changed, 64 insertions(+) diff --git a/host/include/aarch64/host/cpuinfo.h b/host/include/aarch64/host/cpuinfo.h index fe671534e4..b4b816cd07 100644 --- a/host/include/aarch64/host/cpuinfo.h +++ b/host/include/aarch64/host/cpuinfo.h @@ -12,6 +12,7 @@ #define CPUINFO_AES (1u << 3) #define CPUINFO_PMULL (1u << 4) #define CPUINFO_BTI (1u << 5) +#define CPUINFO_SVE (1u << 6) /* Initialized with a constructor. */ extern unsigned cpuinfo; diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 2809b09225..af64c9c224 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -270,13 +270,62 @@ static bool buffer_is_zero_simd(const void *buf, size_t len) return vaddvq_u32(vceqzq_u32(t0)) == -4; } +#ifdef CONFIG_SVE_OPT +#include + +#ifndef __ARM_FEATURE_SVE +__attribute__((target("+sve"))) +#endif +static bool buffer_is_zero_sve(const void *buf, size_t len) +{ + svbool_t p, t = svptrue_b8(); + size_t i, n; + + /* + * For the first vector, align to 16 -- reading 1 to 256 bytes. + * Note this routine is only called with len >= 256, which is the + * architectural maximum vector length: the first vector always fits. + */ + i = 0; + n = QEMU_ALIGN_PTR_DOWN(buf + svcntb(), 16) - buf; + p = svwhilelt_b8(i, n); + + do { + svuint8_t d = svld1_u8(p, buf + i); + + p = svcmpne_n_u8(t, d, 0); + if (unlikely(svptest_any(t, p))) { + return false; + } + i += n; + n = svcntb(); + p = svwhilelt_b8(i, len); + } while (svptest_any(t, p)); + + return true; +} +#endif /* CONFIG_SVE_OPT */ + static biz_accel_fn const accel_table[] = { buffer_is_zero_int_ge256, buffer_is_zero_simd, +#ifdef CONFIG_SVE_OPT + buffer_is_zero_sve, +#endif }; +#ifdef CONFIG_SVE_OPT +static unsigned accel_index; +static void __attribute__((constructor)) init_accel(void) +{ + accel_index = (cpuinfo & CPUINFO_SVE ? 2 : 1); + buffer_is_zero_accel = accel_table[accel_index]; +} +#define INIT_ACCEL NULL +#else static unsigned accel_index = 1; #define INIT_ACCEL buffer_is_zero_simd +#endif /* CONFIG_SVE_OPT */ bool test_buffer_is_zero_next_accel(void) { diff --git a/util/cpuinfo-aarch64.c b/util/cpuinfo-aarch64.c index 4c8a005715..a1e22ea66e 100644 --- a/util/cpuinfo-aarch64.c +++ b/util/cpuinfo-aarch64.c @@ -61,6 +61,7 @@ unsigned __attribute__((constructor)) cpuinfo_init(void) info |= (hwcap & HWCAP_USCAT ? CPUINFO_LSE2 : 0); info |= (hwcap & HWCAP_AES ? CPUINFO_AES : 0); info |= (hwcap & HWCAP_PMULL ? CPUINFO_PMULL : 0); + info |= (hwcap & HWCAP_SVE ? CPUINFO_SVE : 0); unsigned long hwcap2 = qemu_getauxval(AT_HWCAP2); info |= (hwcap2 & HWCAP2_BTI ? CPUINFO_BTI : 0); diff --git a/meson.build b/meson.build index c1dc83e4c0..89a8241bc0 100644 --- a/meson.build +++ b/meson.build @@ -2822,6 +2822,18 @@ config_host_data.set('CONFIG_ARM_AES_BUILTIN', cc.compiles(''' void foo(uint8x16_t *p) { *p = vaesmcq_u8(*p); } ''')) +config_host_data.set('CONFIG_SVE_OPT', cc.compiles(''' + #include + #ifndef __ARM_FEATURE_SVE + __attribute__((target("+sve"))) + #endif + void foo(void *p) { + svbool_t t = svptrue_b8(); + svuint8_t d = svld1_u8(t, p); + svptest_any(t, svcmpne_n_u8(t, d, 0)); + } + ''')) + have_pvrdma = get_option('pvrdma') \ .require(rdma.found(), error_message: 'PVRDMA requires OpenFabrics libraries') \ .require(cc.compiles(gnu_source_prefix + ''' @@ -4232,6 +4244,7 @@ summary_info += {'memory allocator': get_option('malloc')} summary_info += {'avx2 optimization': config_host_data.get('CONFIG_AVX2_OPT')} summary_info += {'avx512bw optimization': config_host_data.get('CONFIG_AVX512BW_OPT')} summary_info += {'avx512f optimization': config_host_data.get('CONFIG_AVX512F_OPT')} +summary_info += {'sve optimization': config_host_data.get('CONFIG_SVE_OPT')} summary_info += {'gcov': get_option('b_coverage')} summary_info += {'thread sanitizer': get_option('tsan')} summary_info += {'CFI support': get_option('cfi')}