From patchwork Sat Oct 26 12:53:35 2024
X-Patchwork-Submitter: Puranjay Mohan
X-Patchwork-Id: 13852230
From: Puranjay Mohan
To: Albert Ou, Alexei Starovoitov, Andrew Morton, Andrii Nakryiko,
    bpf@vger.kernel.org, Daniel Borkmann, "David S. Miller",
    Eduard Zingerman, Eric Dumazet, Hao Luo, Helge Deller, Jakub Kicinski,
Bottomley" , Jiri Olsa , John Fastabend , KP Singh , linux-kernel@vger.kernel.org, linux-parisc@vger.kernel.org, linux-riscv@lists.infradead.org, Martin KaFai Lau , Mykola Lysenko , netdev@vger.kernel.org, Palmer Dabbelt , Paolo Abeni , Paul Walmsley , Puranjay Mohan , Puranjay Mohan , Shuah Khan , Song Liu , Stanislav Fomichev , Yonghong Song Subject: [PATCH bpf-next v3 0/4] Optimize bpf_csum_diff() and homogenize for all archs Date: Sat, 26 Oct 2024 12:53:35 +0000 Message-Id: <20241026125339.26459-1-puranjay@kernel.org> X-Mailer: git-send-email 2.40.1 MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20241026_055406_566471_676CE0BD X-CRM114-Status: GOOD ( 16.27 ) X-BeenThere: linux-riscv@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-riscv" Errors-To: linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org Changes in v3: v2: https://lore.kernel.org/all/20241023153922.86909-1-puranjay@kernel.org/ - Fix sparse warning in patch 2 Changes in v2: v1: https://lore.kernel.org/all/20241021122112.101513-1-puranjay@kernel.org/ - Remove the patch that adds the benchmark as it is not useful enough to be added to the tree. - Fixed a sparse warning in patch 1. - Add reviewed-by and acked-by tags. NOTE: There are some sparse warning in net/core/filter.c but those are not worth fixing because bpf helpers take and return u64 values and using them in csum related functions that take and return __sum16 / __wsum would need a lot of casts everywhere. The bpf_csum_diff() helper currently returns different values on different architectures because it calls csum_partial() that is either implemented by the architecture like x86_64, arm, etc or uses the generic implementation in lib/checksum.c like arm64, riscv, etc. The implementation in lib/checksum.c returns the folded result that is 16-bit long, but the architecture specific implementation can return an unfolded value that is larger than 16-bits. The helper uses a per-cpu scratchpad buffer for copying the data and then computing the csum on this buffer. This can be optimised by utilising some mathematical properties of csum. The patch 1 in this series does preparatory work for homogenizing the helper. patch 2 does the changes to the helper itself. 
The performance gain can be seen in the tables below, generated using the
benchmark built in patch 4 of v1 of this series:

x86-64:

+-------------+------------------+------------------+-------------+
| Buffer Size | Before           | After            | Improvement |
+-------------+------------------+------------------+-------------+
| 4           | 2.296 ± 0.066M/s | 3.415 ± 0.001M/s | 48.73 %     |
| 8           | 2.320 ± 0.003M/s | 3.409 ± 0.003M/s | 46.93 %     |
| 16          | 2.315 ± 0.001M/s | 3.414 ± 0.003M/s | 47.47 %     |
| 20          | 2.318 ± 0.001M/s | 3.416 ± 0.001M/s | 47.36 %     |
| 32          | 2.308 ± 0.003M/s | 3.413 ± 0.003M/s | 47.87 %     |
| 40          | 2.300 ± 0.029M/s | 3.413 ± 0.003M/s | 48.39 %     |
| 64          | 2.286 ± 0.001M/s | 3.410 ± 0.001M/s | 49.16 %     |
| 128         | 2.250 ± 0.001M/s | 3.404 ± 0.001M/s | 51.28 %     |
| 256         | 2.173 ± 0.001M/s | 3.383 ± 0.001M/s | 55.68 %     |
| 512         | 2.023 ± 0.055M/s | 3.340 ± 0.001M/s | 65.10 %     |
+-------------+------------------+------------------+-------------+

ARM64:

+-------------+------------------+------------------+-------------+
| Buffer Size | Before           | After            | Improvement |
+-------------+------------------+------------------+-------------+
| 4           | 1.397 ± 0.005M/s | 1.493 ± 0.005M/s | 6.87 %      |
| 8           | 1.402 ± 0.002M/s | 1.489 ± 0.002M/s | 6.20 %      |
| 16          | 1.391 ± 0.001M/s | 1.481 ± 0.001M/s | 6.47 %      |
| 20          | 1.379 ± 0.001M/s | 1.477 ± 0.001M/s | 7.10 %      |
| 32          | 1.358 ± 0.001M/s | 1.469 ± 0.002M/s | 8.17 %      |
| 40          | 1.339 ± 0.001M/s | 1.462 ± 0.002M/s | 9.18 %      |
| 64          | 1.302 ± 0.002M/s | 1.449 ± 0.003M/s | 11.29 %     |
| 128         | 1.214 ± 0.001M/s | 1.443 ± 0.003M/s | 18.86 %     |
| 256         | 1.080 ± 0.001M/s | 1.423 ± 0.001M/s | 31.75 %     |
| 512         | 0.887 ± 0.001M/s | 1.411 ± 0.002M/s | 59.07 %     |
+-------------+------------------+------------------+-------------+

Patch 3 reverts a hack that was done to make the selftest pass on all
architectures. Patch 4 adds a selftest that verifies the results produced by
this helper in multiple modes and edge cases.

Puranjay Mohan (4):
  net: checksum: move from32to16() to generic header
  bpf: bpf_csum_diff: optimize and homogenize for all archs
  selftests/bpf: don't mask result of bpf_csum_diff() in test_verifier
  selftests/bpf: Add a selftest for bpf_csum_diff()

 arch/parisc/lib/checksum.c                    |  13 +-
 include/net/checksum.h                        |   6 +
 lib/checksum.c                                |  11 +-
 net/core/filter.c                             |  39 +-
 .../selftests/bpf/prog_tests/test_csum_diff.c | 408 ++++++++++++++++++
 .../selftests/bpf/progs/csum_diff_test.c      |  42 ++
 .../bpf/progs/verifier_array_access.c         |   3 +-
 7 files changed, 471 insertions(+), 51 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_csum_diff.c
 create mode 100644 tools/testing/selftests/bpf/progs/csum_diff_test.c
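For illustration only, here is a rough sketch of the kind of BPF program the
new selftest exercises. This is not the code added in patch 4; the section,
variable and program names are made up. bpf_csum_diff() can be called in
"push" mode (from == NULL, data is only added), "pull" mode (to == NULL, data
is only removed), or with both buffers, and the returned value can be
cascaded as the seed of a following call:

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	char LICENSE[] SEC("license") = "GPL";

	/* Buffers filled in by user space before the program runs. */
	__be32 old_words[16];
	__be32 new_words[16];
	__u64 result;

	SEC("tc")
	int csum_diff_sketch(struct __sk_buff *skb)
	{
		/* "Push" mode: checksum of new_words alone, seeded with 0. */
		__u64 push = bpf_csum_diff(NULL, 0, new_words,
					   sizeof(new_words), 0);

		/* "Pull" mode: remove old_words, cascading the previous result
		 * as the seed of this call.
		 */
		result = bpf_csum_diff(old_words, sizeof(old_words),
				       NULL, 0, push);

		return 0;
	}

With patch 2 applied, the value stored in result is the same 16-bit folded
checksum on every architecture, which is what the selftest checks across its
modes and edge cases.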