From: Puranjay Mohan
Subject: [PATCH bpf-next v3 2/4] bpf: bpf_csum_diff: optimize and homogenize for all archs
Date: Sat, 26 Oct 2024 12:53:37 +0000
Message-Id: <20241026125339.26459-3-puranjay@kernel.org>
In-Reply-To: <20241026125339.26459-1-puranjay@kernel.org>
1. Optimization
---------------

The current implementation copies the 'from' and 'to' buffers into a
scratchpad, taking the bitwise NOT of the 'from' buffer while copying, and
then calls csum_partial() on this scratchpad. Because the bitwise NOT of a
value is its negation in ones' complement arithmetic, the current
implementation mathematically computes:

    result = csum(to - from)

Both 'to' and '~from' have to be copied into the scratchpad because
csum_partial() takes a single contiguous buffer, not two disjoint buffers
like 'to' and 'from'.

Using the distributive property of csum(), this equation can be rewritten
as:

    result = csum(to) - csum(from)

This allows 'to' and 'from' to live at different locations, so the
scratchpad and the copying are no longer needed. In C code this looks like:

    result = csum_sub(csum_partial(to, to_size, seed),
                      csum_partial(from, from_size, 0));

2. Homogenization
-----------------

The bpf_csum_diff() helper calls csum_partial(), which some architectures,
such as arm and x86, implement themselves, while other architectures rely on
the generic implementation in lib/checksum.c.

The generic implementation in lib/checksum.c returns a 16-bit value, but
the arch-specific implementations can return more than 16 bits. This works
out in most places because the result is passed through csum_fold(), which
turns it into a 16-bit value, before it is used.

bpf_csum_diff() returns the value from csum_partial() directly, so the
returned value can differ between architectures. See the discussion in [1];
for the int value 28, the calculated checksums are:

    x86                    : -29    : 0xffffffe3
    generic (arm64, riscv) : 65507  : 0x0000ffe3
    arm                    : 131042 : 0x0001ffe2

Pass the result of bpf_csum_diff() through csum_from32to16() before
returning it, so that the result is the same on all architectures.
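As an illustration of why folding removes the difference, here is a minimal
user-space sketch. fold32() is a hypothetical stand-in that mirrors the
two-step end-around-carry fold done by from32to16(); fold32_not() is a
hypothetical model of a csum_fold()-style result (fold, then bitwise NOT);
fold_all_equal() is a hypothetical checker. None of these are kernel
functions. Applied to the three raw values listed above, fold32() gives
0xffe3 (65507) for each of them, while the csum_fold()-style variant gives
the complement, 0x001c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for from32to16(): fold a 32-bit partial
 * checksum into 16 bits using two rounds of end-around carry. */
static uint16_t fold32(uint32_t x)
{
	x = (x & 0xffff) + (x >> 16);	/* can leave a carry in bit 16 */
	x = (x & 0xffff) + (x >> 16);	/* absorb that carry */
	return (uint16_t)x;
}

/* Hypothetical model of a csum_fold()-style result: fold, then NOT. */
static uint16_t fold32_not(uint32_t x)
{
	return (uint16_t)~fold32(x);
}

/* Raw results reported for the int value 28. */
static const uint32_t raw_results[] = {
	0xffffffe3u,	/* x86 (-29) */
	0x0000ffe3u,	/* generic: arm64, riscv (65507) */
	0x0001ffe2u,	/* arm (131042) */
};

/* Hypothetical checker: asserts that every raw value folds to the
 * same 16-bit result and returns that result. */
static uint16_t fold_all_equal(const uint32_t *vals, size_t n)
{
	uint16_t first = fold32(vals[0]);

	for (size_t i = 1; i < n; i++)
		assert(fold32(vals[i]) == first);
	return first;
}
```

For example, fold_all_equal(raw_results, 3) returns 0xffe3.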
NOTE: csum_from32to16() is used instead of csum_fold() because csum_fold()
performs from32to16() followed by a bitwise NOT of the result, which is not
what we want here.

[1] https://lore.kernel.org/bpf/CAJ+HfNiQbOcqCLxFUP2FMm5QrLXUUaj852Fxe3hn_2JNiucn6g@mail.gmail.com/

Signed-off-by: Puranjay Mohan
Acked-by: Daniel Borkmann
Reviewed-by: Toke Høiland-Jørgensen
---
 net/core/filter.c | 39 +++++++++++----------------------------
 1 file changed, 11 insertions(+), 28 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index e31ee8be2de07..f2f8e64f19066 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1654,18 +1654,6 @@ void sk_reuseport_prog_free(struct bpf_prog *prog)
 	bpf_prog_destroy(prog);
 }
 
-struct bpf_scratchpad {
-	union {
-		__be32 diff[MAX_BPF_STACK / sizeof(__be32)];
-		u8 buff[MAX_BPF_STACK];
-	};
-	local_lock_t bh_lock;
-};
-
-static DEFINE_PER_CPU(struct bpf_scratchpad, bpf_sp) = {
-	.bh_lock = INIT_LOCAL_LOCK(bh_lock),
-};
-
 static inline int __bpf_try_make_writable(struct sk_buff *skb,
 					  unsigned int write_len)
 {
@@ -2022,11 +2010,6 @@ static const struct bpf_func_proto bpf_l4_csum_replace_proto = {
 BPF_CALL_5(bpf_csum_diff, __be32 *, from, u32, from_size,
 	   __be32 *, to, u32, to_size, __wsum, seed)
 {
-	struct bpf_scratchpad *sp = this_cpu_ptr(&bpf_sp);
-	u32 diff_size = from_size + to_size;
-	int i, j = 0;
-	__wsum ret;
-
 	/* This is quite flexible, some examples:
 	 *
 	 * from_size == 0, to_size > 0, seed := csum --> pushing data
@@ -2035,19 +2018,19 @@ BPF_CALL_5(bpf_csum_diff, __be32 *, from, u32, from_size,
 	 *
 	 * Even for diffing, from_size and to_size don't need to be equal.
 	 */
-	if (unlikely(((from_size | to_size) & (sizeof(__be32) - 1)) ||
-		     diff_size > sizeof(sp->diff)))
-		return -EINVAL;
 
-	local_lock_nested_bh(&bpf_sp.bh_lock);
-	for (i = 0; i < from_size / sizeof(__be32); i++, j++)
-		sp->diff[j] = ~from[i];
-	for (i = 0; i < to_size / sizeof(__be32); i++, j++)
-		sp->diff[j] = to[i];
+	__wsum ret = seed;
 
-	ret = csum_partial(sp->diff, diff_size, seed);
-	local_unlock_nested_bh(&bpf_sp.bh_lock);
-	return ret;
+	if (from_size && to_size)
+		ret = csum_sub(csum_partial(to, to_size, ret),
+			       csum_partial(from, from_size, 0));
+	else if (to_size)
+		ret = csum_partial(to, to_size, ret);
+
+	else if (from_size)
+		ret = ~csum_partial(from, from_size, ~ret);
+
+	return csum_from32to16((__force unsigned int)ret);
 }
 
 static const struct bpf_func_proto bpf_csum_diff_proto = {
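To check that the rewritten helper computes the same checksum as the
scratchpad version, the following user-space sketch models both. It is a
simplified model under stated assumptions, not kernel code: oc_sum16() and
oc_sub16() are hypothetical stand-ins for csum_partial() and csum_sub() that
work on 16-bit values throughout; csum_diff_old() and csum_diff_new() are
hypothetical names mirroring the removed and added logic; and lengths are
given in 32-bit words rather than bytes. Because 0x0000 and 0xffff both
represent zero in ones' complement arithmetic, results are compared modulo
0xffff by oc_equal().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of csum_partial(): 16-bit ones' complement sum
 * over 32-bit words plus a 16-bit seed (RFC 1071 style). */
static uint16_t oc_sum16(const uint32_t *buf, size_t nwords, uint16_t seed)
{
	uint32_t sum = seed;

	assert(nwords <= 0x7fff);	/* keeps the 32-bit accumulator from overflowing */
	for (size_t i = 0; i < nwords; i++) {
		sum += buf[i] & 0xffff;	/* low 16 bits */
		sum += buf[i] >> 16;	/* high 16 bits */
	}
	while (sum >> 16)		/* end-around carry */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Hypothetical model of csum_sub(): a - b computed as a + ~b with
 * end-around carry. */
static uint16_t oc_sub16(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + (uint16_t)~b;

	return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* Old approach: copy ~from and to into one scratch buffer, sum once. */
static uint16_t csum_diff_old(const uint32_t *from, size_t nf,
			      const uint32_t *to, size_t nt, uint16_t seed)
{
	uint32_t scratch[128];		/* MAX_BPF_STACK (512) / sizeof(__be32) */
	size_t j = 0;

	assert(nf + nt <= 128);
	for (size_t i = 0; i < nf; i++)
		scratch[j++] = ~from[i];
	for (size_t i = 0; i < nt; i++)
		scratch[j++] = to[i];
	return oc_sum16(scratch, j, seed);
}

/* New approach: the same three branches as the patch, no scratch buffer. */
static uint16_t csum_diff_new(const uint32_t *from, size_t nf,
			      const uint32_t *to, size_t nt, uint16_t seed)
{
	uint16_t ret = seed;

	if (nf && nt)		/* seed + csum(to) - csum(from) */
		ret = oc_sub16(oc_sum16(to, nt, ret), oc_sum16(from, nf, 0));
	else if (nt)		/* pushing data: seed + csum(to) */
		ret = oc_sum16(to, nt, ret);
	else if (nf)		/* pulling data: ~(~seed + csum(from)) == seed - csum(from) */
		ret = (uint16_t)~oc_sum16(from, nf, (uint16_t)~ret);
	return ret;
}

/* 0x0000 and 0xffff both encode zero in ones' complement arithmetic. */
static int oc_equal(uint16_t a, uint16_t b)
{
	return a % 0xffff == b % 0xffff;
}
```

Every value in this model is already 16 bits wide, so it does not reproduce
the architecture-dependent width of the kernel's __wsum; that difference is
what the final csum_from32to16() call addresses.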