From patchwork Thu Apr 3 14:07:45 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Willem de Bruijn
X-Patchwork-Id: 14036704
X-Patchwork-Delegate: bpf@iogearbox.net
From: Willem de Bruijn
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
 john.fastabend@gmail.com, Willem de Bruijn, Matt Moeller,
 Maciej Żenczykowski
Subject: [PATCH bpf 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags
Date: Thu, 3 Apr 2025 10:07:45 -0400
Message-ID: <20250403140846.1268564-2-willemdebruijn.kernel@gmail.com>
X-Mailer: git-send-email 2.49.0.472.ge94155a9ec-goog
In-Reply-To: <20250403140846.1268564-1-willemdebruijn.kernel@gmail.com>
References: <20250403140846.1268564-1-willemdebruijn.kernel@gmail.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org

From: Willem de Bruijn

Classic BPF socket filters with SKF_NET_OFF and SKF_LL_OFF fail to
read when these offsets extend into frags.

This has been observed with iwlwifi and reproduced with tun with
IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
applied to a RAW socket, will silently miss matching packets.

    const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
    const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
    struct sock_filter filter_code[] = {
            BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
            BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
            BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_NET_OFF + offset_proto),
            BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
            BPF_STMT(BPF_LD + BPF_H + BPF_ABS, SKF_NET_OFF + offset_dport),

This is unexpected behavior. Socket filter programs should be
consistent regardless of environment. Silent misses are
particularly concerning because they are hard to detect.

Use skb_copy_bits for offsets outside the linear area, the same as is
done for non-SKF_(LL|NET) offsets.

The offset is always non-negative after subtracting the reference
threshold SKF_(LL|NET)_OFF, so the translated offset is always
>= skb_(mac|network)_offset. The sum of the two is an offset against
skb->data, and may be negative, but it cannot point before skb->head,
because skb_(mac|network)_offset would then point before skb->head
as well.

This appears to go back to when frag support was introduced to
sk_run_filter in linux-2.4.4, before the introduction of git.

The amount of code change and 8/16/32-bit duplication are unfortunate.
But any attempt I made to be smarter saved very few LoC while
complicating the code.
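
The offset translation described above can be modeled in user space.
The following is a minimal sketch, not the kernel code: struct skb_model
and skf_translate_offset() are hypothetical names introduced for
illustration, the struct fields stand in for skb_network_offset(),
skb_mac_offset() and skb_mac_header_was_set(), and the two constants are
copied from linux/filter.h so that the block compiles without kernel
headers.

```c
#include <errno.h>
#include <stdbool.h>

/* Values as defined in the UAPI header linux/filter.h. */
#define SKF_NET_OFF (-0x100000)
#define SKF_LL_OFF  (-0x200000)

/* Hypothetical stand-in for the few sk_buff properties the helpers use. */
struct skb_model {
	int network_offset;	/* models skb_network_offset(skb) */
	int mac_offset;		/* models skb_mac_offset(skb) */
	bool mac_header_set;	/* models skb_mac_header_was_set(skb) */
};

/*
 * Translate a classic BPF load offset k into an offset against skb->data.
 * Non-negative offsets are returned unchanged. Negative offsets must fall
 * in the SKF_NET_OFF or SKF_LL_OFF range, otherwise -EFAULT is returned.
 * On success the result is stored in *out and 0 is returned.
 */
static int skf_translate_offset(const struct skb_model *skb, int k, int *out)
{
	if (k < 0) {
		if (k >= SKF_NET_OFF)
			k += skb->network_offset - SKF_NET_OFF;
		else if (k >= SKF_LL_OFF && skb->mac_header_set)
			k += skb->mac_offset - SKF_LL_OFF;
		else
			return -EFAULT;
	}

	*out = k;
	return 0;
}
```

For example, with the network header at skb->data (network_offset 0) and
a 14-byte Ethernet header in front of it (mac_offset -14),
SKF_NET_OFF + 42 translates to 42 and SKF_LL_OFF + 12 translates to -2,
a negative offset against skb->data. In the patch, the translated offset
then goes through the same linear-area check and skb_copy_bits fallback
as any other offset.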

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/
Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244
Reported-by: Matt Moeller
Co-developed-by: Maciej Żenczykowski
Signed-off-by: Maciej Żenczykowski
Signed-off-by: Willem de Bruijn
---
 include/linux/filter.h |  3 --
 kernel/bpf/core.c      | 21 ------------
 net/core/filter.c      | 75 +++++++++++++++++++++++-------------------
 3 files changed, 42 insertions(+), 57 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index f5cf4d35d83e..708ac7e0cd36 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest)
 	}
 }
 
-void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb,
-					   int k, unsigned int size);
-
 static inline int bpf_tell_extensions(void)
 {
 	return SKF_AD_MAX;
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index ba6b6118cf50..0e836b5ac9a0 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -68,27 +68,6 @@
 struct bpf_mem_alloc bpf_global_ma;
 bool bpf_global_ma_set;
 
-/* No hurry in this branch
- *
- * Exported for the bpf jit load helper.
- */
-void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size)
-{
-	u8 *ptr = NULL;
-
-	if (k >= SKF_NET_OFF) {
-		ptr = skb_network_header(skb) + k - SKF_NET_OFF;
-	} else if (k >= SKF_LL_OFF) {
-		if (unlikely(!skb_mac_header_was_set(skb)))
-			return NULL;
-		ptr = skb_mac_header(skb) + k - SKF_LL_OFF;
-	}
-	if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb))
-		return ptr;
-
-	return NULL;
-}
-
 /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */
 enum page_size_enum {
 	__PAGE_SIZE = PAGE_SIZE
diff --git a/net/core/filter.c b/net/core/filter.c
index bc6828761a47..b232b70dd10d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
 BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
 	   data, int, headlen, int, offset)
 {
-	u8 tmp, *ptr;
+	u8 tmp;
 	const int len = sizeof(tmp);
 
-	if (offset >= 0) {
-		if (headlen - offset >= len)
-			return *(u8 *)(data + offset);
-		if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
-			return tmp;
-	} else {
-		ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
-		if (likely(ptr))
-			return *(u8 *)ptr;
+	if (offset < 0) {
+		if (offset >= SKF_NET_OFF)
+			offset += skb_network_offset(skb) - SKF_NET_OFF;
+		else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
+			offset += skb_mac_offset(skb) - SKF_LL_OFF;
+		else
+			return -EFAULT;
 	}
 
-	return -EFAULT;
+	if (headlen - offset >= len)
+		return *(u8 *)(data + offset);
+	if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
+		return tmp;
+	else
+		return -EFAULT;
 }
 
 BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb,
@@ -248,21 +251,24 @@ BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb,
 BPF_CALL_4(bpf_skb_load_helper_16, const struct sk_buff *, skb, const void *,
 	   data, int, headlen, int, offset)
 {
-	__be16 tmp, *ptr;
+	__be16 tmp;
 	const int len = sizeof(tmp);
 
-	if (offset >= 0) {
-		if (headlen - offset >= len)
-			return get_unaligned_be16(data + offset);
-		if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
-			return be16_to_cpu(tmp);
-	} else {
-		ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
-		if (likely(ptr))
-			return get_unaligned_be16(ptr);
+	if (offset < 0) {
+		if (offset >= SKF_NET_OFF)
+			offset += skb_network_offset(skb) - SKF_NET_OFF;
+		else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
+			offset += skb_mac_offset(skb) - SKF_LL_OFF;
+		else
+			return -EFAULT;
 	}
 
-	return -EFAULT;
+	if (headlen - offset >= len)
+		return get_unaligned_be16(data + offset);
+	if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
+		return be16_to_cpu(tmp);
+	else
+		return -EFAULT;
 }
 
 BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb,
@@ -275,21 +281,24 @@ BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb,
 BPF_CALL_4(bpf_skb_load_helper_32, const struct sk_buff *, skb, const void *,
 	   data, int, headlen, int, offset)
 {
-	__be32 tmp, *ptr;
+	__be32 tmp;
 	const int len = sizeof(tmp);
 
-	if (likely(offset >= 0)) {
-		if (headlen - offset >= len)
-			return get_unaligned_be32(data + offset);
-		if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
-			return be32_to_cpu(tmp);
-	} else {
-		ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
-		if (likely(ptr))
-			return get_unaligned_be32(ptr);
+	if (offset < 0) {
+		if (offset >= SKF_NET_OFF)
+			offset += skb_network_offset(skb) - SKF_NET_OFF;
+		else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
+			offset += skb_mac_offset(skb) - SKF_LL_OFF;
+		else
+			return -EFAULT;
 	}
 
-	return -EFAULT;
+	if (headlen - offset >= len)
+		return get_unaligned_be32(data + offset);
+	if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
+		return be32_to_cpu(tmp);
+	else
+		return -EFAULT;
 }
 
 BPF_CALL_2(bpf_skb_load_helper_32_no_cache, const struct sk_buff *, skb,

From patchwork Thu Apr 3 14:07:46 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Willem de Bruijn
X-Patchwork-Id: 14036705
X-Patchwork-Delegate: bpf@iogearbox.net
From: Willem de Bruijn
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
 john.fastabend@gmail.com, Willem de Bruijn
Subject: [PATCH bpf 2/2] selftests/net: test sk_filter support for SKF_NET_OFF on frags
Date: Thu, 3 Apr 2025 10:07:46 -0400
Message-ID: <20250403140846.1268564-3-willemdebruijn.kernel@gmail.com>
X-Mailer: git-send-email 2.49.0.472.ge94155a9ec-goog
In-Reply-To: <20250403140846.1268564-1-willemdebruijn.kernel@gmail.com>
References: <20250403140846.1268564-1-willemdebruijn.kernel@gmail.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org

From: Willem de Bruijn

Verify that a classic BPF Linux socket filter correctly matches
packet contents, including when accessing contents in an
skb_frag.

1. Open a SOCK_RAW socket with a classic BPF filter on UDP dport 8000.
2. Open a tap device with IFF_NAPI_FRAGS to inject skbs with frags.
3. Send a packet for which the UDP header is in frag[0].
4. Receive this packet to demonstrate that the socket accepted it.
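
As a rough illustration of what the filter from step 1 checks against
the packet from step 3, the two protocol tests can be modeled on a plain
byte buffer that starts at the IPv6 header. This is a sketch under
stated assumptions, not the kernel's filter evaluation:
model_filter_match() and the enum names are hypothetical, the offsets
are the fixed IPv6 and UDP header layout rather than values taken from
the test, and the PACKET_HOST check is left out. It also models the
classic BPF rule that a load which cannot be satisfied ends the program
with a return value of 0, which is how the miss fixed in patch 1 stays
silent.

```c
#include <stddef.h>
#include <stdint.h>

enum {
	IP6_HDR_LEN   = 40,	/* fixed IPv6 header size */
	IP6_NXT_OFF   = 6,	/* Next Header byte within the IPv6 header */
	UDP_DPORT_OFF = 2,	/* destination port within the UDP header */
	PROTO_UDP     = 17,	/* numeric value of IPPROTO_UDP */
};

/*
 * Hypothetical user-space model of the filter's two protocol checks.
 * net points at the IPv6 header and len is the number of readable bytes.
 * Returns 1 if the packet would be accepted, 0 if it would be dropped.
 */
static int model_filter_match(const uint8_t *net, size_t len, uint16_t dport)
{
	const size_t off = IP6_HDR_LEN + UDP_DPORT_OFF;

	/* A failed load terminates a classic BPF program with 0 (drop). */
	if (len < off + 2)
		return 0;

	/* BPF_LD + BPF_B at SKF_NET_OFF + offset_proto */
	if (net[IP6_NXT_OFF] != PROTO_UDP)
		return 0;

	/* BPF_LD + BPF_H loads 16 bits in network byte order */
	return (uint16_t)((net[off] << 8) | net[off + 1]) == dport;
}
```

In this model, the "len" argument plays the role of the bytes the load
helpers can reach. Before patch 1, SKF_NET_OFF loads could only reach
the linear area, so a UDP header placed in frag[0] behaved like the
short-buffer case here and the packet was dropped.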

Signed-off-by: Willem de Bruijn
Acked-by: Stanislav Fomichev
---
 tools/testing/selftests/net/.gitignore     |   1 +
 tools/testing/selftests/net/Makefile       |   2 +
 tools/testing/selftests/net/skf_net_off.c  | 244 +++++++++++++++++++++
 tools/testing/selftests/net/skf_net_off.sh |  28 +++
 4 files changed, 275 insertions(+)
 create mode 100644 tools/testing/selftests/net/skf_net_off.c
 create mode 100755 tools/testing/selftests/net/skf_net_off.sh

diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 679542f565a4..532bb732bc6d 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -39,6 +39,7 @@ scm_rights
 sk_bind_sendto_listen
 sk_connect_zero_addr
 sk_so_peek_off
+skf_net_off
 socket
 so_incoming_cpu
 so_netns_cookie
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 6d718b478ed8..124078b56fa4 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -106,6 +106,8 @@ TEST_PROGS += ipv6_route_update_soft_lockup.sh
 TEST_PROGS += busy_poll_test.sh
 TEST_GEN_PROGS += proc_net_pktgen
 TEST_PROGS += lwt_dst_cache_ref_loop.sh
+TEST_PROGS += skf_net_off.sh
+TEST_GEN_FILES += skf_net_off
 
 # YNL files, must be before "include ..lib.mk"
 YNL_GEN_FILES := busy_poller netlink-dumps
diff --git a/tools/testing/selftests/net/skf_net_off.c b/tools/testing/selftests/net/skf_net_off.c
new file mode 100644
index 000000000000..1fdf61d6cd7f
--- /dev/null
+++ b/tools/testing/selftests/net/skf_net_off.c
@@ -0,0 +1,244 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Open a tun device.
+ *
+ * [modifications: use IFF_NAPI_FRAGS, add sk filter]
+ *
+ * Expects the device to have been configured previously, e.g.:
+ *   sudo ip tuntap add name tap1 mode tap
+ *   sudo ip link set tap1 up
+ *   sudo ip link set dev tap1 addr 02:00:00:00:00:01
+ *   sudo ip -6 addr add fdab::1 peer fdab::2 dev tap1 nodad
+ *
+ * And to avoid premature pskb_may_pull:
+ *
+ *   sudo ethtool -K tap1 gro off
+ *   sudo bash -c 'echo 0 > /proc/sys/net/ipv4/ip_early_demux'
+ */
+
+#define _GNU_SOURCE
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static bool cfg_do_filter;
+static bool cfg_do_frags;
+static int cfg_dst_port = 8000;
+static char *cfg_ifname;
+
+static int tun_open(const char *tun_name)
+{
+	struct ifreq ifr = {0};
+	int fd, ret;
+
+	fd = open("/dev/net/tun", O_RDWR);
+	if (fd == -1)
+		error(1, errno, "open /dev/net/tun");
+
+	ifr.ifr_flags = IFF_TAP;
+	if (cfg_do_frags)
+		ifr.ifr_flags |= IFF_NAPI | IFF_NAPI_FRAGS;
+
+	strncpy(ifr.ifr_name, tun_name, IFNAMSIZ - 1);
+
+	ret = ioctl(fd, TUNSETIFF, &ifr);
+	if (ret)
+		error(1, errno, "ioctl TUNSETIFF");
+
+	return fd;
+}
+
+static void sk_set_filter(int fd)
+{
+	const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
+	const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
+
+	/* Filter UDP packets with destination port cfg_dst_port */
+	struct sock_filter filter_code[] = {
+		BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
+		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
+		BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_NET_OFF + offset_proto),
+		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
+		BPF_STMT(BPF_LD + BPF_H + BPF_ABS, SKF_NET_OFF + offset_dport),
+		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, cfg_dst_port, 1, 0),
+		BPF_STMT(BPF_RET + BPF_K, 0),
+		BPF_STMT(BPF_RET + BPF_K, 0xFFFF),
+	};
+
+	struct sock_fprog filter = {
+		sizeof(filter_code) / sizeof(filter_code[0]),
+		filter_code,
+	};
+
+	if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter)))
+		error(1, errno, "setsockopt attach filter");
+}
+
+static int raw_open(void)
+{
+	int fd;
+
+	fd = socket(PF_INET6, SOCK_RAW, IPPROTO_UDP);
+	if (fd == -1)
+		error(1, errno, "socket raw (udp)");
+
+	if (cfg_do_filter)
+		sk_set_filter(fd);
+
+	return fd;
+}
+
+static void tun_write(int fd)
+{
+	const char eth_src[] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x02 };
+	const char eth_dst[] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 };
+	struct tun_pi pi = {0};
+	struct ipv6hdr ip6h = {0};
+	struct udphdr uh = {0};
+	struct ethhdr eth = {0};
+	uint32_t payload;
+	struct iovec iov[5];
+	int ret;
+
+	pi.proto = htons(ETH_P_IPV6);
+
+	memcpy(eth.h_source, eth_src, sizeof(eth_src));
+	memcpy(eth.h_dest, eth_dst, sizeof(eth_dst));
+	eth.h_proto = htons(ETH_P_IPV6);
+
+	ip6h.version = 6;
+	ip6h.payload_len = htons(sizeof(uh) + sizeof(uint32_t));
+	ip6h.nexthdr = IPPROTO_UDP;
+	ip6h.hop_limit = 8;
+	if (inet_pton(AF_INET6, "fdab::2", &ip6h.saddr) != 1)
+		error(1, errno, "inet_pton src");
+	if (inet_pton(AF_INET6, "fdab::1", &ip6h.daddr) != 1)
+		error(1, errno, "inet_pton dst");
+
+	uh.source = htons(8000);
+	uh.dest = htons(cfg_dst_port);
+	uh.len = ip6h.payload_len;
+	uh.check = 0;
+
+	payload = htonl(0xABABABAB);		/* Covered in IPv6 length */
+
+	iov[0].iov_base = &pi;
+	iov[0].iov_len  = sizeof(pi);
+	iov[1].iov_base = &eth;
+	iov[1].iov_len  = sizeof(eth);
+	iov[2].iov_base = &ip6h;
+	iov[2].iov_len  = sizeof(ip6h);
+	iov[3].iov_base = &uh;
+	iov[3].iov_len  = sizeof(uh);
+	iov[4].iov_base = &payload;
+	iov[4].iov_len  = sizeof(payload);
+
+	ret = writev(fd, iov, sizeof(iov) / sizeof(iov[0]));
+	if (ret <= 0)
+		error(1, errno, "writev");
+}
+
+static void raw_read(int fd)
+{
+	struct timeval tv = { .tv_usec = 100 * 1000 };
+	struct msghdr msg = {0};
+	struct iovec iov[2];
+	struct udphdr uh;
+	uint32_t payload[2];
+	int ret;
+
+	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)))
+		error(1, errno, "setsockopt rcvtimeo udp");
+
+	iov[0].iov_base = &uh;
+	iov[0].iov_len = sizeof(uh);
+
+	iov[1].iov_base = payload;
+	iov[1].iov_len = sizeof(payload);
+
+	msg.msg_iov = iov;
+	msg.msg_iovlen = sizeof(iov) / sizeof(iov[0]);
+
+	ret = recvmsg(fd, &msg, 0);
+	if (ret <= 0)
+		error(1, errno, "read raw");
+	if (ret != sizeof(uh) + sizeof(payload[0]))
+		error(1, errno, "read raw: len=%d\n", ret);
+
+	fprintf(stderr, "raw recv: 0x%x\n", payload[0]);
+}
+
+static void parse_opts(int argc, char **argv)
+{
+	int c;
+
+	while ((c = getopt(argc, argv, "fFi:")) != -1) {
+		switch (c) {
+		case 'f':
+			cfg_do_filter = true;
+			printf("bpf filter enabled\n");
+			break;
+		case 'F':
+			cfg_do_frags = true;
+			printf("napi frags mode enabled\n");
+			break;
+		case 'i':
+			cfg_ifname = optarg;
+			break;
+		default:
+			error(1, 0, "unknown option %c", optopt);
+			break;
+		}
+	}
+
+	if (!cfg_ifname)
+		error(1, 0, "must specify tap interface name (-i)");
+}
+
+int main(int argc, char **argv)
+{
+	int fdt, fdr;
+
+	parse_opts(argc, argv);
+
+	fdr = raw_open();
+	fdt = tun_open(cfg_ifname);
+
+	tun_write(fdt);
+	raw_read(fdr);
+
+	if (close(fdt))
+		error(1, errno, "close tun");
+	if (close(fdr))
+		error(1, errno, "close udp");
+
+	fprintf(stderr, "OK\n");
+	return 0;
+}
+
diff --git a/tools/testing/selftests/net/skf_net_off.sh b/tools/testing/selftests/net/skf_net_off.sh
new file mode 100755
index 000000000000..e9cce93a0258
--- /dev/null
+++ b/tools/testing/selftests/net/skf_net_off.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+readonly NS="ns-$(mktemp -u XXXXXX)"
+
+cleanup() {
+	ip netns del $NS
+}
+
+ip netns add $NS
+trap cleanup EXIT
+
+ip -netns $NS link set lo up
+ip -netns $NS tuntap add name tap1 mode tap
+ip -netns $NS link set tap1 up
+ip -netns $NS link set dev tap1 addr 02:00:00:00:00:01
+ip -netns $NS -6 addr add fdab::1 peer fdab::2 dev tap1 nodad
+ip netns exec $NS ethtool -K tap1 gro off
+ip netns exec $NS sysctl -w net.ipv4.ip_early_demux=0
+
+echo "no filter"
+ip netns exec $NS ./skf_net_off -i tap1
+
+echo "filter, linear skb (-f)"
+ip netns exec $NS ./skf_net_off -i tap1 -f
+
+echo "filter, fragmented skb (-f) (-F)"
+ip netns exec $NS ./skf_net_off -i tap1 -f -F