From patchwork Fri Aug 18 02:58:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhai X-Patchwork-Id: 13357242 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 622C6C3DA6F for ; Fri, 18 Aug 2023 02:59:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1357349AbjHRC62 (ORCPT ); Thu, 17 Aug 2023 22:58:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34556 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357343AbjHRC6Q (ORCPT ); Thu, 17 Aug 2023 22:58:16 -0400 Received: from mail-qt1-x831.google.com (mail-qt1-x831.google.com [IPv6:2607:f8b0:4864:20::831]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C79AE210D for ; Thu, 17 Aug 2023 19:58:14 -0700 (PDT) Received: by mail-qt1-x831.google.com with SMTP id d75a77b69052e-40ff796e8ddso3446011cf.2 for ; Thu, 17 Aug 2023 19:58:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudflare.com; s=google; t=1692327494; x=1692932294; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=B//PxqTOcLRZSFYVWOXGViYC9OULMJhe+RhXHERECls=; b=tmv/IW4eAJDthaFDwjvi9IaSF4Rcud5RPAMq/Cw6bY18saA/WzMcE2wsVXdr2M9HuI ELfdKCcPerKH0tBnpuC8ZnlQx1DdOGyQU12dGiagEAyQZXRak7SX8KpqDsdc4b4Igmcw 3dfj9Vwi3iaCkZkmtpe6IaOU7anpUGK0uLf/0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692327494; x=1692932294; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=B//PxqTOcLRZSFYVWOXGViYC9OULMJhe+RhXHERECls=; b=TkadSeL2B0+14T6YBNYnT8ZKxysBXVSAFAvteYVSezeOnaMUTe5wA4rL8xk8JLtCyL Eka3zYYjrHnRUPdLhiYm0M0KVZfncjQ4F9/sMWaffH05SneQD5j6//89GMXS1puZ4wXC 7zt3I0/jGzSeLy4ne9XVtpwUGM9vydu4kE9HtMZoEcxe87BG0Qx8+gIroxMB5RhZ//SP kubuaUY6UByqKENBGKlRZ+PKeeeJSsYnPYHPYHw7KNyjVECNOypxU73sSFgH/8PCUGBI +y/C2X4U89zibbARhfXdXkc1MzHyXw+cSqNZxMnDOhgIxLdXb5Fsffouqzb/xrzPyiUo GIWA== X-Gm-Message-State: AOJu0Yyq+7YEjVFnYVSL75uZh1rVsLxnx3LFQkZO28/T+DkzFD9hIzHI F1o+B/daraXz+wBIgaYaqgzmUw== X-Google-Smtp-Source: AGHT+IH7rjUWj757WVMp0Qqu31Vckq9gjzUG4qfABg1kPLHXqiniH9ZGEnmCzeHgJSsJdQ3Rc6uMDA== X-Received: by 2002:ac8:5742:0:b0:408:2e66:3a1b with SMTP id 2-20020ac85742000000b004082e663a1bmr2008883qtx.3.1692327493979; Thu, 17 Aug 2023 19:58:13 -0700 (PDT) Received: from debian.debian ([140.141.197.139]) by smtp.gmail.com with ESMTPSA id m17-20020a0cdb91000000b0063d3744c5c5sm350536qvk.5.2023.08.17.19.58.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Aug 2023 19:58:13 -0700 (PDT) Date: Thu, 17 Aug 2023 19:58:11 -0700 From: Yan Zhai To: bpf@vger.kernel.org Cc: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , David Ahern , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Mykola Lysenko , Shuah Khan , Yan Zhai , Thomas Graf , Jordan Griege , Dan Carpenter , linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH v6 bpf 1/4] lwt: fix return values of BPF xmit ops Message-ID: <0d2b878186cfe215fec6b45769c1cd0591d3628d.1692326837.git.yan@cloudflare.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org BPF encap ops can return different types of positive values, such like NET_RX_DROP, NET_XMIT_CN, NETDEV_TX_BUSY, and so on, from function skb_do_redirect and bpf_lwt_xmit_reroute. At the xmit hook, such return values would be treated implicitly as LWTUNNEL_XMIT_CONTINUE in ip(6)_finish_output2. When this happens, skbs that have been freed would continue to the neighbor subsystem, causing use-after-free bug and kernel crashes. To fix the incorrect behavior, skb_do_redirect return values can be simply discarded, the same as tc-egress behavior. On the other hand, bpf_lwt_xmit_reroute returns useful errors to local senders, e.g. PMTU information. Thus convert its return values to avoid the conflict with LWTUNNEL_XMIT_CONTINUE. Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure") Suggested-by: Martin KaFai Lau Suggested-by: Stanislav Fomichev Reported-by: Jordan Griege Signed-off-by: Yan Zhai --- * v5: discards skb_do_redirect return instead; convert bpf_lwt_xmit_reroute return; * v4: minor commit message changes * v3: converts skb_do_redirect statuses from both ingress and egress * v2: code style amend --- net/core/lwt_bpf.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c index 8b6b5e72b217..4a0797f0a154 100644 --- a/net/core/lwt_bpf.c +++ b/net/core/lwt_bpf.c @@ -60,9 +60,8 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt, ret = BPF_OK; } else { skb_reset_mac_header(skb); - ret = skb_do_redirect(skb); - if (ret == 0) - ret = BPF_REDIRECT; + skb_do_redirect(skb); + ret = BPF_REDIRECT; } break; @@ -255,7 +254,7 @@ static int bpf_lwt_xmit_reroute(struct sk_buff *skb) err = dst_output(dev_net(skb_dst(skb)->dev), skb->sk, skb); if (unlikely(err)) - return err; + return net_xmit_errno(err); /* ip[6]_finish_output2 understand LWTUNNEL_XMIT_DONE */ return LWTUNNEL_XMIT_DONE; From patchwork Fri Aug 18 02:58:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhai X-Patchwork-Id: 13357243 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D0C87C7113B for ; Fri, 18 Aug 2023 02:59:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1357354AbjHRC62 (ORCPT ); Thu, 17 Aug 2023 22:58:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34568 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357344AbjHRC6S (ORCPT ); Thu, 17 Aug 2023 22:58:18 -0400 Received: from mail-qv1-xf35.google.com (mail-qv1-xf35.google.com [IPv6:2607:f8b0:4864:20::f35]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EBE351BB for ; Thu, 17 Aug 2023 19:58:16 -0700 (PDT) Received: by mail-qv1-xf35.google.com with SMTP id 6a1803df08f44-645181e1eaeso2347996d6.1 for ; Thu, 17 Aug 2023 19:58:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudflare.com; s=google; t=1692327496; x=1692932296; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=t1rveI8qZCR4C4wEbefcxj73KFhEoSnYcRQTDwgiQK4=; b=WQimjvM94ux9B2GBJcDjxNN5CkfPSEt71Ija0YYQ3vdFMYAD9FTBOwEw7+LGPn2WXw /jP2CNqXbUtqXqJ6bDq/sqomHkPUSAde1yqm7ATlWBGqpYwyKtBBhUy0dbwBVKp6erAL Jzn59/doHMy+T6peWT9DJZLX+CC4ogZgBYM9A= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692327496; x=1692932296; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=t1rveI8qZCR4C4wEbefcxj73KFhEoSnYcRQTDwgiQK4=; b=booRTniu7FCXj9NMBb4IpZWkam4GCS7w90BhvjHgKH/gAMqhRK3M/ek2ymSpxSLX6n 6EcRDYW2Q1nUmSAIoPwN8VJ2nF54x4owXogwG0V1O4kKNOkz3tGaT1SnnG6GrMXqcBwo ZnTdccX/a58k73Cv65LRaP12oCg1wvj5hC9svD2GNiAUs0KJuOaRn3CsevEcKwqIOcmm drL7wlkjxTLQ2SzftmVB+MSyriY18/b/bDCd4kTch67TUBLjs2BKTnGxfmyUW61+eeQ3 OdYnV83eLb0domgVxkeDeAc0ps5p0nsCf48TTOK0/hTYninJOBHkSuw48iYIoVGmX2VM SZew== X-Gm-Message-State: AOJu0Yw07A4tKG5uLSIwFqoJuZ5pQHu1gnNtZ7X0ZczLxtrKkrr4YfqB tvV+RUtAmR2LZs+eBpwhu71v8g== X-Google-Smtp-Source: AGHT+IE/X2UwchhA+IgpKCTMnMGUnKCh5ViUxP8U7VUsaqTciMUXsTSkymEdxNhNHL25ffS47NYOiA== X-Received: by 2002:ad4:4ba4:0:b0:64a:bff7:90e6 with SMTP id i4-20020ad44ba4000000b0064abff790e6mr1258205qvw.65.1692327496080; Thu, 17 Aug 2023 19:58:16 -0700 (PDT) Received: from debian.debian ([140.141.197.139]) by smtp.gmail.com with ESMTPSA id w14-20020a0ce10e000000b006427145590csm347998qvk.48.2023.08.17.19.58.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Aug 2023 19:58:15 -0700 (PDT) Date: Thu, 17 Aug 2023 19:58:14 -0700 From: Yan Zhai To: bpf@vger.kernel.org Cc: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , David Ahern , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Mykola Lysenko , Shuah Khan , Yan Zhai , Thomas Graf , Jordan Griege , Dan Carpenter , linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH v6 bpf 2/4] lwt: check LWTUNNEL_XMIT_CONTINUE strictly Message-ID: <96b939b85eda00e8df4f7c080f770970a4c5f698.1692326837.git.yan@cloudflare.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org LWTUNNEL_XMIT_CONTINUE is implicitly assumed in ip(6)_finish_output2, such that any positive return value from a xmit hook could cause unexpected continue behavior, despite that related skb may have been freed. This could be error-prone for future xmit hook ops. One of the possible errors is to return statuses of dst_output directly. To make the code safer, redefine LWTUNNEL_XMIT_CONTINUE value to distinguish from dst_output statuses and check the continue condition explicitly. Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure") Suggested-by: Dan Carpenter Signed-off-by: Yan Zhai --- include/net/lwtunnel.h | 5 ++++- net/ipv4/ip_output.c | 2 +- net/ipv6/ip6_output.c | 2 +- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h index 6f15e6fa154e..53bd2d02a4f0 100644 --- a/include/net/lwtunnel.h +++ b/include/net/lwtunnel.h @@ -16,9 +16,12 @@ #define LWTUNNEL_STATE_INPUT_REDIRECT BIT(1) #define LWTUNNEL_STATE_XMIT_REDIRECT BIT(2) +/* LWTUNNEL_XMIT_CONTINUE should be distinguishable from dst_output return + * values (NET_XMIT_xxx and NETDEV_TX_xxx in linux/netdevice.h) for safety. + */ enum { LWTUNNEL_XMIT_DONE, - LWTUNNEL_XMIT_CONTINUE, + LWTUNNEL_XMIT_CONTINUE = 0x100, }; diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 6ba1a0fafbaa..a6e4c82615d7 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -216,7 +216,7 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s if (lwtunnel_xmit_redirect(dst->lwtstate)) { int res = lwtunnel_xmit(skb); - if (res < 0 || res == LWTUNNEL_XMIT_DONE) + if (res != LWTUNNEL_XMIT_CONTINUE) return res; } diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 1e8c90e97608..016b0a513259 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -113,7 +113,7 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff * if (lwtunnel_xmit_redirect(dst->lwtstate)) { int res = lwtunnel_xmit(skb); - if (res < 0 || res == LWTUNNEL_XMIT_DONE) + if (res != LWTUNNEL_XMIT_CONTINUE) return res; } From patchwork Fri Aug 18 02:58:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhai X-Patchwork-Id: 13357245 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 98B93C71136 for ; Fri, 18 Aug 2023 02:59:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1357355AbjHRC63 (ORCPT ); Thu, 17 Aug 2023 22:58:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55532 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357346AbjHRC6V (ORCPT ); Thu, 17 Aug 2023 22:58:21 -0400 Received: from mail-qt1-x831.google.com (mail-qt1-x831.google.com [IPv6:2607:f8b0:4864:20::831]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 356F01BB for ; Thu, 17 Aug 2023 19:58:19 -0700 (PDT) Received: by mail-qt1-x831.google.com with SMTP id d75a77b69052e-4039f7e1d3aso3478111cf.0 for ; Thu, 17 Aug 2023 19:58:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudflare.com; s=google; t=1692327498; x=1692932298; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=xZRdt7TwlHaLP0+ES5VWtWujEx9GcoxM9w8+usdPhy8=; b=EXgOT1iGkPcUAOXmRevjKAN4Y/z7rGMJ9JSa12whO8btCspwXu3I9U4BfvVzhfYq+q HSkge3x52hky+43qKvcXqHAQpApN7WzvtjhJ0sf/3rwhRzWxhDFsYcq1ngJF4CaQo6Ji fMwq1NmuRU2VkJ0wrW6qkUnJXheM61lXN6zgM= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692327498; x=1692932298; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=xZRdt7TwlHaLP0+ES5VWtWujEx9GcoxM9w8+usdPhy8=; b=MYIgbYa6wgfp9QTyFVWDW4JtMOLw2h/QMgc7ZHWJ+k4RftpCL4IYG6zTTE9ABrPKLv ZtK01ip1SCLre68weymg3WBdWy5xCWzNydD+0FhGY43QpGnp5fUXPgnE2Py38/qS3pJB gFycvZFWKMEvjV0QuZk5sCpYCLlt9xZQ8Zm0PpHH8DJe22I26wGN3zcNJv0kbduScERD j5GEnXZbWVt1jFsY0jG/GPsI/BDHj5aerKDT3sDugd2PHdJPxBrAyLpJ4J2t9v2h1MA1 B79alBJJ6Ml40iK1M425XYMuCmBU/+y8880TNz01CpCPa8EoOY8Xn/pocq0kzHuRoYoY yogw== X-Gm-Message-State: AOJu0Yw1mJSisFu72zuXdh8xL8q7ctcOlmffJIaNtxGeajmxVi1LBCsV EkiLxNwQH0WtAWyvhqIidNHxaA== X-Google-Smtp-Source: AGHT+IHjA/tqhYU62j0+8ABYfHXXbzEn+57KKrFe75Njs/TX73/yqlE+JaGhkeArFENTczw1mFzKtg== X-Received: by 2002:a05:622a:178c:b0:40f:e171:df4a with SMTP id s12-20020a05622a178c00b0040fe171df4amr1709123qtk.57.1692327498313; Thu, 17 Aug 2023 19:58:18 -0700 (PDT) Received: from debian.debian ([140.141.197.139]) by smtp.gmail.com with ESMTPSA id c15-20020ac8110f000000b00403ad6ec2e8sm267571qtj.26.2023.08.17.19.58.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Aug 2023 19:58:17 -0700 (PDT) Date: Thu, 17 Aug 2023 19:58:16 -0700 From: Yan Zhai To: bpf@vger.kernel.org Cc: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , David Ahern , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Mykola Lysenko , Shuah Khan , Yan Zhai , Thomas Graf , Jordan Griege , Dan Carpenter , linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH v6 bpf 3/4] selftests/bpf: add lwt_xmit tests for BPF_REDIRECT Message-ID: <96bf435243641939d9c9da329fab29cb45f7df22.1692326837.git.yan@cloudflare.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org There is no lwt_xmit test case for BPF_REDIRECT yet. Add test cases for both normal and abnormal situations. For abnormal test cases, devices are set down or have its carrier set down. Without proper fixes, BPF_REDIRECT to either ingress or egress of such device would panic the kernel. Signed-off-by: Yan Zhai --- v5->v6: busybox does not support ping mark, redirect via IP address instead. Add missing kernel test config. --- tools/testing/selftests/bpf/config | 1 + .../selftests/bpf/prog_tests/lwt_helpers.h | 139 ++++++++ .../selftests/bpf/prog_tests/lwt_redirect.c | 330 ++++++++++++++++++ .../selftests/bpf/progs/test_lwt_redirect.c | 90 +++++ 4 files changed, 560 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/lwt_helpers.h create mode 100644 tools/testing/selftests/bpf/prog_tests/lwt_redirect.c create mode 100644 tools/testing/selftests/bpf/progs/test_lwt_redirect.c diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config index 3b350bc31343..fa083f1e5b34 100644 --- a/tools/testing/selftests/bpf/config +++ b/tools/testing/selftests/bpf/config @@ -16,6 +16,7 @@ CONFIG_CRYPTO_USER_API_HASH=y CONFIG_DEBUG_INFO=y CONFIG_DEBUG_INFO_BTF=y CONFIG_DEBUG_INFO_DWARF4=y +CONFIG_DUMMY=y CONFIG_DYNAMIC_FTRACE=y CONFIG_FPROBE=y CONFIG_FTRACE_SYSCALLS=y diff --git a/tools/testing/selftests/bpf/prog_tests/lwt_helpers.h b/tools/testing/selftests/bpf/prog_tests/lwt_helpers.h new file mode 100644 index 000000000000..61333f2a03f9 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/lwt_helpers.h @@ -0,0 +1,139 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __LWT_HELPERS_H +#define __LWT_HELPERS_H + +#include +#include +#include +#include + +#include "test_progs.h" + +#define log_err(MSG, ...) \ + fprintf(stderr, "(%s:%d: errno: %s) " MSG "\n", \ + __FILE__, __LINE__, strerror(errno), ##__VA_ARGS__) + +#define RUN_TEST(name) \ + ({ \ + if (test__start_subtest(#name)) \ + if (ASSERT_OK(netns_create(), "netns_create")) { \ + struct nstoken *token = open_netns(NETNS); \ + if (ASSERT_OK_PTR(token, "setns")) { \ + test_ ## name(); \ + close_netns(token); \ + } \ + netns_delete(); \ + } \ + }) + +#define NETNS "ns_lwt" + +static inline int netns_create(void) +{ + return system("ip netns add " NETNS); +} + +static inline int netns_delete(void) +{ + return system("ip netns del " NETNS ">/dev/null 2>&1"); +} + +static int open_tuntap(const char *dev_name, bool need_mac) +{ + int err = 0; + struct ifreq ifr; + int fd = open("/dev/net/tun", O_RDWR); + + if (!ASSERT_GT(fd, 0, "open(/dev/net/tun)")) + return -1; + + ifr.ifr_flags = IFF_NO_PI | (need_mac ? IFF_TAP : IFF_TUN); + memcpy(ifr.ifr_name, dev_name, IFNAMSIZ); + + err = ioctl(fd, TUNSETIFF, &ifr); + if (!ASSERT_OK(err, "ioctl(TUNSETIFF)")) { + close(fd); + return -1; + } + + err = fcntl(fd, F_SETFL, O_NONBLOCK); + if (!ASSERT_OK(err, "fcntl(O_NONBLOCK)")) { + close(fd); + return -1; + } + + return fd; +} + +#define ICMP_PAYLOAD_SIZE 100 + +/* Match an ICMP packet with payload len ICMP_PAYLOAD_SIZE */ +static int __expect_icmp_ipv4(char *buf, ssize_t len) +{ + struct iphdr *ip = (struct iphdr *)buf; + struct icmphdr *icmp = (struct icmphdr *)(ip + 1); + ssize_t min_header_len = sizeof(*ip) + sizeof(*icmp); + + if (len < min_header_len) + return -1; + + if (ip->protocol != IPPROTO_ICMP) + return -1; + + if (icmp->type != ICMP_ECHO) + return -1; + + return len == ICMP_PAYLOAD_SIZE + min_header_len; +} + +typedef int (*filter_t) (char *, ssize_t); + +/* wait_for_packet - wait for a packet that matches the filter + * + * @fd: tun fd/packet socket to read packet + * @filter: filter function, returning 1 if matches + * @timeout: timeout to wait for the packet + * + * Returns 1 if a matching packet is read, 0 if timeout expired, -1 on error. + */ +static int wait_for_packet(int fd, filter_t filter, struct timeval *timeout) +{ + char buf[4096]; + int max_retry = 5; /* in case we read some spurious packets */ + fd_set fds; + + FD_ZERO(&fds); + while (max_retry--) { + /* Linux modifies timeout arg... So make a copy */ + struct timeval copied_timeout = *timeout; + ssize_t ret = -1; + + FD_SET(fd, &fds); + + ret = select(1 + fd, &fds, NULL, NULL, &copied_timeout); + if (ret <= 0) { + if (errno == EINTR) + continue; + else if (errno == EAGAIN || ret == 0) + return 0; + + log_err("select failed"); + return -1; + } + + ret = read(fd, buf, sizeof(buf)); + + if (ret <= 0) { + log_err("read(dev): %ld", ret); + return -1; + } + + if (filter && filter(buf, ret) > 0) + return 1; + } + + return 0; +} + +#endif /* __LWT_HELPERS_H */ diff --git a/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c b/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c new file mode 100644 index 000000000000..59b38569f310 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c @@ -0,0 +1,330 @@ +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause + +/* + * Test suite of lwt_xmit BPF programs that redirect packets + * The file tests focus not only if these programs work as expected normally, + * but also if they can handle abnormal situations gracefully. + * + * WARNING + * ------- + * This test suite may crash the kernel, thus should be run in a VM. + * + * Setup: + * --------- + * All tests are performed in a single netns. Two lwt encap routes are setup for + * each subtest: + * + * ip route add 10.0.0.0/24 encap bpf xmit sec "" dev link_err + * ip route add 20.0.0.0/24 encap bpf xmit sec "" dev link_err + * + * Here is statically defined to test_lwt_redirect.bpf.o, and each section + * of this object holds a program entry to test. The BPF object is built from + * progs/test_lwt_redirect.c. We didn't use generated BPF skeleton since the + * attachment for lwt programs are not supported by libbpf yet. + * + * For testing, ping commands are run in the test netns: + * + * ping 10.0.0. -c 1 -w 1 -s 100 + * ping 20.0.0. -c 1 -w 1 -s 100 + * + * Scenarios: + * -------------------------------- + * 1. Redirect to a running tap/tun device + * 2. Redirect to a down tap/tun device + * 3. Redirect to a vlan device with lower layer down + * + * Case 1, ping packets should be received by packet socket on target device + * when redirected to ingress, and by tun/tap fd when redirected to egress. + * + * Case 2,3 are considered successful as long as they do not crash the kernel + * as a regression. + * + * Case 1,2 use tap device to test redirect to device that requires MAC + * header, and tun device to test the case with no MAC header added. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "lwt_helpers.h" +#include "test_progs.h" +#include "network_helpers.h" + +#define BPF_OBJECT "test_lwt_redirect.bpf.o" +#define INGRESS_SEC(need_mac) ((need_mac) ? "redir_ingress" : "redir_ingress_nomac") +#define EGRESS_SEC(need_mac) ((need_mac) ? "redir_egress" : "redir_egress_nomac") +#define LOCAL_SRC "10.0.0.1" +#define CIDR_TO_INGRESS "10.0.0.0/24" +#define CIDR_TO_EGRESS "20.0.0.0/24" + +/* ping to redirect toward given dev, with last byte of dest IP being the target + * device index. + * + * Note: ping command inside BPF-CI is busybox version, so it does not have certain + * function, such like -m option to set packet mark. + */ +static void ping_dev(const char *dev, bool is_ingress) +{ + int link_index = if_nametoindex(dev); + char ip[256]; + + if (!ASSERT_GE(link_index, 0, "if_nametoindex")) + return; + + if (is_ingress) + snprintf(ip, sizeof(ip), "10.0.0.%d", link_index); + else + snprintf(ip, sizeof(ip), "20.0.0.%d", link_index); + + /* We won't get a reply. Don't fail here */ + SYS_NOFAIL("ping %s -c1 -W1 -s %d >/dev/null 2>&1", + ip, ICMP_PAYLOAD_SIZE); +} + +static int new_packet_sock(const char *ifname) +{ + int err = 0; + int ignore_outgoing = 1; + int ifindex = -1; + int s = -1; + + s = socket(AF_PACKET, SOCK_RAW, 0); + if (!ASSERT_GE(s, 0, "socket(AF_PACKET)")) + return -1; + + ifindex = if_nametoindex(ifname); + if (!ASSERT_GE(ifindex, 0, "if_nametoindex")) { + close(s); + return -1; + } + + struct sockaddr_ll addr = { + .sll_family = AF_PACKET, + .sll_protocol = htons(ETH_P_IP), + .sll_ifindex = ifindex, + }; + + err = bind(s, (struct sockaddr *)&addr, sizeof(addr)); + if (!ASSERT_OK(err, "bind(AF_PACKET)")) { + close(s); + return -1; + } + + /* Use packet socket to capture only the ingress, so we can distinguish + * the case where a regression that actually redirects the packet to + * the egress. + */ + err = setsockopt(s, SOL_PACKET, PACKET_IGNORE_OUTGOING, + &ignore_outgoing, sizeof(ignore_outgoing)); + if (!ASSERT_OK(err, "setsockopt(PACKET_IGNORE_OUTGOING)")) { + close(s); + return -1; + } + + err = fcntl(s, F_SETFL, O_NONBLOCK); + if (!ASSERT_OK(err, "fcntl(O_NONBLOCK)")) { + close(s); + return -1; + } + + return s; +} + +static int expect_icmp(char *buf, ssize_t len) +{ + struct ethhdr *eth = (struct ethhdr *)buf; + + if (len < (ssize_t)sizeof(*eth)) + return -1; + + if (eth->h_proto == htons(ETH_P_IP)) + return __expect_icmp_ipv4((char *)(eth + 1), len - sizeof(*eth)); + + return -1; +} + +static int expect_icmp_nomac(char *buf, ssize_t len) +{ + return __expect_icmp_ipv4(buf, len); +} + +static void send_and_capture_test_packets(const char *test_name, int tap_fd, + const char *target_dev, bool need_mac) +{ + int psock = -1; + struct timeval timeo = { + .tv_sec = 0, + .tv_usec = 250000, + }; + int ret = -1; + + filter_t filter = need_mac ? expect_icmp : expect_icmp_nomac; + + ping_dev(target_dev, false); + + ret = wait_for_packet(tap_fd, filter, &timeo); + if (!ASSERT_EQ(ret, 1, "wait_for_epacket")) { + log_err("%s egress test fails", test_name); + goto out; + } + + psock = new_packet_sock(target_dev); + ping_dev(target_dev, true); + + ret = wait_for_packet(psock, filter, &timeo); + if (!ASSERT_EQ(ret, 1, "wait_for_ipacket")) { + log_err("%s ingress test fails", test_name); + goto out; + } + +out: + if (psock >= 0) + close(psock); +} + +static int setup_redirect_target(const char *target_dev, bool need_mac) +{ + int target_index = -1; + int tap_fd = -1; + + tap_fd = open_tuntap(target_dev, need_mac); + if (!ASSERT_GE(tap_fd, 0, "open_tuntap")) + goto fail; + + target_index = if_nametoindex(target_dev); + if (!ASSERT_GE(target_index, 0, "if_nametoindex")) + goto fail; + + SYS(fail, "ip link add link_err type dummy"); + SYS(fail, "ip link set lo up"); + SYS(fail, "ip addr add dev lo " LOCAL_SRC "/32"); + SYS(fail, "ip link set link_err up"); + SYS(fail, "ip link set %s up", target_dev); + + SYS(fail, "ip route add %s dev link_err encap bpf xmit obj %s sec %s", + CIDR_TO_INGRESS, BPF_OBJECT, INGRESS_SEC(need_mac)); + + SYS(fail, "ip route add %s dev link_err encap bpf xmit obj %s sec %s", + CIDR_TO_EGRESS, BPF_OBJECT, EGRESS_SEC(need_mac)); + + return tap_fd; + +fail: + if (tap_fd >= 0) + close(tap_fd); + return -1; +} + +static void test_lwt_redirect_normal(void) +{ + const char *target_dev = "tap0"; + int tap_fd = -1; + bool need_mac = true; + + tap_fd = setup_redirect_target(target_dev, need_mac); + if (!ASSERT_GE(tap_fd, 0, "setup_redirect_target")) + return; + + send_and_capture_test_packets(__func__, tap_fd, target_dev, need_mac); + close(tap_fd); +} + +static void test_lwt_redirect_normal_nomac(void) +{ + const char *target_dev = "tun0"; + int tap_fd = -1; + bool need_mac = false; + + tap_fd = setup_redirect_target(target_dev, need_mac); + if (!ASSERT_GE(tap_fd, 0, "setup_redirect_target")) + return; + + send_and_capture_test_packets(__func__, tap_fd, target_dev, need_mac); + close(tap_fd); +} + +/* This test aims to prevent regression of future. As long as the kernel does + * not panic, it is considered as success. + */ +static void __test_lwt_redirect_dev_down(bool need_mac) +{ + const char *target_dev = "tap0"; + int tap_fd = -1; + + tap_fd = setup_redirect_target(target_dev, need_mac); + if (!ASSERT_GE(tap_fd, 0, "setup_redirect_target")) + return; + + SYS(out, "ip link set %s down", target_dev); + ping_dev(target_dev, true); + ping_dev(target_dev, false); + +out: + close(tap_fd); +} + +static void test_lwt_redirect_dev_down(void) +{ + __test_lwt_redirect_dev_down(true); +} + +static void test_lwt_redirect_dev_down_nomac(void) +{ + __test_lwt_redirect_dev_down(false); +} + +/* This test aims to prevent regression of future. As long as the kernel does + * not panic, it is considered as success. + */ +static void test_lwt_redirect_dev_carrier_down(void) +{ + const char *lower_dev = "tap0"; + const char *vlan_dev = "vlan100"; + int tap_fd = -1; + + tap_fd = setup_redirect_target(lower_dev, true); + if (!ASSERT_GE(tap_fd, 0, "setup_redirect_target")) + return; + + SYS(out, "ip link add vlan100 link %s type vlan id 100", lower_dev); + SYS(out, "ip link set %s up", vlan_dev); + SYS(out, "ip link set %s down", lower_dev); + ping_dev(vlan_dev, true); + ping_dev(vlan_dev, false); + +out: + close(tap_fd); +} + +static void *test_lwt_redirect_run(void *arg) +{ + netns_delete(); + RUN_TEST(lwt_redirect_normal); + RUN_TEST(lwt_redirect_normal_nomac); + RUN_TEST(lwt_redirect_dev_down); + RUN_TEST(lwt_redirect_dev_down_nomac); + RUN_TEST(lwt_redirect_dev_carrier_down); + return NULL; +} + +void test_lwt_redirect(void) +{ + pthread_t test_thread; + int err; + + /* Run the tests in their own thread to isolate the namespace changes + * so they do not affect the environment of other tests. + * (specifically needed because of unshare(CLONE_NEWNS) in open_netns()) + */ + err = pthread_create(&test_thread, NULL, &test_lwt_redirect_run, NULL); + if (ASSERT_OK(err, "pthread_create")) + ASSERT_OK(pthread_join(test_thread, NULL), "pthread_join"); +} diff --git a/tools/testing/selftests/bpf/progs/test_lwt_redirect.c b/tools/testing/selftests/bpf/progs/test_lwt_redirect.c new file mode 100644 index 000000000000..7ab1fd310efb --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_lwt_redirect.c @@ -0,0 +1,90 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include "bpf_tracing_net.h" + +/* We don't care about whether the packet can be received by network stack. + * Just care if the packet is sent to the correct device at correct direction + * and not panic the kernel. + */ +static int prepend_dummy_mac(struct __sk_buff *skb) +{ + char mac[] = {0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0xf, + 0xe, 0xd, 0xc, 0xb, 0xa, 0x08, 0x00}; + + if (bpf_skb_change_head(skb, ETH_HLEN, 0)) + return -1; + + if (bpf_skb_store_bytes(skb, 0, mac, sizeof(mac), 0)) + return -1; + + return 0; +} + +/* Use the last byte of IP address to redirect the packet */ +static int get_redirect_target(struct __sk_buff *skb) +{ + struct iphdr *iph = NULL; + void *start = (void *)(long)skb->data; + void *end = (void *)(long)skb->data_end; + + if (start + sizeof(*iph) > end) + return -1; + + iph = (struct iphdr *)start; + return bpf_ntohl(iph->daddr) & 0xff; +} + +SEC("redir_ingress") +int test_lwt_redirect_in(struct __sk_buff *skb) +{ + int target = get_redirect_target(skb); + + if (target < 0) + return BPF_OK; + + if (prepend_dummy_mac(skb)) + return BPF_DROP; + + return bpf_redirect(target, BPF_F_INGRESS); +} + +SEC("redir_egress") +int test_lwt_redirect_out(struct __sk_buff *skb) +{ + int target = get_redirect_target(skb); + + if (target < 0) + return BPF_OK; + + if (prepend_dummy_mac(skb)) + return BPF_DROP; + + return bpf_redirect(target, 0); +} + +SEC("redir_egress_nomac") +int test_lwt_redirect_out_nomac(struct __sk_buff *skb) +{ + int target = get_redirect_target(skb); + + if (target < 0) + return BPF_OK; + + return bpf_redirect(target, 0); +} + +SEC("redir_ingress_nomac") +int test_lwt_redirect_in_nomac(struct __sk_buff *skb) +{ + int target = get_redirect_target(skb); + + if (target < 0) + return BPF_OK; + + return bpf_redirect(target, BPF_F_INGRESS); +} + +char _license[] SEC("license") = "GPL"; From patchwork Fri Aug 18 02:58:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhai X-Patchwork-Id: 13357244 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02FFCC7113C for ; Fri, 18 Aug 2023 02:59:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1357357AbjHRC63 (ORCPT ); Thu, 17 Aug 2023 22:58:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55596 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357352AbjHRC6Y (ORCPT ); Thu, 17 Aug 2023 22:58:24 -0400 Received: from mail-qk1-x72d.google.com (mail-qk1-x72d.google.com [IPv6:2607:f8b0:4864:20::72d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 639362727 for ; Thu, 17 Aug 2023 19:58:21 -0700 (PDT) Received: by mail-qk1-x72d.google.com with SMTP id af79cd13be357-76d764128d2so29518885a.3 for ; Thu, 17 Aug 2023 19:58:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudflare.com; s=google; t=1692327500; x=1692932300; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=vjV6aNUTuuSNDwStLeDVY3FrysRgluVjJM7kylN+jBw=; b=sXJnlisFGWEESfGIPFNx+eHq1c5hVs/cGoCgH5xBTYzrpZsiO0vsuKvP2SMTbRppn7 EswOXQhXHAxVsYPB7xC3vWA42ON3GBBGEXlR5bs++L0yT4czuZNM+OFZcY8T/luV03RP TwHd3RdYEE8E+kiK/gmZ/3DjxNj69idlvv+9U= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692327500; x=1692932300; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=vjV6aNUTuuSNDwStLeDVY3FrysRgluVjJM7kylN+jBw=; b=U1y5++kyKgxFjXudYLpO/TTs2Z5eCFjU6vWi+nYRodpWNYz4+Gvm/nLFFmMSeutMKB nfxfTJUeI2Ji+WiFsisDOISTRHMsr5Aqyon5WEEIarDcEbcfUtnF5zOg7vvPN6cRp1TD seUsvQDQgyhkXPvGgFTGtaoYsAwt4zAujUsO5wXIpyVScfL48U/onixsXmvyEtc5PR+T REAfmtsHBjrq8dhNu9lo9jOWRDFn+J0U5fDGGNfFe2vB86dL7r31PFSN6uzoUXDitTWC x8A4A5m9XRaDftmslwAOvBAA45TYkVfA5oE/inoB28iGr/gbbeu14yvo3BnOLPLpk0TP rXZg== X-Gm-Message-State: AOJu0Yx3WEsYITbp3V47Xvkfiy202jl6i44L0NTuLeJvG1nshfG5wPcf QUBEEIydFLFWhrsaxp4rvdCS3Q== X-Google-Smtp-Source: AGHT+IGay1kP8z+wNAb7o3gQ/QH6D47COE6RTPCNK1BWQrV4YyWrBxEw/mIWWzc3g+qDltcs5yZrIg== X-Received: by 2002:a05:620a:471f:b0:767:2bbc:fcc7 with SMTP id bs31-20020a05620a471f00b007672bbcfcc7mr1724541qkb.14.1692327500437; Thu, 17 Aug 2023 19:58:20 -0700 (PDT) Received: from debian.debian ([140.141.197.139]) by smtp.gmail.com with ESMTPSA id h7-20020a05620a13e700b0076cb3690ae7sm238671qkl.68.2023.08.17.19.58.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Aug 2023 19:58:19 -0700 (PDT) Date: Thu, 17 Aug 2023 19:58:18 -0700 From: Yan Zhai To: bpf@vger.kernel.org Cc: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , David Ahern , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Mykola Lysenko , Shuah Khan , Yan Zhai , Thomas Graf , Jordan Griege , Dan Carpenter , linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH v6 bpf 4/4] selftests/bpf: add lwt_xmit tests for BPF_REROUTE Message-ID: <62c8ddc1e924269dcf80d2e8af1a1e632cee0b3a.1692326837.git.yan@cloudflare.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org There is no lwt test case for BPF_REROUTE yet. Add test cases for both normal and abnormal situations. The abnormal situation is set up with an fq qdisc on the reroute target device. Without proper fixes, overflow this qdisc queue limit (to trigger a drop) would panic the kernel. Signed-off-by: Yan Zhai --- tools/testing/selftests/bpf/config | 1 + .../selftests/bpf/prog_tests/lwt_reroute.c | 262 ++++++++++++++++++ .../selftests/bpf/progs/test_lwt_reroute.c | 36 +++ 3 files changed, 299 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/lwt_reroute.c create mode 100644 tools/testing/selftests/bpf/progs/test_lwt_reroute.c diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config index fa083f1e5b34..1c7584e8dd9e 100644 --- a/tools/testing/selftests/bpf/config +++ b/tools/testing/selftests/bpf/config @@ -60,6 +60,7 @@ CONFIG_NET_IPGRE=y CONFIG_NET_IPGRE_DEMUX=y CONFIG_NET_IPIP=y CONFIG_NET_MPLS_GSO=y +CONFIG_NET_SCH_FQ=y CONFIG_NET_SCH_INGRESS=y CONFIG_NET_SCHED=y CONFIG_NETDEVSIM=y diff --git a/tools/testing/selftests/bpf/prog_tests/lwt_reroute.c b/tools/testing/selftests/bpf/prog_tests/lwt_reroute.c new file mode 100644 index 000000000000..f4bb2d5fcae0 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/lwt_reroute.c @@ -0,0 +1,262 @@ +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause + +/* + * Test suite of lwt BPF programs that reroutes packets + * The file tests focus not only if these programs work as expected normally, + * but also if they can handle abnormal situations gracefully. This test + * suite currently only covers lwt_xmit hook. lwt_in tests have not been + * implemented. + * + * WARNING + * ------- + * This test suite can crash the kernel, thus should be run in a VM. + * + * Setup: + * --------- + * all tests are performed in a single netns. A lwt encap route is setup for + * each subtest: + * + * ip route add 10.0.0.0/24 encap bpf xmit sec "" dev link_err + * + * Here is statically defined to test_lwt_reroute.bpf.o, and it contains + * a single test program entry. This program sets packet mark by last byte of + * the IPv4 daddr. For example, a packet going to 1.2.3.4 will receive a skb + * mark 4. A packet will only be marked once, and IP x.x.x.0 will be skipped + * to avoid route loop. We didn't use generated BPF skeleton since the + * attachment for lwt programs are not supported by libbpf yet. + * + * The test program will bring up a tun device, and sets up the following + * routes: + * + * ip rule add pref 100 from all fwmark lookup 100 + * ip route add table 100 default dev tun0 + * + * For normal testing, a ping command is running in the test netns: + * + * ping 10.0.0. -c 1 -w 1 -s 100 + * + * For abnormal testing, fq is used as the qdisc of the tun device. Then a UDP + * socket will try to overflow the fq queue and trigger qdisc drop error. + * + * Scenarios: + * -------------------------------- + * 1. Reroute to a running tun device + * 2. Reroute to a device where qdisc drop + * + * For case 1, ping packets should be received by the tun device. + * + * For case 2, force UDP packets to overflow fq limit. As long as kernel + * is not crashed, it is considered successful. + */ +#include "lwt_helpers.h" +#include "network_helpers.h" +#include + +#define BPF_OBJECT "test_lwt_reroute.bpf.o" +#define LOCAL_SRC "10.0.0.1" +#define TEST_CIDR "10.0.0.0/24" +#define XMIT_HOOK "xmit" +#define XMIT_SECTION "lwt_xmit" +#define NSEC_PER_SEC 1000000000ULL + +/* send a ping to be rerouted to the target device */ +static void ping_once(const char *ip) +{ + /* We won't get a reply. Don't fail here */ + SYS_NOFAIL("ping %s -c1 -W1 -s %d >/dev/null 2>&1", + ip, ICMP_PAYLOAD_SIZE); +} + +/* Send snd_target UDP packets to overflow the fq queue and trigger qdisc drop + * error. This is done via TX tstamp to force buffering delayed packets. + */ +static int overflow_fq(int snd_target, const char *target_ip) +{ + struct sockaddr_in addr = { + .sin_family = AF_INET, + .sin_port = htons(1234), + }; + + char data_buf[8]; /* only #pkts matter, so use a random small buffer */ + char control_buf[CMSG_SPACE(sizeof(uint64_t))]; + struct iovec iov = { + .iov_base = data_buf, + .iov_len = sizeof(data_buf), + }; + int err = -1; + int s = -1; + struct sock_txtime txtime_on = { + .clockid = CLOCK_MONOTONIC, + .flags = 0, + }; + struct msghdr msg = { + .msg_name = &addr, + .msg_namelen = sizeof(addr), + .msg_control = control_buf, + .msg_controllen = sizeof(control_buf), + .msg_iovlen = 1, + .msg_iov = &iov, + }; + struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg); + + memset(data_buf, 0, sizeof(data_buf)); + + s = socket(AF_INET, SOCK_DGRAM, 0); + if (!ASSERT_GE(s, 0, "socket")) + goto out; + + err = setsockopt(s, SOL_SOCKET, SO_TXTIME, &txtime_on, sizeof(txtime_on)); + if (!ASSERT_OK(err, "setsockopt(SO_TXTIME)")) + goto out; + + err = inet_pton(AF_INET, target_ip, &addr.sin_addr); + if (!ASSERT_EQ(err, 1, "inet_pton")) + goto out; + + while (snd_target > 0) { + struct timespec now; + + memset(control_buf, 0, sizeof(control_buf)); + cmsg->cmsg_type = SCM_TXTIME; + cmsg->cmsg_level = SOL_SOCKET; + cmsg->cmsg_len = CMSG_LEN(sizeof(uint64_t)); + + err = clock_gettime(CLOCK_MONOTONIC, &now); + if (!ASSERT_OK(err, "clock_gettime(CLOCK_MONOTONIC)")) { + err = -1; + goto out; + } + + *(uint64_t *)CMSG_DATA(cmsg) = (now.tv_nsec + 1) * NSEC_PER_SEC + + now.tv_nsec; + + /* we will intentionally send more than fq limit, so ignore + * the error here. + */ + sendmsg(s, &msg, MSG_NOSIGNAL); + snd_target--; + } + + /* no kernel crash so far is considered success */ + err = 0; + +out: + if (s >= 0) + close(s); + + return err; +} + +static int setup(const char *tun_dev) +{ + int target_index = -1; + int tap_fd = -1; + + tap_fd = open_tuntap(tun_dev, false); + if (!ASSERT_GE(tap_fd, 0, "open_tun")) + return -1; + + target_index = if_nametoindex(tun_dev); + if (!ASSERT_GE(target_index, 0, "if_nametoindex")) + return -1; + + SYS(fail, "ip link add link_err type dummy"); + SYS(fail, "ip link set lo up"); + SYS(fail, "ip addr add dev lo " LOCAL_SRC "/32"); + SYS(fail, "ip link set link_err up"); + SYS(fail, "ip link set %s up", tun_dev); + + SYS(fail, "ip route add %s dev link_err encap bpf xmit obj %s sec lwt_xmit", + TEST_CIDR, BPF_OBJECT); + + SYS(fail, "ip rule add pref 100 from all fwmark %d lookup 100", + target_index); + SYS(fail, "ip route add t 100 default dev %s", tun_dev); + + return tap_fd; + +fail: + if (tap_fd >= 0) + close(tap_fd); + return -1; +} + +static void test_lwt_reroute_normal_xmit(void) +{ + const char *tun_dev = "tun0"; + int tun_fd = -1; + int ifindex = -1; + char ip[256]; + struct timeval timeo = { + .tv_sec = 0, + .tv_usec = 250000, + }; + + tun_fd = setup(tun_dev); + if (!ASSERT_GE(tun_fd, 0, "setup_reroute")) + return; + + ifindex = if_nametoindex(tun_dev); + if (!ASSERT_GE(ifindex, 0, "if_nametoindex")) + return; + + snprintf(ip, 256, "10.0.0.%d", ifindex); + + /* ping packets should be received by the tun device */ + ping_once(ip); + + if (!ASSERT_EQ(wait_for_packet(tun_fd, __expect_icmp_ipv4, &timeo), 1, + "wait_for_packet")) + log_err("%s xmit", __func__); +} + +/* + * Test the failure case when the skb is dropped at the qdisc. This is a + * regression prevention at the xmit hook only. + */ +static void test_lwt_reroute_qdisc_dropped(void) +{ + const char *tun_dev = "tun0"; + int tun_fd = -1; + int ifindex = -1; + char ip[256]; + + tun_fd = setup(tun_dev); + if (!ASSERT_GE(tun_fd, 0, "setup_reroute")) + goto fail; + + SYS(fail, "tc qdisc replace dev %s root fq limit 5 flow_limit 5", tun_dev); + + ifindex = if_nametoindex(tun_dev); + if (!ASSERT_GE(ifindex, 0, "if_nametoindex")) + return; + + snprintf(ip, 256, "10.0.0.%d", ifindex); + ASSERT_EQ(overflow_fq(10, ip), 0, "overflow_fq"); + +fail: + if (tun_fd >= 0) + close(tun_fd); +} + +static void *test_lwt_reroute_run(void *arg) +{ + netns_delete(); + RUN_TEST(lwt_reroute_normal_xmit); + RUN_TEST(lwt_reroute_qdisc_dropped); + return NULL; +} + +void test_lwt_reroute(void) +{ + pthread_t test_thread; + int err; + + /* Run the tests in their own thread to isolate the namespace changes + * so they do not affect the environment of other tests. + * (specifically needed because of unshare(CLONE_NEWNS) in open_netns()) + */ + err = pthread_create(&test_thread, NULL, &test_lwt_reroute_run, NULL); + if (ASSERT_OK(err, "pthread_create")) + ASSERT_OK(pthread_join(test_thread, NULL), "pthread_join"); +} diff --git a/tools/testing/selftests/bpf/progs/test_lwt_reroute.c b/tools/testing/selftests/bpf/progs/test_lwt_reroute.c new file mode 100644 index 000000000000..1dc64351929c --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_lwt_reroute.c @@ -0,0 +1,36 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include + +/* This function extracts the last byte of the daddr, and uses it + * as output dev index. + */ +SEC("lwt_xmit") +int test_lwt_reroute(struct __sk_buff *skb) +{ + struct iphdr *iph = NULL; + void *start = (void *)(long)skb->data; + void *end = (void *)(long)skb->data_end; + + /* set mark at most once */ + if (skb->mark != 0) + return BPF_OK; + + if (start + sizeof(*iph) > end) + return BPF_DROP; + + iph = (struct iphdr *)start; + skb->mark = bpf_ntohl(iph->daddr) & 0xff; + + /* do not reroute x.x.x.0 packets */ + if (skb->mark == 0) + return BPF_OK; + + return BPF_LWT_REROUTE; +} + +char _license[] SEC("license") = "GPL";