From patchwork Wed Nov 3 20:47:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 12601585 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9A1CDC433EF for ; Wed, 3 Nov 2021 20:48:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7BCEB60E9C for ; Wed, 3 Nov 2021 20:48:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229893AbhKCUuk (ORCPT ); Wed, 3 Nov 2021 16:50:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60472 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230198AbhKCUuj (ORCPT ); Wed, 3 Nov 2021 16:50:39 -0400 Received: from mail-io1-xd30.google.com (mail-io1-xd30.google.com [IPv6:2607:f8b0:4864:20::d30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 761DFC061714; Wed, 3 Nov 2021 13:48:02 -0700 (PDT) Received: by mail-io1-xd30.google.com with SMTP id n128so4412448iod.9; Wed, 03 Nov 2021 13:48:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=tZbv5aAJ4v4EiF6Cmu6oZo1oYCe3bfoiySGK/sxgW9U=; b=EmC0sO8kvotE0Y/JF+uh8zs3/jRRnvL7cpGsCyZtvzDbf98RBFq0jeRYmTxJaGJk1t G0+Nbx4Njz+Eegpj5depBexDs/b77GUpkCn1FwzQuV5NWu3nSotD17lc7mbe2ecLImSe N2jv/M2DzLKMUDPfQT1ztpgbZ58fBjce0yPV5lPdAwafXbcQRQ62ZYInAi/viaSSAHJS 7tvMxe0sWalef4i0UZiN7gXnnZz6sLSqz9P53KWwFNCSqOr/q9zZJEugt6yS+cD7ECgU 2bRHr4b5T0TvD4sdK+HyUJ/MLi8fxRm3Cw1SHpTurw4ioaGgP3/V7tv+LaZYZ3yr1leT GGQA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; 
s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=tZbv5aAJ4v4EiF6Cmu6oZo1oYCe3bfoiySGK/sxgW9U=; b=eR1orXVtGFGQ5pANCdofurlgY8dStd2JHk2cmJhpK4SJaFxqzT4Xf7ldGOxUICYBCJ HqSPopvFVVu/26oedsMf2Ms56uPEV/UX3jB+dNcAiwdiQtJgc43pVCX43eGnEt7viZzZ 4X3SnU4QZAYyZvFPHOim7g/TBElFvgQ2o5lnVi9yM+HjaMgzflBbqiR3xgX1iuKzkZJS M0Bngk5NY2ex9rldZL5gyV9g/KuQNENCgHue43CvX3HSuMBzlDQlRJgh9eO62mubw8h4 ngYIzpPz7cym6ksuHctUhYKOPfBSY4QW8muD1OxwtgogvsrKv95xOl9f2pgFlgE6zvKt EfsQ== X-Gm-Message-State: AOAM530UlfW4E3zn06y/V15GwrXqBzUbaKACS0pOAa0ZMjHPJXVCwU/e PGhy5myRTdc4BPV4mZGa4A8ySWnCEmFE4Q== X-Google-Smtp-Source: ABdhPJwXxmXoBceyGqG7yK+/s057O5be1LAh1K42KqNJ97TgBgrgaoyz9NRxQ0e7AqQIa/b4wYkNpg== X-Received: by 2002:a05:6602:2ccf:: with SMTP id j15mr19127128iow.77.1635972481622; Wed, 03 Nov 2021 13:48:01 -0700 (PDT) Received: from john.lan ([172.243.151.11]) by smtp.gmail.com with ESMTPSA id y11sm1507612ior.4.2021.11.03.13.47.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 03 Nov 2021 13:48:01 -0700 (PDT) From: John Fastabend To: bpf@vger.kernel.org, netdev@vger.kernel.org Cc: daniel@iogearbox.net, joamaki@gmail.com, xiyou.wangcong@gmail.com, jakub@cloudflare.com, john.fastabend@gmail.com Subject: [PATCH bpf v2 1/5] bpf, sockmap: Use stricter sk state checks in sk_lookup_assign Date: Wed, 3 Nov 2021 13:47:32 -0700 Message-Id: <20211103204736.248403-2-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211103204736.248403-1-john.fastabend@gmail.com> References: <20211103204736.248403-1-john.fastabend@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net In order to fix an issue with sockets in TCP sockmap redirect cases we plan to allow CLOSE state sockets to exist in the sockmap. 
However, the check in bpf_sk_lookup_assign() currently only rejects sockets in the TCP_ESTABLISHED state, relying on the checks at sockmap insert time to ensure we never have TCP_CLOSE state sockets in the map.

To prepare for this change we flip the logic in bpf_sk_lookup_assign() to explicitly test for the accepted cases, namely a TCP socket in TCP_LISTEN state or a UDP socket in TCP_CLOSE state. This also makes the code more resilient to future changes.

Suggested-by: Jakub Sitnicki
Signed-off-by: John Fastabend
Reviewed-by: Jakub Sitnicki
---
 include/linux/skmsg.h | 12 ++++++++++++
 net/core/filter.c     |  6 ++++--
 net/core/sock_map.c   |  6 ------
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index b4256847c707..584d94be9c8b 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -507,6 +507,18 @@ static inline bool sk_psock_strp_enabled(struct sk_psock *psock)
 	return !!psock->saved_data_ready;
 }
 
+static inline bool sk_is_tcp(const struct sock *sk)
+{
+	return sk->sk_type == SOCK_STREAM &&
+	       sk->sk_protocol == IPPROTO_TCP;
+}
+
+static inline bool sk_is_udp(const struct sock *sk)
+{
+	return sk->sk_type == SOCK_DGRAM &&
+	       sk->sk_protocol == IPPROTO_UDP;
+}
+
 #if IS_ENABLED(CONFIG_NET_SOCK_MSG)
 
 #define BPF_F_STRPARSER (1UL << 1)
diff --git a/net/core/filter.c b/net/core/filter.c
index 8e8d3b49c297..a68418268e92 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10423,8 +10423,10 @@ BPF_CALL_3(bpf_sk_lookup_assign, struct bpf_sk_lookup_kern *, ctx,
 		return -EINVAL;
 	if (unlikely(sk && sk_is_refcounted(sk)))
 		return -ESOCKTNOSUPPORT; /* reject non-RCU freed sockets */
-	if (unlikely(sk && sk->sk_state == TCP_ESTABLISHED))
-		return -ESOCKTNOSUPPORT; /* reject connected sockets */
+	if (unlikely(sk && sk_is_tcp(sk) && sk->sk_state != TCP_LISTEN))
+		return -ESOCKTNOSUPPORT; /* only accept TCP socket in LISTEN */
+	if (unlikely(sk && sk_is_udp(sk) && sk->sk_state != TCP_CLOSE))
+		return -ESOCKTNOSUPPORT; /* only accept UDP socket in CLOSE */
 
 	/* Check if socket is suitable for packet L3/L4 protocol */
 	if (sk && sk->sk_protocol != ctx->protocol)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index e252b8ec2b85..f39ef79ced67 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -511,12 +511,6 @@ static bool sock_map_op_okay(const struct bpf_sock_ops_kern *ops)
 	       ops->op == BPF_SOCK_OPS_TCP_LISTEN_CB;
 }
 
-static bool sk_is_tcp(const struct sock *sk)
-{
-	return sk->sk_type == SOCK_STREAM &&
-	       sk->sk_protocol == IPPROTO_TCP;
-}
-
 static bool sock_map_redirect_allowed(const struct sock *sk)
 {
 	if (sk_is_tcp(sk))

From patchwork Wed Nov 3 20:47:33 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 12601587
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6F583C433F5 for ; Wed, 3 Nov 2021 20:48:11 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5691960720 for ; Wed, 3 Nov 2021 20:48:11 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231282AbhKCUur (ORCPT ); Wed, 3 Nov 2021 16:50:47 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60506 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230198AbhKCUuq (ORCPT ); Wed, 3 Nov 2021 16:50:46 -0400
Received: from mail-il1-x133.google.com (mail-il1-x133.google.com [IPv6:2607:f8b0:4864:20::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BBC17C061714; Wed, 3 Nov 2021 13:48:09 -0700 (PDT)
Received: by mail-il1-x133.google.com with SMTP id s14so3923774ilv.10; Wed, 03 Nov 2021 13:48:09 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256;
c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=IsEm7YwJ2jCN77f8FH/qoi3xCimi77N04M1MEq6g5zo=; b=VjEsHs2KoDE3PcvNYWg9ZU9q+FmpHmb6FhRJv622Imf5CFXD9qIhufjISnV1wd/hjU xbQo9Y8bZT0pRoGz7HWM+55cINGrO1/PbGjSZoHSXtg/hEwQDiqPU37wv/ka0gibrS42 ZkL8yaII2ylsL7EmQg92eWdbmp2wRm08v5tTZfVbJ32dUInsFN/zMc0RF01vTakND12+ ustIskSbxcGo6D/KihGEC6dm0EqXPYwIkM+YsNLgVnld4d8OF1TOfBeEUI7Ztmd5j4F/ 2B//iZlMOqEX+74w7zqIpxc/itWeZpmng7KwWCtQnUrA5dymkH4D779lbmvlk68eQMt4 n4ZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=IsEm7YwJ2jCN77f8FH/qoi3xCimi77N04M1MEq6g5zo=; b=l3cmvmHz8nXuOzvCfF9tIU7fYpAxnu9h8RJ6MoKwixtWkviA2QWp7avHq8cIsIUxxH 9mggwO8mcanIvnEYxIAPNooLOLWUtvtPiIgq2Bei0vFtYt7137csTFusSpCv0Z63SCvk yMCyaSL+AR5c6its+0v/4d3hM3jYaTTBJdiQR1qfoxsiPlo48NbDe5IidZB2KOKygHjO NsedXZbAxqMwNh5Jw3lbWzZdFiysaIBoBdvktaMD/VdEgl+vBZ5sbM2tGEJrPi9QsheQ JGCyASkw24v5RLE6iTrfNJWneQVjASU6xuTKyonsfsTCUavRR/AExY+FWz7f1efTUULR WIpw== X-Gm-Message-State: AOAM5314ocRwS1EYNDUr5o7S9mlpmZf7xqOGSFZLqBh2r3/J+xnJ5KmM uq6AUzUdhF/ad1s8lQDLbqDDdY9sbLMZCw== X-Google-Smtp-Source: ABdhPJz3iTLVp4wa42AjPV6nn6noOPUC8e5Gkj6Rc+8PP4vcsJwIT111T8onPtN8GFnTcnn0yrM98Q== X-Received: by 2002:a92:cda5:: with SMTP id g5mr8656063ild.97.1635972488917; Wed, 03 Nov 2021 13:48:08 -0700 (PDT) Received: from john.lan ([172.243.151.11]) by smtp.gmail.com with ESMTPSA id y11sm1507612ior.4.2021.11.03.13.48.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 03 Nov 2021 13:48:08 -0700 (PDT) From: John Fastabend To: bpf@vger.kernel.org, netdev@vger.kernel.org Cc: daniel@iogearbox.net, joamaki@gmail.com, xiyou.wangcong@gmail.com, jakub@cloudflare.com, john.fastabend@gmail.com Subject: [PATCH bpf v2 2/5] bpf, sockmap: Remove unhash handler for BPF sockmap usage Date: Wed, 3 Nov 2021 
13:47:33 -0700
Message-Id: <20211103204736.248403-3-john.fastabend@gmail.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211103204736.248403-1-john.fastabend@gmail.com>
References: <20211103204736.248403-1-john.fastabend@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

We do not need to handle unhash from the BPF side; we can simply wait for the close to happen. The original concern was that a socket could transition from ESTABLISHED state to a new state while the BPF hook was still attached. But we convinced ourselves this is no longer possible, and we also improved BPF sockmap to handle listen sockets, so this is no longer a problem.

More importantly, there are cases where unhash is called while data is in the receive queue. The BPF unhash logic will flush this data, which is wrong. To be correct it should keep the data in the receive queue and allow a receiving application to continue reading the data. This may happen when tcp_abort() is invoked, for example. Instead of complicating the logic in unhash, simply moving all this to the tcp_close hook solves the problem.

Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Tested-by: Jussi Maki
Signed-off-by: John Fastabend
---
 net/ipv4/tcp_bpf.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 5f4d6f45d87f..246f725b78c9 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -475,7 +475,6 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
 				   struct proto *base)
 {
 	prot[TCP_BPF_BASE] = *base;
-	prot[TCP_BPF_BASE].unhash = sock_map_unhash;
 	prot[TCP_BPF_BASE].close = sock_map_close;
 	prot[TCP_BPF_BASE].recvmsg = tcp_bpf_recvmsg;
 	prot[TCP_BPF_BASE].sock_is_readable = sk_msg_is_readable;

From patchwork Wed Nov 3 20:47:34 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 12601595
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2A2D1C433EF for ; Wed, 3 Nov 2021 20:48:20 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0DAED60720 for ; Wed, 3 Nov 2021 20:48:20 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231288AbhKCUu4 (ORCPT ); Wed, 3 Nov 2021 16:50:56 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60538 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230198AbhKCUuz (ORCPT ); Wed, 3 Nov 2021 16:50:55 -0400
Received: from mail-io1-xd2d.google.com (mail-io1-xd2d.google.com [IPv6:2607:f8b0:4864:20::d2d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B1075C061714; Wed, 3 Nov 2021 13:48:18 -0700 (PDT)
Received: by mail-io1-xd2d.google.com with SMTP id i79so4116804ioa.13; Wed, 03 Nov 2021 13:48:18 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=aYqyw5Wb1F6fwkcRsbWLwzJSFh6wyQJt0oLb9vIykeo=; b=XdHmDK6q9h7mUBdJonpzvDHOxAbLFnGHryqLe2jKqQwPxJU5QLstI6YvAk+4p9QCVX WI6hs7AKJkxn4ggDeWS2lagISmFgktGVCCvyOsV4vZSfKSFle1qkGtIhDu1Biyspja6l 6a/SimDK/wMIeBRUuxe8K3ZrUi9T7yQqPHcsycNFgE/1Naepkks7omjYymhUGg0++/B4 9j8BUoDJ0pC1M/6qnFkNqd5G8CVSzJh/YddTp9ibrCN79AUnSshAAoXDcaJ7yF7KgcRW ja5imr2JYcneykM3E6Ss+qfeoYJqd+nwLJP8GQZFUMR7/MWqWdQkuDmPdqufXKWqNuak sbFA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=aYqyw5Wb1F6fwkcRsbWLwzJSFh6wyQJt0oLb9vIykeo=; b=3LEmh7CPXyaQ5rFgiGrxP+eLrgJeLMstHGmFWnigFT24mBIfbtbJ2oSZCNFRC68MuY 6koxm6ccK31TSfq7AefJ0c6oB7Pge4UzzKBl5ZM+sWpY4CQe1u05wlg4TXkPutQPraty UPRo0nHuy9v5QLfzlHaB3vqWTd3Xz51f8vbBMDEw0RDWZq3VwoWwWTklAtRMonbLVzPn zOcRbueNkZjvmCtHTx1B/LImspo8r1GWkY4h2GIcOt70XnDWXFFVmBrCpZh6ZztEJ4bB UDzyVSwuK3l79d59XOCYjBKMYWySGNondRayUCJ/S6uiOnVi3muQF5tdrqHilzy5fNna VvXQ== X-Gm-Message-State: AOAM533F4G2B+fJbQ8yKCelpAmwxQ5C/G7i7APhNKjHj1diO5YhqrDYL tDGfESHTFauif4SBSG+bHvYIwXWmQuqDDA== X-Google-Smtp-Source: ABdhPJxgAe4NHkII8OudmkTXGthNogaywaZDcaFmXmdLCMSOSBDVwK2OBp2aOAU+aUPqDwzz6mkIPw== X-Received: by 2002:a05:6602:1550:: with SMTP id h16mr31909068iow.125.1635972497834; Wed, 03 Nov 2021 13:48:17 -0700 (PDT) Received: from john.lan ([172.243.151.11]) by smtp.gmail.com with ESMTPSA id y11sm1507612ior.4.2021.11.03.13.48.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 03 Nov 2021 13:48:17 -0700 (PDT) From: John Fastabend To: bpf@vger.kernel.org, netdev@vger.kernel.org Cc: daniel@iogearbox.net, joamaki@gmail.com, xiyou.wangcong@gmail.com, jakub@cloudflare.com, john.fastabend@gmail.com Subject: [PATCH bpf v2 3/5] bpf, sockmap: Fix race in ingress 
receive verdict with redirect to self
Date: Wed, 3 Nov 2021 13:47:34 -0700
Message-Id: <20211103204736.248403-4-john.fastabend@gmail.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211103204736.248403-1-john.fastabend@gmail.com>
References: <20211103204736.248403-1-john.fastabend@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

A socket in a sockmap may have different combinations of programs attached depending on configuration. There can be no programs, in which case the socket acts as a sink only. There can be a TX program only, in which case a BPF program is attached to the sending side but no RX program is attached. There can be an RX program only, where sends have no BPF program attached but receives are hooked with BPF. And finally, both TX and RX programs may be attached. This gives us the permutations None, Tx, Rx, and TxRx.

To date most of our use cases have been either the TX case, used as a fast datapath to directly copy between a local application and a userspace proxy, or the Rx and TxRx cases, for applications that operate an in-kernel proxy. The traffic in the first case, where we hook applications into a userspace application, looks like this:

  AppA  redirect   AppB
   Tx <-----------> Rx
   |                |
   +                +
   TCP <--> lo <--> TCP

In this case all traffic from AppA (after the 3-way handshake) is copied into the AppB ingress queue and no traffic is ever on the TCP receive_queue.

In the second case the application never receives traffic on the actual userspace socket, except in some rare error cases. Instead the send happens in the kernel.

           AppProxy       socket pool
       sk0 ------------->{sk1,sk2, skn}
        ^                      |
        |                      |
        |                      v
       ingress              lb egress
       TCP                  TCP

Here, because traffic is never read off the socket with userspace recv() APIs, there is only ever one reader on the sk receive_queue, namely the BPF programs.
However, we've started to introduce a third configuration where the BPF program on receive should process the data, but then the normal case is to push the data into the receive queue of AppB.

       AppB
       recv()                (userspace)
     -----------------------
       tcp_bpf_recvmsg()     (kernel)
         |             |
         |             |
         |             |
       ingress_msgQ    |
         |             |
       RX_BPF          |
         |             |
         v             v
       sk->receive_queue

This is different from the App{A,B} redirect because traffic is first received on the sk->receive_queue.

Now for the issue. The tcp_bpf_recvmsg() handler first checks the ingress_msg queue for any data handled by the BPF RX program and returned with a PASS code, so that it was enqueued on the ingress msg queue. Then, if no data exists on that queue, it checks the socket receive queue. Unfortunately, this is the same receive_queue the BPF program is reading data off of. So we get a race. It's possible for the recvmsg() hook to pull data off the receive_queue before the BPF hook has a chance to read it. It typically happens when an application is banging on recv() and getting EAGAINs, until it manages to race with the RX BPF program.

To fix this, we note that before this patch, at attach time when the socket is loaded into the map, we check if it needs a TX program or just the base set of proto BPF hooks. Then it uses the above general RX hook regardless of whether we have a BPF program attached at RX or not. This patch now extends this check to handle all cases enumerated above: TX, RX, TXRX, and none. And to fix the above race, when an RX program is attached we use a new hook that is nearly identical to the old one, except now we do not let the recv() call skip the RX BPF program. Now only the BPF program pulls data from sk->receive_queue, and recv() only pulls data from the ingress msgQ after BPF program handling.

With this resolved, our AppB from above has been up and running for many hours without detecting any errors. We do this by correlating counters in RX BPF events and the AppB to ensure data is never skipping the BPF program.
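
The four-way configuration choice described above can be sketched in userspace as a small decision function. This is only a minimal model, not the kernel code: the enum values mirror the TCP_BPF_* configs in the diff, while pick_config() and its boolean parameters are hypothetical names introduced here to stand in for the program pointers held in psock->progs.

```c
#include <stdbool.h>

/* Mirrors the proto configurations used by this patch. */
enum tcp_bpf_cfg {
	TCP_BPF_BASE,	/* no TX or RX program attached */
	TCP_BPF_TX,	/* msg_parser attached on the send side */
	TCP_BPF_RX,	/* stream/skb verdict attached on the receive side */
	TCP_BPF_TXRX,	/* both send and receive programs attached */
	TCP_BPF_NUM_CFGS,
};

/*
 * Hypothetical model of the selection done at attach time: start from
 * BASE or TX depending on msg_parser, then upgrade to the RX variant
 * when any verdict program is attached.
 */
static enum tcp_bpf_cfg pick_config(bool msg_parser, bool stream_verdict,
				    bool skb_verdict)
{
	enum tcp_bpf_cfg config = msg_parser ? TCP_BPF_TX : TCP_BPF_BASE;

	if (stream_verdict || skb_verdict)
		config = (config == TCP_BPF_TX) ? TCP_BPF_TXRX : TCP_BPF_RX;
	return config;
}
```

In this model only the RX and TXRX results would select the new recvmsg hook, which is the property the fix relies on.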
Selftests were not able to detect this because we only run them for a short period of time on well-ordered send/recvs, so we don't get any of the noise we see in real application environments.

Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Tested-by: Jussi Maki
Acked-by: Jakub Sitnicki
Signed-off-by: John Fastabend
---
 net/ipv4/tcp_bpf.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 246f725b78c9..f70aa0932bd6 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -172,6 +172,41 @@ static int tcp_msg_wait_data(struct sock *sk, struct sk_psock *psock,
 	return ret;
 }
 
+static int tcp_bpf_recvmsg_parser(struct sock *sk,
+				  struct msghdr *msg,
+				  size_t len,
+				  int nonblock,
+				  int flags,
+				  int *addr_len)
+{
+	struct sk_psock *psock;
+	int copied;
+
+	if (unlikely(flags & MSG_ERRQUEUE))
+		return inet_recv_error(sk, msg, len, addr_len);
+
+	psock = sk_psock_get(sk);
+	if (unlikely(!psock))
+		return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
+
+	lock_sock(sk);
+msg_bytes_ready:
+	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
+	if (!copied) {
+		long timeo;
+		int data;
+
+		timeo = sock_rcvtimeo(sk, nonblock);
+		data = tcp_msg_wait_data(sk, psock, timeo);
+		if (data && !sk_psock_queue_empty(psock))
+			goto msg_bytes_ready;
+		copied = -EAGAIN;
+	}
+	release_sock(sk);
+	sk_psock_put(sk, psock);
+	return copied;
+}
+
 static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 			   int nonblock, int flags, int *addr_len)
 {
@@ -464,6 +499,8 @@ enum {
 enum {
 	TCP_BPF_BASE,
 	TCP_BPF_TX,
+	TCP_BPF_RX,
+	TCP_BPF_TXRX,
 	TCP_BPF_NUM_CFGS,
 };
 
@@ -482,6 +519,12 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
 	prot[TCP_BPF_TX] = prot[TCP_BPF_BASE];
 	prot[TCP_BPF_TX].sendmsg = tcp_bpf_sendmsg;
 	prot[TCP_BPF_TX].sendpage = tcp_bpf_sendpage;
+
+	prot[TCP_BPF_RX] = prot[TCP_BPF_BASE];
+	prot[TCP_BPF_RX].recvmsg = tcp_bpf_recvmsg_parser;
+
+	prot[TCP_BPF_TXRX] = prot[TCP_BPF_TX];
+	prot[TCP_BPF_TXRX].recvmsg = tcp_bpf_recvmsg_parser;
 }
 
 static void tcp_bpf_check_v6_needs_rebuild(struct proto *ops)
@@ -519,6 +562,10 @@ int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
 	int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4;
 	int config = psock->progs.msg_parser ? TCP_BPF_TX : TCP_BPF_BASE;
 
+	if (psock->progs.stream_verdict || psock->progs.skb_verdict) {
+		config = (config == TCP_BPF_TX) ? TCP_BPF_TXRX : TCP_BPF_RX;
+	}
+
 	if (restore) {
 		if (inet_csk_has_ulp(sk)) {
 			/* TLS does not have an unhash proto in SW cases,

From patchwork Wed Nov 3 20:47:35 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 12601599
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 47C7DC433F5 for ; Wed, 3 Nov 2021 20:48:37 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 292AC60720 for ; Wed, 3 Nov 2021 20:48:37 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231340AbhKCUvF (ORCPT ); Wed, 3 Nov 2021 16:51:05 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60576 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230198AbhKCUvE (ORCPT ); Wed, 3 Nov 2021 16:51:04 -0400
Received: from mail-io1-xd2d.google.com (mail-io1-xd2d.google.com [IPv6:2607:f8b0:4864:20::d2d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D9AAAC061714; Wed, 3 Nov 2021 13:48:27 -0700 (PDT)
Received: by mail-io1-xd2d.google.com with SMTP id y73so4439000iof.4; Wed, 03 Nov 2021 13:48:27 -0700 (PDT)
DKIM-Signature:
v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=S8mjFs7ebR+xEuiRShrGc07G4jO+Mgiscghvbj0Cekg=; b=qAifksKhf+PCp2Wis/FyVi5D9BeFB8Ovz9JcZ46aTsSl9CW3Aag0k1ph4qpzJ1KJNG rNlp+jUmd2qTgS265tCU6bX7+AuWEjdEQWift+7z7BpwbDCf5F/cb15Sakb7HCYC48XP 4JQSte6oPR9qW4cw44DFnSaAy1O5L9QNOKx1VBmGl8Zg0Zq+OhV4wN6Kd22d/1eXJc1m RPP1Qv8n94B1Qx6BBDmja6YbjFAETHGs47x3UBTrDX/1nlCmrppdIIByncJbhwSa3PjY N1J0y5R4KywZ2KIbx+sH9shjPgqGp9x1YjD9I2hpQ9487SHlTNN+NmXYl6OsjstkR9YZ Ww8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=S8mjFs7ebR+xEuiRShrGc07G4jO+Mgiscghvbj0Cekg=; b=00qYwHjZ50vBaqb4ggps0c7jstffV5a0Bqa7o3QaGKKmtFTlgBBgPus11PEAIuq/kE e63Wc3UFNdns07bajichhw4D6K9xyVOfzhcOOZy8mOniBqUS1D0zX6L78StXD26PIyJm KrkPV8fO5d6kzpjY7TZxmGHORLrrkC8r0oXXzTPh7c5qdxAJh0MKsmJP4oh5dqCUEUHo KTfx8rdvsEtW6Ze4U3jG78xNUWjamwRzdbS0WbQauAKjYuqnPpiydxXm5TiFWYbWBrcs vzVvcuVIF0F1u32AHZBSACSzJNfmaTdSv5z2Vi5Kjxvisr1KpRzMpROcmWfJll3qTEsA yD4g== X-Gm-Message-State: AOAM530keE56Kfb0SfT+g8sD3Z7eVq+RX99YrswpssoH+i7/+OPYsGjN EsLOaedasKXUi3DsYR9dNnd5CnEHrLb/ow== X-Google-Smtp-Source: ABdhPJyh6NCUjTbEroXGOFjb6/MV9AohF3QdEc3iKXYJrS9QRqAZNlK80oIMLtFZPJgb0oo1Pg03Rg== X-Received: by 2002:a02:5b82:: with SMTP id g124mr598217jab.89.1635972507050; Wed, 03 Nov 2021 13:48:27 -0700 (PDT) Received: from john.lan ([172.243.151.11]) by smtp.gmail.com with ESMTPSA id y11sm1507612ior.4.2021.11.03.13.48.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 03 Nov 2021 13:48:26 -0700 (PDT) From: John Fastabend To: bpf@vger.kernel.org, netdev@vger.kernel.org Cc: daniel@iogearbox.net, joamaki@gmail.com, xiyou.wangcong@gmail.com, jakub@cloudflare.com, john.fastabend@gmail.com Subject: [PATCH bpf v2 4/5] bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and 
colliding
Date: Wed, 3 Nov 2021 13:47:35 -0700
Message-Id: <20211103204736.248403-5-john.fastabend@gmail.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20211103204736.248403-1-john.fastabend@gmail.com>
References: <20211103204736.248403-1-john.fastabend@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

Strparser is reusing the qdisc_skb_cb struct to stash the skb message handling progress, e.g. offset and length of the skb. First, this is poorly named and inherits a struct from qdisc that doesn't reflect the actual usage of cb[] at this layer.

But, more importantly, strparser is using the following to access its metadata:

  (struct _strp_msg *)((void *)skb->cb + offsetof(struct qdisc_skb_cb, data))

where _strp_msg is defined as:

  struct _strp_msg {
        struct strp_msg            strp;                 /*     0     8 */
        int                        accum_len;            /*     8     4 */

        /* size: 12, cachelines: 1, members: 2 */
        /* last cacheline: 12 bytes */
  };

So we use 12 bytes of ->data[] in the struct. However, in BPF code running parser and verdict programs, the user has read access to the data[] array as well. It's not too problematic, but we should not be exposing internal state to BPF programs. If it's really needed, then we can use the probe_read() APIs, which allow reading kernel memory. And I don't believe moving this around poses any API breakage at the cb[] layer, because programs can't depend on cb[] across layers.

In order to fix another issue with a ctx rewrite we need to stash a temp variable somewhere. To make this work cleanly, this patch builds a cb struct for sk_skb types called sk_skb_cb. Then we can use this consistently in the strparser and sockmap space. Additionally, we can start allowing ->cb[] write access after this.
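
The layout this patch introduces can be checked with a small userspace model. This is a sketch under stated assumptions, not kernel code: the model_* struct names and the two helper functions are hypothetical, the full_len field of the strp_msg stand-in is assumed from the 8-byte size quoted above, and SKB_CB_SIZE assumes the usual 48-byte sk_buff cb[] area.

```c
#include <stddef.h>

#define SK_SKB_CB_PRIV_LEN 20	/* bytes of cb[] visible to BPF programs */
#define SKB_CB_SIZE 48		/* assumed size of sk_buff::cb[] */

/* Userspace stand-in for struct strp_msg (full_len is an assumption). */
struct model_strp_msg {
	int full_len;
	int offset;
};

/* Stand-in for struct _strp_msg: strp must stay first. */
struct model_strp_msg_priv {
	struct model_strp_msg strp;
	int accum_len;
};

/* Stand-in for struct sk_skb_cb: BPF-visible bytes, then strparser state. */
struct model_sk_skb_cb {
	unsigned char data[SK_SKB_CB_PRIV_LEN];
	struct model_strp_msg_priv strp;
};

/* Compile-time checks in the spirit of the BUILD_BUG_ON()s in the diff. */
_Static_assert(sizeof(((struct model_sk_skb_cb *)0)->data) >= 20,
	       "BPF-visible window must cover cb[0]..cb[4]");
_Static_assert(sizeof(struct model_sk_skb_cb) <= SKB_CB_SIZE,
	       "control block must fit in skb->cb");

/* Offset at which strparser state starts inside the control block. */
static size_t strp_state_offset(void)
{
	return offsetof(struct model_sk_skb_cb, strp);
}

/* Map a byte offset within the BPF cb[] window to the control block. */
static size_t bpf_cb_to_skb_cb(size_t bpf_off)
{
	return offsetof(struct model_sk_skb_cb, data) + bpf_off;
}
```

With this layout the strparser state sits directly behind the 20 bytes a BPF program may touch, so reads and writes through cb[] in the model never overlap it.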
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Tested-by: Jussi Maki
Reviewed-by: Jakub Sitnicki
Signed-off-by: John Fastabend
---
 include/net/strparser.h   | 16 +++++++++++++++-
 net/core/filter.c         | 22 ++++++++++++++++++++++
 net/strparser/strparser.c | 10 +---------
 3 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/include/net/strparser.h b/include/net/strparser.h
index 1d20b98493a1..bec1439bd3be 100644
--- a/include/net/strparser.h
+++ b/include/net/strparser.h
@@ -54,10 +54,24 @@ struct strp_msg {
 	int offset;
 };
 
+struct _strp_msg {
+	/* Internal cb structure. struct strp_msg must be first for passing
+	 * to upper layer.
+	 */
+	struct strp_msg strp;
+	int accum_len;
+};
+
+struct sk_skb_cb {
+#define SK_SKB_CB_PRIV_LEN 20
+	unsigned char data[SK_SKB_CB_PRIV_LEN];
+	struct _strp_msg strp;
+};
+
 static inline struct strp_msg *strp_msg(struct sk_buff *skb)
 {
 	return (struct strp_msg *)((void *)skb->cb +
-		offsetof(struct qdisc_skb_cb, data));
+		offsetof(struct sk_skb_cb, strp));
 }
 
 /* Structure for an attached lower socket */
diff --git a/net/core/filter.c b/net/core/filter.c
index a68418268e92..c3936d0724b8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9782,11 +9782,33 @@ static u32 sk_skb_convert_ctx_access(enum bpf_access_type type,
 				     struct bpf_prog *prog, u32 *target_size)
 {
 	struct bpf_insn *insn = insn_buf;
+	int off;
 
 	switch (si->off) {
 	case offsetof(struct __sk_buff, data_end):
 		insn = bpf_convert_data_end_access(si, insn);
 		break;
+	case offsetof(struct __sk_buff, cb[0]) ...
+	     offsetofend(struct __sk_buff, cb[4]) - 1:
+		BUILD_BUG_ON(sizeof_field(struct sk_skb_cb, data) < 20);
+		BUILD_BUG_ON((offsetof(struct sk_buff, cb) +
+			      offsetof(struct sk_skb_cb, data)) %
+			     sizeof(__u64));
+
+		prog->cb_access = 1;
+		off  = si->off;
+		off -= offsetof(struct __sk_buff, cb[0]);
+		off += offsetof(struct sk_buff, cb);
+		off += offsetof(struct sk_skb_cb, data);
+		if (type == BPF_WRITE)
+			*insn++ = BPF_STX_MEM(BPF_SIZE(si->code), si->dst_reg,
+					      si->src_reg, off);
+		else
+			*insn++ = BPF_LDX_MEM(BPF_SIZE(si->code), si->dst_reg,
+					      si->src_reg, off);
+		break;
+
+
 	default:
 		return bpf_convert_ctx_access(type, si, insn_buf, prog,
 					      target_size);
diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
index 9c0343568d2a..1a72c67afed5 100644
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -27,18 +27,10 @@
 
 static struct workqueue_struct *strp_wq;
 
-struct _strp_msg {
-	/* Internal cb structure. struct strp_msg must be first for passing
-	 * to upper layer.
-	 */
-	struct strp_msg strp;
-	int accum_len;
-};
-
 static inline struct _strp_msg *_strp_msg(struct sk_buff *skb)
 {
 	return (struct _strp_msg *)((void *)skb->cb +
-		offsetof(struct qdisc_skb_cb, data));
+		offsetof(struct sk_skb_cb, strp));
 }
 
 /* Lower lock held */

From patchwork Wed Nov 3 20:47:36 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 12601601
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1C928C433FE for ; Wed, 3 Nov 2021 20:48:39 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 086E560720 for ; Wed, 3 Nov 2021 20:48:39 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org
via listexpand id S230198AbhKCUvO (ORCPT ); Wed, 3 Nov 2021 16:51:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60604 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231388AbhKCUvM (ORCPT ); Wed, 3 Nov 2021 16:51:12 -0400 Received: from mail-io1-xd33.google.com (mail-io1-xd33.google.com [IPv6:2607:f8b0:4864:20::d33]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B3436C061714; Wed, 3 Nov 2021 13:48:35 -0700 (PDT) Received: by mail-io1-xd33.google.com with SMTP id r3so2967571iod.6; Wed, 03 Nov 2021 13:48:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=0RJWymqwchaNorQTo4g8giSsicFnOlZGInpHVyvA+Tg=; b=meKKx78Gx1hoCWIENP2RsNHnjEXMiQ4year86XQUuus9qfUmShJKhrmWsSK4DUCaiu uxWF0ge8fXAxYgpCRwBcW49utwEwDEuU5pby3KAgPZB/ArexBUpAn0AYvSU/T2FmFHgv 3tB+2OmJkqxODtmUpVzpPUOXA5VodvFg776SB5loD4KYrW+YeBxMHmARPXa4D1vLGvfC 0d7JjEigyaZhVsidkeBhq4Dzeau0c4K2oVQ/RJV6Rlmabb9or/+NTibxsp07//l9gd5q GM7sU2aGYC/t6d5qZf25Wuw6oIbTiYbzUj9OaRb0gYFS5oP0H205XWeb5XRwRSZq/BTX jaVg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=0RJWymqwchaNorQTo4g8giSsicFnOlZGInpHVyvA+Tg=; b=TCFI0QKbdcD5Hkw0o2T0Q2QASLi7Zc7cEQf0Qo0P8W+CZPXJ1UZN7Tr/gYJ1zpZQs8 2bYdoIPGV0TLWFdrjNazwIRl0Ar5NHLxtu25MEJpFsnNVJ6JTLBc6FS6/st9Qgi8B8id EdRs3quben6LT1B3k6+F3Qmw3367fDGYTsTSrs+Nth28G2XXy/m+xDVFHwCVRr2Gzvae v5SX8alULJtA14bWNQn7lbRKnzLeFlw04fKPHYoQtOOBSKmX9ltS59Cu5ruBpUP7V57h fOQCT+xQQyPOf0zQ9m+kFAVMQ+1iZmYkJNm2d7gaVaEVlYwGEJGHR/7/6n3wiM3YPjWM DJKA== X-Gm-Message-State: AOAM5326fyXJQSEF+9rAa7VXSYfl4c1OvN5+0DuV/+1zHO1GIPt6/asX dbTYgZQk98Kyu22VnhZjiD3JXXGmX5hFAA== X-Google-Smtp-Source: 
ABdhPJxDKyctfIgafGan8+pIkxO9uOvgiU3fBh1Lhb1klvwqTXWl3mY6QH//PHtMyECt0vdYQSfwag== X-Received: by 2002:a5d:8b94:: with SMTP id p20mr32081885iol.146.1635972514897; Wed, 03 Nov 2021 13:48:34 -0700 (PDT) Received: from john.lan ([172.243.151.11]) by smtp.gmail.com with ESMTPSA id y11sm1507612ior.4.2021.11.03.13.48.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 03 Nov 2021 13:48:34 -0700 (PDT) From: John Fastabend To: bpf@vger.kernel.org, netdev@vger.kernel.org Cc: daniel@iogearbox.net, joamaki@gmail.com, xiyou.wangcong@gmail.com, jakub@cloudflare.com, john.fastabend@gmail.com Subject: [PATCH bpf v2 5/5] bpf, sockmap: sk_skb data_end access incorrect when src_reg = dst_reg Date: Wed, 3 Nov 2021 13:47:36 -0700 Message-Id: <20211103204736.248403-6-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211103204736.248403-1-john.fastabend@gmail.com> References: <20211103204736.248403-1-john.fastabend@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Jussi Maki The current conversion of skb->data_end reads like this, ; data_end = (void*)(long)skb->data_end; 559: (79) r1 = *(u64 *)(r2 +200) ; r1 = skb->data 560: (61) r11 = *(u32 *)(r2 +112) ; r11 = skb->len 561: (0f) r1 += r11 562: (61) r11 = *(u32 *)(r2 +116) 563: (1f) r1 -= r11 But similar to the case ("bpf: sock_ops sk access may stomp registers when dst_reg = src_reg"), the code will read an incorrect skb->len when src == dst. In this case we end up generating this xlated code. 

  ; data_end = (void*)(long)skb->data_end;
   559: (79) r1 = *(u64 *)(r1 +200)   ; r1 = skb->data
   560: (61) r11 = *(u32 *)(r1 +112)  ; r11 = (skb->data)->len
   561: (0f) r1 += r11
   562: (61) r11 = *(u32 *)(r1 +116)
   563: (1f) r1 -= r11

where line 560 reads 4B from (skb->data + 112) instead of the intended
skb->len. Here the skb pointer in r1 is overwritten with skb->data, so the
later dereference for skb->len follows skb->data instead of skb.

This fixes the issue similarly to the patch mentioned above by creating an
additional temporary variable and using it to store the register when
dst_reg = src_reg. We name the variable temp_reg and place it in the cb
context for sk_skb. Then we restore from the temp to ensure nothing is
lost.

Fixes: 16137b09a66f2 ("bpf: Compute data_end dynamically with JIT code")
Reviewed-by: Jakub Sitnicki
Signed-off-by: Jussi Maki
Signed-off-by: John Fastabend
---
 include/net/strparser.h |  4 ++++
 net/core/filter.c       | 36 ++++++++++++++++++++++++++++++------
 2 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/include/net/strparser.h b/include/net/strparser.h
index bec1439bd3be..732b7097d78e 100644
--- a/include/net/strparser.h
+++ b/include/net/strparser.h
@@ -66,6 +66,10 @@ struct sk_skb_cb {
 #define SK_SKB_CB_PRIV_LEN 20
 	unsigned char data[SK_SKB_CB_PRIV_LEN];
 	struct _strp_msg strp;
+	/* temp_reg is a temporary register used for bpf_convert_data_end_access
+	 * when dst_reg == src_reg.
+	 */
+	u64 temp_reg;
 };
 
 static inline struct strp_msg *strp_msg(struct sk_buff *skb)
diff --git a/net/core/filter.c b/net/core/filter.c
index c3936d0724b8..e471c9b09670 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9756,22 +9756,46 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 static struct bpf_insn *bpf_convert_data_end_access(const struct bpf_insn *si,
 						    struct bpf_insn *insn)
 {
-	/* si->dst_reg = skb->data */
+	int reg;
+	int temp_reg_off = offsetof(struct sk_buff, cb) +
+			   offsetof(struct sk_skb_cb, temp_reg);
+
+	if (si->src_reg == si->dst_reg) {
+		/* We need an extra register, choose and save a register. */
+		reg = BPF_REG_9;
+		if (si->src_reg == reg || si->dst_reg == reg)
+			reg--;
+		if (si->src_reg == reg || si->dst_reg == reg)
+			reg--;
+		*insn++ = BPF_STX_MEM(BPF_DW, si->src_reg, reg, temp_reg_off);
+	} else {
+		reg = si->dst_reg;
+	}
+
+	/* reg = skb->data */
 	*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, data),
-			      si->dst_reg, si->src_reg,
+			      reg, si->src_reg,
 			      offsetof(struct sk_buff, data));
 	/* AX = skb->len */
 	*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, len),
 			      BPF_REG_AX, si->src_reg,
 			      offsetof(struct sk_buff, len));
-	/* si->dst_reg = skb->data + skb->len */
-	*insn++ = BPF_ALU64_REG(BPF_ADD, si->dst_reg, BPF_REG_AX);
+	/* reg = skb->data + skb->len */
+	*insn++ = BPF_ALU64_REG(BPF_ADD, reg, BPF_REG_AX);
 	/* AX = skb->data_len */
 	*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, data_len),
 			      BPF_REG_AX, si->src_reg,
 			      offsetof(struct sk_buff, data_len));
-	/* si->dst_reg = skb->data + skb->len - skb->data_len */
-	*insn++ = BPF_ALU64_REG(BPF_SUB, si->dst_reg, BPF_REG_AX);
+
+	/* reg = skb->data + skb->len - skb->data_len */
+	*insn++ = BPF_ALU64_REG(BPF_SUB, reg, BPF_REG_AX);
+
+	if (si->src_reg == si->dst_reg) {
+		/* Restore the saved register */
+		*insn++ = BPF_MOV64_REG(BPF_REG_AX, si->src_reg);
+		*insn++ = BPF_MOV64_REG(si->dst_reg, reg);
+		*insn++ = BPF_LDX_MEM(BPF_DW, reg, BPF_REG_AX, temp_reg_off);
+	}
 
 	return insn;
 }
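
For readers who want to see the aliasing problem outside the kernel, below is
a rough userspace sketch of the old and new instruction sequences. It is not
kernel code and makes several simplifying assumptions: struct toy_skb,
struct vm, ld(), st(), data_end_old(), data_end_new() and demo() are all
hypothetical names, every field is widened to u64 and addressed by index
rather than by the real struct sk_buff offsets, and a load through any base
other than the skb pointer is only flagged, not performed.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: fields are u64 and indexed, not the real sk_buff layout. */
enum field { F_DATA, F_LEN, F_DATA_LEN, F_TEMP_REG, NFIELDS };
struct toy_skb { uint64_t f[NFIELDS]; };

enum { R9 = 9, AX = 11, NREGS = 12 };	/* AX plays the role of BPF_REG_AX */

struct vm {
	uint64_t r[NREGS];
	struct toy_skb *skb;	/* the only valid load/store base */
	int bad_deref;		/* set when a base register no longer holds skb */
};

static uint64_t ld(struct vm *vm, int base, enum field fld)
{
	if (vm->r[base] != (uint64_t)(uintptr_t)vm->skb) {
		vm->bad_deref = 1;	/* real code would read unrelated memory */
		return 0;
	}
	return vm->skb->f[fld];
}

static void st(struct vm *vm, int base, enum field fld, uint64_t val)
{
	if (vm->r[base] != (uint64_t)(uintptr_t)vm->skb) {
		vm->bad_deref = 1;
		return;
	}
	vm->skb->f[fld] = val;
}

/* Old sequence: accumulates directly in dst, so src is lost when src == dst. */
uint64_t data_end_old(struct vm *vm, int dst, int src)
{
	vm->r[dst] = ld(vm, src, F_DATA);
	vm->r[AX] = ld(vm, src, F_LEN);		/* wrong base if src == dst */
	vm->r[dst] += vm->r[AX];
	vm->r[AX] = ld(vm, src, F_DATA_LEN);
	vm->r[dst] -= vm->r[AX];
	return vm->r[dst];
}

/* New sequence: borrow a scratch register, saving it in the temp_reg slot. */
uint64_t data_end_new(struct vm *vm, int dst, int src)
{
	int reg = dst;

	if (src == dst) {
		reg = R9;			/* step down past src/dst */
		if (src == reg || dst == reg)
			reg--;
		if (src == reg || dst == reg)
			reg--;
		st(vm, src, F_TEMP_REG, vm->r[reg]);
	}
	vm->r[reg] = ld(vm, src, F_DATA);
	vm->r[AX] = ld(vm, src, F_LEN);
	vm->r[reg] += vm->r[AX];
	vm->r[AX] = ld(vm, src, F_DATA_LEN);
	vm->r[reg] -= vm->r[AX];
	if (src == dst) {
		vm->r[AX] = vm->r[src];		/* keep the skb pointer */
		vm->r[dst] = vm->r[reg];	/* hand the result to dst */
		vm->r[reg] = ld(vm, AX, F_TEMP_REG);	/* restore scratch */
	}
	return vm->r[dst];
}

/* Runs both sequences with dst == src == r1; returns 1 if all checks hold. */
int demo(void)
{
	static unsigned char pkt[64];
	struct toy_skb skb = { { (uint64_t)(uintptr_t)pkt, 60, 20, 0 } };
	uint64_t want = (uint64_t)(uintptr_t)pkt + 60 - 20;
	struct vm vm = { .skb = &skb };
	uint64_t got;
	int ok;

	vm.r[1] = (uint64_t)(uintptr_t)&skb;
	data_end_old(&vm, 1, 1);
	ok = vm.bad_deref;			/* old sequence followed skb->data */

	vm.bad_deref = 0;
	vm.r[1] = (uint64_t)(uintptr_t)&skb;
	vm.r[9] = 0xdeadbeefULL;
	got = data_end_new(&vm, 1, 1);
	ok = ok && got == want && vm.r[9] == 0xdeadbeefULL && !vm.bad_deref;
	assert(ok);
	return ok;
}
```

Calling demo() exercises the dst == src == r1 case from the xlated listing
above: the old sequence trips the bad_deref flag on the skb->len load, while
the new one yields data + len - data_len and leaves r9 with its original
value. The two-step decrement from r9 mirrors the register choice in the
patch; only the second branch matters when src and dst are the same register.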