From patchwork Wed Jul 7 11:25:47 2021
From: Jussi Maki
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, daniel@iogearbox.net, j.vosburgh@gmail.com,
    andy@greyhouse.net, vfalico@gmail.com, andrii@kernel.org,
    maciej.fijalkowski@intel.com, magnus.karlsson@intel.com,
    Jussi Maki
Subject: [PATCH bpf-next v3 1/5] net: bonding: Refactor bond_xmit_hash for use with xdp_buff
Date: Wed, 7 Jul 2021 11:25:47 +0000
Message-Id: <20210707112551.9782-2-joamaki@gmail.com>
In-Reply-To: <20210707112551.9782-1-joamaki@gmail.com>
References: <20210609135537.1460244-1-joamaki@gmail.com>
 <20210707112551.9782-1-joamaki@gmail.com>

In preparation for adding XDP support to the bonding driver, refactor the
packet hashing functions so that they can work with any linear data buffer,
without requiring an skb.
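[For context, the refactor splits the hashing into an skb-independent core,
__bond_xmit_hash(), so that the same policy hash can later be computed from
an xdp_buff's linear data. A minimal sketch of how one core can serve both
callers; names follow this patch, but the xdp_buff caller only arrives in
patch 3/5 (as bond_xmit_hash_xdp), so treat this as illustrative rather than
part of this diff:]

	/* Illustrative only: one hash core, two entry points. */
	static u32 hash_from_skb(struct bonding *bond, struct sk_buff *skb)
	{
		/* skb path: offsets come from skb metadata, and more data
		 * can be pulled into the linear area if needed.
		 */
		return __bond_xmit_hash(bond, skb, skb->head, skb->protocol,
					skb->mac_header, skb->network_header,
					skb_headlen(skb));
	}

	static u32 hash_from_xdp(struct bonding *bond, struct xdp_buff *xdp)
	{
		struct ethhdr *eth;

		if (xdp->data + sizeof(*eth) > xdp->data_end)
			return 0; /* runt frame: no skb to pull from */

		eth = (struct ethhdr *)xdp->data;
		/* xdp path: data is already linear; skb is NULL, so
		 * bond_pull_data() can never pull beyond hlen and simply
		 * fails on short packets.
		 */
		return __bond_xmit_hash(bond, NULL, xdp->data, eth->h_proto, 0,
					sizeof(*eth),
					xdp->data_end - xdp->data);
	}

[The key point is that the skb is optional: bond_pull_data() below only
dereferences it when more than hlen bytes are needed.]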
Signed-off-by: Jussi Maki
---
 drivers/net/bonding/bond_main.c | 147 +++++++++++++++++++-------
 1 file changed, 90 insertions(+), 57 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 0ff7567bd04f..78284c451668 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3487,55 +3487,80 @@ static struct notifier_block bond_netdev_notifier = {
 
 /*---------------------------- Hashing Policies -----------------------------*/
 
+/* Helper to access data in a packet, with or without a backing skb.
+ * If skb is given the data is linearized if necessary via pskb_may_pull.
+ */
+static inline const void *bond_pull_data(struct sk_buff *skb,
+					 const void *data, int hlen, int n)
+{
+	if (likely(n <= hlen))
+		return data;
+	else if (skb && likely(pskb_may_pull(skb, n)))
+		return skb->head;
+
+	return NULL;
+}
+
 /* L2 hash helper */
-static inline u32 bond_eth_hash(struct sk_buff *skb)
+static inline u32 bond_eth_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
 {
-	struct ethhdr *ep, hdr_tmp;
+	struct ethhdr *ep;
 
-	ep = skb_header_pointer(skb, 0, sizeof(hdr_tmp), &hdr_tmp);
-	if (ep)
-		return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
-	return 0;
+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
+	if (!data)
+		return 0;
+
+	ep = (struct ethhdr *)(data + mhoff);
+	return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
 }
 
-static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
-			 int *noff, int *proto, bool l34)
+static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, const void *data,
+			 int hlen, __be16 l2_proto, int *nhoff, int *ip_proto, bool l34)
 {
 	const struct ipv6hdr *iph6;
 	const struct iphdr *iph;
 
-	if (skb->protocol == htons(ETH_P_IP)) {
-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph))))
+	if (l2_proto == htons(ETH_P_IP)) {
+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph));
+		if (!data)
 			return false;
-		iph = (const struct iphdr *)(skb->data + *noff);
+
+		iph = (const struct iphdr *)(data + *nhoff);
 		iph_to_flow_copy_v4addrs(fk, iph);
-		*noff += iph->ihl << 2;
+		*nhoff += iph->ihl << 2;
 		if (!ip_is_fragment(iph))
-			*proto = iph->protocol;
-	} else if (skb->protocol == htons(ETH_P_IPV6)) {
-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph6))))
+			*ip_proto = iph->protocol;
+	} else if (l2_proto == htons(ETH_P_IPV6)) {
+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph6));
+		if (!data)
 			return false;
-		iph6 = (const struct ipv6hdr *)(skb->data + *noff);
+
+		iph6 = (const struct ipv6hdr *)(data + *nhoff);
 		iph_to_flow_copy_v6addrs(fk, iph6);
-		*noff += sizeof(*iph6);
-		*proto = iph6->nexthdr;
+		*nhoff += sizeof(*iph6);
+		*ip_proto = iph6->nexthdr;
 	} else {
 		return false;
 	}
 
-	if (l34 && *proto >= 0)
-		fk->ports.ports = skb_flow_get_ports(skb, *noff, *proto);
+	if (l34 && *ip_proto >= 0)
+		fk->ports.ports = __skb_flow_get_ports(skb, *nhoff, *ip_proto, data, hlen);
 
 	return true;
 }
 
-static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+static u32 bond_vlan_srcmac_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
 {
-	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+	struct ethhdr *mac_hdr;
 	u32 srcmac_vendor = 0, srcmac_dev = 0;
 	u16 vlan;
 	int i;
 
+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
+	if (!data)
+		return 0;
+	mac_hdr = (struct ethhdr *)(data + mhoff);
+
 	for (i = 0; i < 3; i++)
 		srcmac_vendor = (srcmac_vendor << 8) | mac_hdr->h_source[i];
 
@@ -3551,26 +3576,25 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
 }
 
 /* Extract the appropriate headers based on bond's xmit policy */
-static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
-			      struct flow_keys *fk)
+static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb, const void *data,
+			      __be16 l2_proto, int nhoff, int hlen, struct flow_keys *fk)
 {
 	bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
-	int noff, proto = -1;
+	int ip_proto = -1;
 
 	switch (bond->params.xmit_policy) {
 	case BOND_XMIT_POLICY_ENCAP23:
 	case BOND_XMIT_POLICY_ENCAP34:
 		memset(fk, 0, sizeof(*fk));
 		return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
-					  fk, NULL, 0, 0, 0, 0);
+					  fk, data, l2_proto, nhoff, hlen, 0);
 	default:
 		break;
 	}
 
 	fk->ports.ports = 0;
 	memset(&fk->icmp, 0, sizeof(fk->icmp));
-	noff = skb_network_offset(skb);
-	if (!bond_flow_ip(skb, fk, &noff, &proto, l34))
+	if (!bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34))
 		return false;
 
 	/* ICMP error packets contains at least 8 bytes of the header
@@ -3578,22 +3602,20 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
 	 * to correlate ICMP error packets within the same flow which
 	 * generated the error.
 	 */
-	if (proto == IPPROTO_ICMP || proto == IPPROTO_ICMPV6) {
-		skb_flow_get_icmp_tci(skb, &fk->icmp, skb->data,
-				      skb_transport_offset(skb),
-				      skb_headlen(skb));
-		if (proto == IPPROTO_ICMP) {
+	if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
+		skb_flow_get_icmp_tci(skb, &fk->icmp, data, nhoff, hlen);
+		if (ip_proto == IPPROTO_ICMP) {
 			if (!icmp_is_err(fk->icmp.type))
 				return true;
 
-			noff += sizeof(struct icmphdr);
-		} else if (proto == IPPROTO_ICMPV6) {
+			nhoff += sizeof(struct icmphdr);
+		} else if (ip_proto == IPPROTO_ICMPV6) {
 			if (!icmpv6_is_err(fk->icmp.type))
 				return true;
 
-			noff += sizeof(struct icmp6hdr);
+			nhoff += sizeof(struct icmp6hdr);
 		}
-		return bond_flow_ip(skb, fk, &noff, &proto, l34);
+		return bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34);
 	}
 
 	return true;
@@ -3609,33 +3631,26 @@ static u32 bond_ip_hash(u32 hash, struct flow_keys *flow)
 	return hash >> 1;
 }
 
-/**
- * bond_xmit_hash - generate a hash value based on the xmit policy
- * @bond: bonding device
- * @skb: buffer to use for headers
- *
- * This function will extract the necessary headers from the skb buffer and use
- * them to generate a hash based on the xmit_policy set in the bonding device
+/* Generate hash based on xmit policy. If @skb is given it is used to linearize
+ * the data as required, but this function can be used without it if the data is
+ * known to be linear (e.g. with xdp_buff).
  */
-u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
+static u32 __bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, const void *data,
+			    __be16 l2_proto, int mhoff, int nhoff, int hlen)
 {
 	struct flow_keys flow;
 	u32 hash;
 
-	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
-	    skb->l4_hash)
-		return skb->hash;
-
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
-		return bond_vlan_srcmac_hash(skb);
+		return bond_vlan_srcmac_hash(skb, data, mhoff, hlen);
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
-	    !bond_flow_dissect(bond, skb, &flow))
-		return bond_eth_hash(skb);
+	    !bond_flow_dissect(bond, skb, data, l2_proto, nhoff, hlen, &flow))
+		return bond_eth_hash(skb, data, mhoff, hlen);
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
 	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23) {
-		hash = bond_eth_hash(skb);
+		hash = bond_eth_hash(skb, data, mhoff, hlen);
 	} else {
 		if (flow.icmp.id)
 			memcpy(&hash, &flow.icmp, sizeof(hash));
@@ -3646,6 +3661,25 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
 	return bond_ip_hash(hash, &flow);
 }
 
+/**
+ * bond_xmit_hash - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @skb: buffer to use for headers
+ *
+ * This function will extract the necessary headers from the skb buffer and use
+ * them to generate a hash based on the xmit_policy set in the bonding device
+ */
+u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
+{
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
+	    skb->l4_hash)
+		return skb->hash;
+
+	return __bond_xmit_hash(bond, skb, skb->head, skb->protocol,
+				skb->mac_header, skb->network_header,
+				skb_headlen(skb));
+}
+
 /*-------------------------- Device entry points ----------------------------*/
 
 void bond_work_init_all(struct bonding *bond)
@@ -4275,8 +4309,7 @@ static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
 	return bond_tx_drop(bond_dev, skb);
 }
 
-static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond,
-						      struct sk_buff *skb)
+static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond)
 {
 	return rcu_dereference(bond->curr_active_slave);
 }
@@ -4290,7 +4323,7 @@ static netdev_tx_t bond_xmit_activebackup(struct sk_buff *skb,
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct slave *slave;
 
-	slave = bond_xmit_activebackup_slave_get(bond, skb);
+	slave = bond_xmit_activebackup_slave_get(bond);
 	if (slave)
 		return bond_dev_queue_xmit(bond, skb, slave->dev);
 
@@ -4588,7 +4621,7 @@ static struct net_device *bond_xmit_get_slave(struct net_device *master_dev,
 		slave = bond_xmit_roundrobin_slave_get(bond, skb);
 		break;
 	case BOND_MODE_ACTIVEBACKUP:
-		slave = bond_xmit_activebackup_slave_get(bond, skb);
+		slave = bond_xmit_activebackup_slave_get(bond);
 		break;
 	case BOND_MODE_8023AD:
 	case BOND_MODE_XOR:

From patchwork Wed Jul 7 11:25:48 2021
From: Jussi Maki
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, daniel@iogearbox.net, j.vosburgh@gmail.com,
    andy@greyhouse.net, vfalico@gmail.com, andrii@kernel.org,
    maciej.fijalkowski@intel.com, magnus.karlsson@intel.com,
    Jussi Maki
Subject: [PATCH bpf-next v3 2/5] net: core: Add support for XDP redirection to slave device
Date: Wed, 7 Jul 2021 11:25:48 +0000
Message-Id: <20210707112551.9782-3-joamaki@gmail.com>
In-Reply-To: <20210707112551.9782-1-joamaki@gmail.com>
References: <20210609135537.1460244-1-joamaki@gmail.com>
 <20210707112551.9782-1-joamaki@gmail.com>

This adds the ndo_xdp_get_xmit_slave hook for transforming XDP_TX into
XDP_REDIRECT after the BPF program has run, when the ingress device is a
bond slave.

dev_xdp_prog_count is exposed so that slave devices can be checked for
loaded XDP programs, in order to avoid the situation where both the bond
master and a slave have programs loaded according to xdp_state.
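[To make the mechanism concrete, here is a minimal XDP program of the kind
this series targets: it returns XDP_TX unchanged, and when it runs on a bond
slave with this patch applied, bpf_prog_run_xdp() rewrites the verdict to
XDP_REDIRECT towards the slave chosen by the bond's xmit policy. The program
needs no bond awareness; the section name and reflector logic are
illustrative:]

	// SPDX-License-Identifier: GPL-2.0
	/* Illustrative XDP reflector: swaps Ethernet addresses and returns
	 * XDP_TX. Attached to a bond, the XDP_TX verdict is transparently
	 * turned into an XDP_REDIRECT to the slave selected by
	 * xdp_master_redirect().
	 */
	#include <linux/bpf.h>
	#include <linux/if_ether.h>
	#include <bpf/bpf_helpers.h>

	SEC("xdp")
	int xdp_tx_reflect(struct xdp_md *ctx)
	{
		void *data = (void *)(long)ctx->data;
		void *data_end = (void *)(long)ctx->data_end;
		struct ethhdr *eth = data;
		unsigned char tmp[ETH_ALEN];

		if (data + sizeof(*eth) > data_end)
			return XDP_DROP;

		__builtin_memcpy(tmp, eth->h_source, ETH_ALEN);
		__builtin_memcpy(eth->h_source, eth->h_dest, ETH_ALEN);
		__builtin_memcpy(eth->h_dest, tmp, ETH_ALEN);

		return XDP_TX; /* rewritten to XDP_REDIRECT on a bond slave */
	}

	char _license[] SEC("license") = "GPL";

[Loading a program onto the bond (patch 3/5) flips the
bpf_master_redirect_enabled_key static key, so devices outside a bond pay
only a patched-out branch for this check.]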
Signed-off-by: Jussi Maki
---
 include/linux/filter.h    | 13 ++++++++++++-
 include/linux/netdevice.h |  6 ++++++
 net/core/dev.c            |  9 ++++++++-
 net/core/filter.c         | 25 +++++++++++++++++++++++++
 4 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 472f97074da0..63b426c58a45 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -760,6 +760,10 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
 
 DECLARE_BPF_DISPATCHER(xdp)
 
+DECLARE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key);
+
+u32 xdp_master_redirect(struct xdp_buff *xdp);
+
 static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 					    struct xdp_buff *xdp)
 {
@@ -767,7 +771,14 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	 * under local_bh_disable(), which provides the needed RCU protection
 	 * for accessing map entries.
 	 */
-	return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+	u32 act = __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+
+	if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
+		if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev))
+			act = xdp_master_redirect(xdp);
+	}
+
+	return act;
 }
 
 void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index eaf5bb008aa9..d3b1d882d6fa 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1321,6 +1321,9 @@ struct netdev_net_notifier {
  *	that got dropped are freed/returned via xdp_return_frame().
  *	Returns negative number, means general error invoking ndo, meaning
  *	no frames were xmit'ed and core-caller will free all frames.
+ * struct net_device *(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
+ *					  struct xdp_buff *xdp);
+ *	Get the xmit slave of master device based on the xdp_buff.
  * int (*ndo_xsk_wakeup)(struct net_device *dev, u32 queue_id, u32 flags);
  *	This function is used to wake up the softirq, ksoftirqd or kthread
  *	responsible for sending and/or receiving packets on a specific
@@ -1539,6 +1542,8 @@ struct net_device_ops {
 	int			(*ndo_xdp_xmit)(struct net_device *dev, int n,
 						struct xdp_frame **xdp,
 						u32 flags);
+	struct net_device *	(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
+							  struct xdp_buff *xdp);
 	int			(*ndo_xsk_wakeup)(struct net_device *dev,
 						  u32 queue_id, u32 flags);
 	struct devlink_port *	(*ndo_get_devlink_port)(struct net_device *dev);
@@ -4069,6 +4074,7 @@ typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		      int fd, int expected_fd, u32 flags);
 int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
+u8 dev_xdp_prog_count(struct net_device *dev);
 u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode);
 
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index c253c2aafe97..05aac85b2bbc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9334,7 +9334,7 @@ static struct bpf_prog *dev_xdp_prog(struct net_device *dev,
 	return dev->xdp_state[mode].prog;
 }
 
-static u8 dev_xdp_prog_count(struct net_device *dev)
+u8 dev_xdp_prog_count(struct net_device *dev)
 {
 	u8 count = 0;
 	int i;
@@ -9344,6 +9344,7 @@ static u8 dev_xdp_prog_count(struct net_device *dev)
 			count++;
 	return count;
 }
+EXPORT_SYMBOL_GPL(dev_xdp_prog_count);
 
 u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode)
 {
@@ -9379,6 +9380,7 @@ static int dev_xdp_install(struct net_device *dev, enum bpf_xdp_mode mode,
 	xdp.flags = flags;
 	xdp.prog = prog;
+
 	/* Drivers assume refcnt is already incremented (i.e, prog pointer is
 	 * "moved" into driver), so they don't increment it on their own, but
 	 * they do decrement refcnt when program is detached or replaced.
@@ -9467,6 +9469,11 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack,
 		NL_SET_ERR_MSG(extack, "XDP_FLAGS_REPLACE is not specified");
 		return -EINVAL;
 	}
+	/* don't allow loading XDP programs to a bonded device */
+	if (netif_is_bond_slave(dev)) {
+		NL_SET_ERR_MSG(extack, "XDP program can not be attached to a bond slave");
+		return -EINVAL;
+	}
 
 	mode = dev_xdp_mode(dev, flags);
 	/* can't replace attached link */
diff --git a/net/core/filter.c b/net/core/filter.c
index d70187ce851b..10b12577f71d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3950,6 +3950,31 @@ void bpf_clear_redirect_map(struct bpf_map *map)
 	}
 }
 
+DEFINE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key);
+EXPORT_SYMBOL_GPL(bpf_master_redirect_enabled_key);
+
+u32 xdp_master_redirect(struct xdp_buff *xdp)
+{
+	struct net_device *master, *slave;
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+
+	master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
+	slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
+	if (slave && slave != xdp->rxq->dev) {
+		/* The target device is different from the receiving device, so
+		 * redirect it to the new device.
+		 * Using XDP_REDIRECT gets the correct behaviour from XDP enabled
+		 * drivers to unmap the packet from their rx ring.
+		 */
+		ri->tgt_index = slave->ifindex;
+		ri->map_id = INT_MAX;
+		ri->map_type = BPF_MAP_TYPE_UNSPEC;
+		return XDP_REDIRECT;
+	}
+	return XDP_TX;
+}
+EXPORT_SYMBOL_GPL(xdp_master_redirect);
+
 int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 		    struct bpf_prog *xdp_prog)
 {

From patchwork Wed Jul 7 11:25:49 2021
From: Jussi Maki
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, daniel@iogearbox.net, j.vosburgh@gmail.com,
    andy@greyhouse.net, vfalico@gmail.com, andrii@kernel.org,
    maciej.fijalkowski@intel.com, magnus.karlsson@intel.com,
    Jussi Maki
Subject: [PATCH bpf-next v3 3/5] net: bonding: Add XDP support to the bonding driver
Date: Wed, 7 Jul 2021 11:25:49 +0000
Message-Id: <20210707112551.9782-4-joamaki@gmail.com>
In-Reply-To: <20210707112551.9782-1-joamaki@gmail.com>
References: <20210609135537.1460244-1-joamaki@gmail.com>
 <20210707112551.9782-1-joamaki@gmail.com>

XDP is implemented in the bonding driver by transparently delegating the XDP
program loading, removal and xmit operations to the bonding slave devices.
The overall goal of this work is that XDP programs can be attached to a bond
device *without* any further changes (or awareness) necessary to the program
itself, meaning the same XDP program can be attached to a native device but
also to a bonding device.

The semantics of XDP_TX when attached to a bond are, in such a setting,
equivalent to the case where a tc/BPF program is attached to the bond:
the packet is transmitted out of the bond itself, using one of the bond's
configured xmit methods to select a slave device (rather than XDP_TX on
the slave itself). Handling of XDP_TX to transmit using the configured
bonding mechanism is therefore implemented by rewriting the BPF program
return value in bpf_prog_run_xdp. To avoid a performance impact, this check
is guarded by a static key, which is incremented when an XDP program is
loaded onto a bond device. This approach was chosen to avoid changes to
drivers implementing XDP. If the slave device does not match the receive
device, then XDP_REDIRECT is transparently used to perform the redirection
in order to have the network driver release the packet from its RX ring.

The bonding driver hashing functions have been refactored to allow reuse
with xdp_buff's to avoid code duplication.

The motivation for this change is to enable use of bonding (and 802.3ad) in
hairpinning L4 load-balancers such as [1] implemented with XDP, and also to
transparently support bond devices for projects that use XDP, given that
most modern NICs have dual-port adapters. An alternative to this approach
would be to implement 802.3ad in user-space and implement the bonding
load-balancing in the XDP program itself, but this is a rather cumbersome
endeavor in terms of slave device management (e.g. by watching netlink) and
requires separate programs for the native vs. bond cases for the
orchestrator. A native in-kernel implementation overcomes these issues and
provides more flexibility.

Below are benchmark results from two machines with 100Gbit Intel E810 (ice)
NICs, with a 32-core 3970X on the sending machine and a 16-core 3950X on the
receiving machine. 64-byte packets were sent with pktgen-dpdk at full rate.
Two issues [2, 3] were identified with the ice driver, so the tests were
performed with iommu=off and patch [2] applied. Additionally the bonding
round-robin algorithm was modified to use per-cpu tx counters, as high CPU
load (50% vs 10%) and a high rate of cache misses were caused by the shared
rr_tx_counter (see patch 2/3). The statistics were collected using
"sar -n dev -u 1 10".

 -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
 without patch (1 dev):
   XDP_DROP:              3.15%     48.6Mpps
   XDP_TX:                3.12%     18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%    116.5Mpps
   XDP_TX (RSS):          9.67%     25.3Mpps     24.2Mpps
 -----------------------
 with patch, bond (1 dev):
   XDP_DROP:              3.14%     46.7Mpps
   XDP_TX:                3.15%     13.9Mpps     13.9Mpps
   XDP_DROP (RSS):       10.33%    117.2Mpps
   XDP_TX (RSS):         10.64%     25.1Mpps     24.0Mpps
 -----------------------
 with patch, bond (2 devs):
   XDP_DROP:              6.27%     92.7Mpps
   XDP_TX:                6.26%     17.6Mpps     17.5Mpps
   XDP_DROP (RSS):       11.38%    117.2Mpps
   XDP_TX (RSS):         14.30%     28.7Mpps     27.4Mpps
 --------------------------------------------------------------

RSS: Receive Side Scaling, i.e. the packets were sent to a range of
destination IPs.

[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/
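[As a usage sketch of what the commit message describes: attaching a program
to the bond is expected to look no different from attaching to a plain
device; the bond propagates the setup to each slave via ndo_bpf. The device
name "bond0", the object file name, and the use of current (v0.8+) libbpf
APIs are assumptions for illustration:]

	/* Illustrative: attach an XDP program to a bond with libbpf. */
	#include <net/if.h>
	#include <bpf/libbpf.h>
	#include <bpf/bpf.h>

	int attach_to_bond(void)
	{
		struct bpf_object *obj;
		struct bpf_program *prog;
		int ifindex = if_nametoindex("bond0");
		int prog_fd;

		obj = bpf_object__open_file("xdp_prog.o", NULL);
		if (!obj || bpf_object__load(obj))
			return -1;

		prog = bpf_object__next_program(obj, NULL);
		prog_fd = bpf_program__fd(prog);

		/* The bond forwards XDP_SETUP_PROG to every slave; slaves
		 * that already have their own XDP program, or that lack
		 * ndo_bpf/ndo_xdp_xmit, cause the attach to fail with
		 * -EOPNOTSUPP and the change is rolled back.
		 */
		return bpf_xdp_attach(ifindex, prog_fd, 0, NULL);
	}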
Signed-off-by: Jussi Maki
Reported-by: kernel test robot
---
 drivers/net/bonding/bond_main.c | 303 ++++++++++++++++++++++++++++++++
 include/net/bonding.h           |   1 +
 2 files changed, 304 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 78284c451668..07ac40c3dac1 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -317,6 +317,19 @@ bool bond_sk_check(struct bonding *bond)
 	}
 }
 
+static bool bond_xdp_check(struct bonding *bond)
+{
+	switch (BOND_MODE(bond)) {
+	case BOND_MODE_ROUNDROBIN:
+	case BOND_MODE_ACTIVEBACKUP:
+	case BOND_MODE_8023AD:
+	case BOND_MODE_XOR:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /*---------------------------------- VLAN -----------------------------------*/
 
 /* In the following 2 functions, bond_vlan_rx_add_vid and bond_vlan_rx_kill_vid,
@@ -2010,6 +2023,39 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 
 	bond_update_slave_arr(bond, NULL);
 
+	if (!slave_dev->netdev_ops->ndo_bpf ||
+	    !slave_dev->netdev_ops->ndo_xdp_xmit) {
+		if (bond->xdp_prog) {
+			NL_SET_ERR_MSG(extack, "Slave does not support XDP");
+			slave_err(bond_dev, slave_dev, "Slave does not support XDP\n");
+			res = -EOPNOTSUPP;
+			goto err_sysfs_del;
+		}
+	} else {
+		struct netdev_bpf xdp = {
+			.command = XDP_SETUP_PROG,
+			.flags   = 0,
+			.prog    = bond->xdp_prog,
+			.extack  = extack,
+		};
+
+		if (dev_xdp_prog_count(slave_dev) > 0) {
+			NL_SET_ERR_MSG(extack, "Slave has XDP program loaded, please unload before enslaving");
+			slave_err(bond_dev, slave_dev, "Slave has XDP program loaded, please unload before enslaving\n");
+			res = -EOPNOTSUPP;
+			goto err_sysfs_del;
+		}
+
+		res = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (res < 0) {
+			/* ndo_bpf() sets extack error message */
+			slave_dbg(bond_dev, slave_dev, "Error %d calling ndo_bpf\n", res);
+			goto err_sysfs_del;
+		}
+		if (bond->xdp_prog)
+			bpf_prog_inc(bond->xdp_prog);
+	}
+
 	slave_info(bond_dev, slave_dev, "Enslaving as %s interface with %s link\n",
 		   bond_is_active_slave(new_slave) ? "an active" : "a backup",
 		   new_slave->link != BOND_LINK_DOWN ? "an up" : "a down");
"an up" : "a down"); @@ -2129,6 +2175,17 @@ static int __bond_release_one(struct net_device *bond_dev, /* recompute stats just before removing the slave */ bond_get_stats(bond->dev, &bond->bond_stats); + if (bond->xdp_prog) { + struct netdev_bpf xdp = { + .command = XDP_SETUP_PROG, + .flags = 0, + .prog = NULL, + .extack = NULL, + }; + if (slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp)) + slave_warn(bond_dev, slave_dev, "failed to unload XDP program\n"); + } + bond_upper_dev_unlink(bond, slave); /* unregister rx_handler early so bond_handle_frame wouldn't be called * for this slave anymore. @@ -3680,6 +3737,26 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb) skb_headlen(skb)); } +/** + * bond_xmit_hash_xdp - generate a hash value based on the xmit policy + * @bond: bonding device + * @xdp: buffer to use for headers + * + * The XDP variant of bond_xmit_hash. + */ +static u32 bond_xmit_hash_xdp(struct bonding *bond, struct xdp_buff *xdp) +{ + struct ethhdr *eth; + + if (xdp->data + sizeof(struct ethhdr) > xdp->data_end) + return 0; + + eth = (struct ethhdr *)xdp->data; + + return __bond_xmit_hash(bond, NULL, xdp->data, eth->h_proto, 0, + sizeof(struct ethhdr), xdp->data_end - xdp->data); +} + /*-------------------------- Device entry points ----------------------------*/ void bond_work_init_all(struct bonding *bond) @@ -4296,6 +4373,47 @@ static struct slave *bond_xmit_roundrobin_slave_get(struct bonding *bond, return NULL; } +static struct slave *bond_xdp_xmit_roundrobin_slave_get(struct bonding *bond, + struct xdp_buff *xdp) +{ + struct slave *slave; + int slave_cnt; + u32 slave_id; + const struct ethhdr *eth; + void *data = xdp->data; + + if (data + sizeof(struct ethhdr) > xdp->data_end) + goto non_igmp; + + eth = (struct ethhdr *)data; + data += sizeof(struct ethhdr); + + /* See comment on IGMP in bond_xmit_roundrobin_slave_get() */ + if (eth->h_proto == htons(ETH_P_IP)) { + const struct iphdr *iph; + + if (data + sizeof(struct iphdr) > xdp->data_end) + goto non_igmp; + + iph = (struct iphdr *)data; + + if (iph->protocol == IPPROTO_IGMP) { + slave = rcu_dereference(bond->curr_active_slave); + if (slave) + return slave; + return bond_get_slave_by_id(bond, 0); + } + } + +non_igmp: + slave_cnt = READ_ONCE(bond->slave_cnt); + if (likely(slave_cnt)) { + slave_id = bond_rr_gen_slave_id(bond) % slave_cnt; + return bond_get_slave_by_id(bond, slave_id); + } + return NULL; +} + static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev) { @@ -4511,6 +4629,22 @@ static struct slave *bond_xmit_3ad_xor_slave_get(struct bonding *bond, return slave; } +static struct slave *bond_xdp_xmit_3ad_xor_slave_get(struct bonding *bond, + struct xdp_buff *xdp) +{ + struct bond_up_slave *slaves; + unsigned int count; + u32 hash; + + hash = bond_xmit_hash_xdp(bond, xdp); + slaves = bond->usable_slaves; + count = slaves ? READ_ONCE(slaves->count) : 0; + if (unlikely(!count)) + return NULL; + + return slaves->arr[hash % count]; +} + /* Use this Xmit function for 3AD as well as XOR modes. The current * usable slave array is formed in the control path. The xmit function * just calculates hash and sends the packet out. 
@@ -4795,6 +4929,172 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 
+static struct net_device *
+bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp)
+{
+	struct bonding *bond = netdev_priv(bond_dev);
+	struct slave *slave;
+
+	/* Caller needs to hold rcu_read_lock() */
+
+	switch (BOND_MODE(bond)) {
+	case BOND_MODE_ROUNDROBIN:
+		slave = bond_xdp_xmit_roundrobin_slave_get(bond, xdp);
+		break;
+
+	case BOND_MODE_ACTIVEBACKUP:
+		slave = bond_xmit_activebackup_slave_get(bond);
+		break;
+
+	case BOND_MODE_8023AD:
+	case BOND_MODE_XOR:
+		slave = bond_xdp_xmit_3ad_xor_slave_get(bond, xdp);
+		break;
+
+	default:
+		/* Should never happen. Mode guarded by bond_xdp_check() */
+		netdev_err(bond_dev, "Unknown bonding mode %d for xdp xmit\n", BOND_MODE(bond));
+		WARN_ON_ONCE(1);
+		return NULL;
+	}
+
+	if (slave)
+		return slave->dev;
+
+	return NULL;
+}
+
+static int bond_xdp_xmit(struct net_device *bond_dev,
+			 int n, struct xdp_frame **frames, u32 flags)
+{
+	int nxmit, err = -ENXIO;
+
+	rcu_read_lock();
+
+	for (nxmit = 0; nxmit < n; nxmit++) {
+		struct xdp_frame *frame = frames[nxmit];
+		struct xdp_frame *frames1[] = {frame};
+		struct net_device *slave_dev;
+		struct xdp_buff xdp;
+
+		xdp_convert_frame_to_buff(frame, &xdp);
+
+		slave_dev = bond_xdp_get_xmit_slave(bond_dev, &xdp);
+		if (!slave_dev) {
+			err = -ENXIO;
+			break;
+		}
+
+		err = slave_dev->netdev_ops->ndo_xdp_xmit(slave_dev, 1, frames1, flags);
+		if (err < 1)
+			break;
+	}
+
+	rcu_read_unlock();
+
+	/* If error happened on the first frame then we can pass the error up, otherwise
+	 * report the number of frames that were xmitted.
+	 */
+	if (err < 0)
+		return (nxmit == 0 ? err : nxmit);
+
+	return nxmit;
+}
+
+static int bond_xdp_set(struct net_device *dev, struct bpf_prog *prog,
+			struct netlink_ext_ack *extack)
+{
+	struct bonding *bond = netdev_priv(dev);
+	struct list_head *iter;
+	struct slave *slave, *rollback_slave;
+	struct bpf_prog *old_prog;
+	struct netdev_bpf xdp = {
+		.command = XDP_SETUP_PROG,
+		.flags   = 0,
+		.prog    = prog,
+		.extack  = extack,
+	};
+	int err;
+
+	ASSERT_RTNL();
+
+	if (!bond_xdp_check(bond))
+		return -EOPNOTSUPP;
+
+	old_prog = bond->xdp_prog;
+	bond->xdp_prog = prog;
+
+	bond_for_each_slave(bond, slave, iter) {
+		struct net_device *slave_dev = slave->dev;
+
+		if (!slave_dev->netdev_ops->ndo_bpf ||
+		    !slave_dev->netdev_ops->ndo_xdp_xmit) {
+			NL_SET_ERR_MSG(extack, "Slave device does not support XDP");
+			slave_err(dev, slave_dev, "Slave does not support XDP\n");
+			err = -EOPNOTSUPP;
+			goto err;
+		}
+
+		if (dev_xdp_prog_count(slave_dev) > 0) {
+			NL_SET_ERR_MSG(extack, "Slave has XDP program loaded, please unload before enslaving");
+			slave_err(dev, slave_dev, "Slave has XDP program loaded, please unload before enslaving\n");
+			err = -EOPNOTSUPP;
+			goto err;
+		}
+
+		err = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (err < 0) {
+			/* ndo_bpf() sets extack error message */
+			slave_err(dev, slave_dev, "Error %d calling ndo_bpf\n", err);
+			goto err;
+		}
+		if (prog)
+			bpf_prog_inc(prog);
+	}
+
+	if (old_prog)
+		bpf_prog_put(old_prog);
+
+	if (prog)
+		static_branch_inc(&bpf_master_redirect_enabled_key);
+	else
+		static_branch_dec(&bpf_master_redirect_enabled_key);
+
+	return 0;
+
+err:
+	/* unwind the program changes */
+	bond->xdp_prog = old_prog;
+	xdp.prog = old_prog;
+	xdp.extack = NULL; /* do not overwrite original error */
+
+	bond_for_each_slave(bond, rollback_slave, iter) {
+		struct net_device *slave_dev = rollback_slave->dev;
+		int err_unwind;
+
+		if (slave == rollback_slave)
+			break;
+
+		err_unwind = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (err_unwind < 0)
+			slave_err(dev, slave_dev,
+				  "Error %d when unwinding XDP program change\n", err_unwind);
+		else if (xdp.prog)
+			bpf_prog_inc(xdp.prog);
+	}
+	return err;
+}
+
+static int bond_xdp(struct net_device *dev, struct netdev_bpf *xdp)
+{
+	switch (xdp->command) {
+	case XDP_SETUP_PROG:
+		return bond_xdp_set(dev, xdp->prog, xdp->extack);
+	default:
+		return -EINVAL;
+	}
+}
+
 static u32 bond_mode_bcast_speed(struct slave *slave, u32 speed)
 {
 	if (speed == 0 || speed == SPEED_UNKNOWN)
@@ -4881,6 +5181,9 @@ static const struct net_device_ops bond_netdev_ops = {
 	.ndo_features_check	= passthru_features_check,
 	.ndo_get_xmit_slave	= bond_xmit_get_slave,
 	.ndo_sk_get_lower_dev	= bond_sk_get_lower_dev,
+	.ndo_bpf		= bond_xdp,
+	.ndo_xdp_xmit		= bond_xdp_xmit,
+	.ndo_xdp_get_xmit_slave	= bond_xdp_get_xmit_slave,
 };
 
 static const struct device_type bond_type = {
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 15335732e166..8de8180f1be8 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -251,6 +251,7 @@ struct bonding {
 #ifdef CONFIG_XFRM_OFFLOAD
 	struct xfrm_state *xs;
 #endif /* CONFIG_XFRM_OFFLOAD */
+	struct bpf_prog *xdp_prog;
 };
 
 #define bond_slave_get_rcu(dev) \
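[Since the bond now implements ndo_xdp_xmit, it can also be the *target* of a
redirect issued by a program running on some other device; the bond picks the
slave per its xmit policy at ndo_xdp_xmit time. A hedged sketch; the
hard-coded ifindex is an assumption, and a real program would resolve it at
load time (e.g. via if_nametoindex("bond0") patched in, or a map lookup):]

	// SPDX-License-Identifier: GPL-2.0
	/* Illustrative: redirect every packet received on another NIC into
	 * a bond; bond_xdp_xmit() then selects the slave via the configured
	 * xmit policy.
	 */
	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	#define BOND_IFINDEX 42 /* placeholder ifindex of bond0 */

	SEC("xdp")
	int xdp_redirect_to_bond(struct xdp_md *ctx)
	{
		return bpf_redirect(BOND_IFINDEX, 0);
	}

	char _license[] SEC("license") = "GPL";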
From patchwork Wed Jul 7 11:25:50 2021
From: Jussi Maki
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, daniel@iogearbox.net, j.vosburgh@gmail.com,
    andy@greyhouse.net, vfalico@gmail.com, andrii@kernel.org,
    maciej.fijalkowski@intel.com, magnus.karlsson@intel.com,
    Jussi Maki
Subject: [PATCH bpf-next v3 4/5] devmap: Exclude XDP broadcast to master device
Date: Wed, 7 Jul 2021 11:25:50 +0000
Message-Id: <20210707112551.9782-5-joamaki@gmail.com>
In-Reply-To: <20210707112551.9782-1-joamaki@gmail.com>
References: <20210609135537.1460244-1-joamaki@gmail.com>
 <20210707112551.9782-1-joamaki@gmail.com>

If the ingress device is a bond slave, do not broadcast back through it or
the bond master.

Signed-off-by: Jussi Maki
---
 kernel/bpf/devmap.c | 67 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 58 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 2546dafd6672..c1a2dfb88724 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -513,10 +513,9 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 	return __xdp_enqueue(dev, xdp, dev_rx, dst->xdp_prog);
 }
 
-static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp,
-			 int exclude_ifindex)
+static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp)
 {
-	if (!obj || obj->dev->ifindex == exclude_ifindex ||
+	if (!obj ||
 	    !obj->dev->netdev_ops->ndo_xdp_xmit)
 		return false;
 
@@ -541,17 +540,48 @@ static int dev_map_enqueue_clone(struct bpf_dtab_netdev *obj,
 	return 0;
 }
 
+static inline bool is_ifindex_excluded(int *excluded, int num_excluded, int ifindex)
+{
+	while (num_excluded--) {
+		if (ifindex == excluded[num_excluded])
+			return true;
+	}
+	return false;
+}
+
+/* Get ifindex of each upper device. 'indexes' must be able to hold at
+ * least MAX_NEST_DEV elements.
+ * Returns the number of ifindexes added.
+ */
+static int get_upper_ifindexes(struct net_device *dev, int *indexes)
+{
+	struct net_device *upper;
+	struct list_head *iter;
+	int n = 0;
+
+	netdev_for_each_upper_dev_rcu(dev, upper, iter) {
+		indexes[n++] = upper->ifindex;
+	}
+	return n;
+}
+
 int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 			  struct bpf_map *map, bool exclude_ingress)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	int exclude_ifindex = exclude_ingress ? dev_rx->ifindex : 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
+	int excluded_devices[1+MAX_NEST_DEV];
 	struct hlist_head *head;
 	struct xdp_frame *xdpf;
+	int num_excluded = 0;
 	unsigned int i;
 	int err;
 
+	if (exclude_ingress) {
+		num_excluded = get_upper_ifindexes(dev_rx, excluded_devices);
+		excluded_devices[num_excluded++] = dev_rx->ifindex;
+	}
+
 	xdpf = xdp_convert_buff_to_frame(xdp);
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
@@ -559,7 +589,10 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
 		for (i = 0; i < map->max_entries; i++) {
 			dst = READ_ONCE(dtab->netdev_map[i]);
-			if (!is_valid_dst(dst, xdp, exclude_ifindex))
+			if (!is_valid_dst(dst, xdp))
+				continue;
+
+			if (is_ifindex_excluded(excluded_devices, num_excluded, dst->dev->ifindex))
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -579,7 +612,10 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 		head = dev_map_index_hash(dtab, i);
 		hlist_for_each_entry_rcu(dst, head, index_hlist,
 					 lockdep_is_held(&dtab->index_lock)) {
-			if (!is_valid_dst(dst, xdp, exclude_ifindex))
+			if (!is_valid_dst(dst, xdp))
+				continue;
+
+			if (is_ifindex_excluded(excluded_devices, num_excluded, dst->dev->ifindex))
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -645,17 +681,26 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 			   bool exclude_ingress)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	int exclude_ifindex = exclude_ingress ? dev->ifindex : 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
+	int excluded_devices[1+MAX_NEST_DEV];
 	struct hlist_head *head;
 	struct hlist_node *next;
+	int num_excluded = 0;
 	unsigned int i;
 	int err;
 
+	if (exclude_ingress) {
+		num_excluded = get_upper_ifindexes(dev, excluded_devices);
+		excluded_devices[num_excluded++] = dev->ifindex;
+	}
+
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
 		for (i = 0; i < map->max_entries; i++) {
 			dst = READ_ONCE(dtab->netdev_map[i]);
-			if (!dst || dst->dev->ifindex == exclude_ifindex)
+			if (!dst)
+				continue;
+
+			if (is_ifindex_excluded(excluded_devices, num_excluded, dst->dev->ifindex))
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -669,12 +714,16 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 				return err;
 
 			last_dst = dst;
+		}
 	} else { /* BPF_MAP_TYPE_DEVMAP_HASH */
 		for (i = 0; i < dtab->n_buckets; i++) {
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_safe(dst, next, head, index_hlist) {
-				if (!dst || dst->dev->ifindex == exclude_ifindex)
+				if (!dst)
+					continue;
+
+				if (is_ifindex_excluded(excluded_devices, num_excluded, dst->dev->ifindex))
 					continue;
 
 				/* we only need n-1 clones; last_dst enqueued below */
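[The path changed here is exercised by devmap broadcast redirects. A hedged
sketch of such a program; the map name and size are illustrative. With
BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS, a packet arriving on a bond slave
is now excluded not just by its own ifindex but by all of its upper devices,
so it is not hairpinned back via the bond master:]

	// SPDX-License-Identifier: GPL-2.0
	/* Illustrative devmap broadcast: clone the packet to every device in
	 * the map except the ingress device and, with this patch, its upper
	 * devices (e.g. the bond master above a slave).
	 */
	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	struct {
		__uint(type, BPF_MAP_TYPE_DEVMAP);
		__uint(max_entries, 32);
		__uint(key_size, sizeof(__u32));
		__uint(value_size, sizeof(__u32));
	} tx_devices SEC(".maps");

	SEC("xdp")
	int xdp_broadcast(struct xdp_md *ctx)
	{
		return bpf_redirect_map(&tx_devices, 0,
					BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS);
	}

	char _license[] SEC("license") = "GPL";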
From patchwork Wed Jul 7 11:25:51 2021
From: Jussi Maki
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, daniel@iogearbox.net, j.vosburgh@gmail.com,
    andy@greyhouse.net, vfalico@gmail.com, andrii@kernel.org,
    maciej.fijalkowski@intel.com, magnus.karlsson@intel.com,
    Jussi Maki, toke@redhat.com
Subject: [PATCH bpf-next v3 5/5] net: core: Allow netdev_lower_get_next_private_rcu in bh context
Date: Wed, 7 Jul 2021 11:25:51 +0000
Message-Id: <20210707112551.9782-6-joamaki@gmail.com>
In-Reply-To: <20210707112551.9782-1-joamaki@gmail.com>
References: <20210609135537.1460244-1-joamaki@gmail.com>
 <20210707112551.9782-1-joamaki@gmail.com>

For the XDP bonding slave lookup to work in the NAPI poll context, in which
the redundant rcu_read_lock() has been removed, we have to follow the same
approach as in [1] and modify the WARN_ON to also check
rcu_read_lock_bh_held().

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=694cea395fded425008e93cd90cfdf7a451674af

Signed-off-by: Jussi Maki
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 05aac85b2bbc..27f95aeddc59 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7569,7 +7569,7 @@ void *netdev_lower_get_next_private_rcu(struct net_device *dev,
 {
 	struct netdev_adjacent *lower;
 
-	WARN_ON_ONCE(!rcu_read_lock_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
 
 	lower = list_entry_rcu((*iter)->next, struct netdev_adjacent, list);
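[For intuition on the relaxed check: in the NAPI poll / XDP path, softirqs
are disabled, so rcu_read_lock_bh_held() is true even though no explicit
rcu_read_lock() was taken. A hedged sketch of the calling pattern this makes
warning-free; the function name is illustrative, and the iterator macro is
the one bond_for_each_slave_rcu() is built on:]

	/* Illustrative only: walk a bond's lower devices from a context
	 * where only BH is disabled (as in NAPI poll), without an explicit
	 * rcu_read_lock().
	 */
	static void *first_lower_priv_bh(struct net_device *bond_dev)
	{
		struct list_head *iter;
		void *priv = NULL;

		local_bh_disable();	/* NAPI poll already runs like this */
		/* rcu_read_lock_bh_held() is true here, so the relaxed
		 * WARN_ON_ONCE in netdev_lower_get_next_private_rcu()
		 * stays quiet.
		 */
		netdev_for_each_lower_private_rcu(bond_dev, priv, iter)
			break;	/* just take the first slave's private data */
		local_bh_enable();

		return priv;
	}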