[PATCHv18,bpf-next,3/6] xdp: add a new helper for dev map multicast support

This patch adds XDP multicast support, which has been discussed
before[0]. The goal is to be able to implement an OVS-like data plane in
XDP, i.e., a software switch that can forward XDP frames to multiple ports.

To achieve this, an application needs to specify a group of interfaces
to forward a packet to. It is also common to want to exclude one or more
physical interfaces from the forwarding operation - e.g., to forward a
packet to all interfaces in the multicast group except the interface it
arrived on. While this could be done simply by adding more groups, this
quickly leads to a combinatorial explosion in the number of groups an
application has to maintain.

To avoid the combinatorial explosion, we propose to include the ability
to specify an "exclude group" as part of the forwarding operation. This
needs to be a group (instead of just a single port index), because a
physical interface can be part of a logical grouping, such as a bond
device.

Thus, the logical forwarding operation becomes a "set difference"
operation, i.e. "forward to all ports in group A that are not also in
group B". This series implements such an operation using device maps to
represent the groups. This means that the XDP program specifies two
device maps, one containing the list of netdevs to redirect to, and the
other containing the exclude list.

To achieve this, a new helper bpf_redirect_map_multi() is implemented
to accept two maps, the forwarding map and the exclude map. The forwarding
map can be DEVMAP or DEVMAP_HASH, but the exclude map *must* be
DEVMAP_HASH to get better performance. If users don't want to use an
exclude map and simply want to stop redirecting back to the ingress
device, they can use the flag BPF_F_EXCLUDE_INGRESS.
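
The set-difference semantics described above can be illustrated with a
small userspace sketch. This is not the kernel implementation: the plain
arrays, in_set(), forward_multi() and the value of F_EXCLUDE_INGRESS are
hypothetical stand-ins for the device maps, the DEVMAP_HASH lookup, the
helper and BPF_F_EXCLUDE_INGRESS.

```c
#include <stddef.h>

/* Hypothetical stand-in for BPF_F_EXCLUDE_INGRESS (value is an assumption). */
#define F_EXCLUDE_INGRESS 1u

/* Linear membership test; the patch uses a DEVMAP_HASH keyed by ifindex. */
static int in_set(const int *set, size_t n, int ifindex)
{
	for (size_t i = 0; i < n; i++)
		if (set[i] == ifindex)
			return 1;
	return 0;
}

/*
 * Write to out[] every ifindex in fwd[] that is not in ex[] and, when
 * F_EXCLUDE_INGRESS is set, is not the ingress interface either.
 * ex may be NULL (with n_ex == 0) to model "no exclude map".
 * out[] must have room for n_fwd entries.
 * Returns the number of destinations selected.
 */
static size_t forward_multi(const int *fwd, size_t n_fwd,
			    const int *ex, size_t n_ex,
			    int ingress, unsigned int flags,
			    int *out)
{
	size_t n_out = 0;

	for (size_t i = 0; i < n_fwd; i++) {
		if ((flags & F_EXCLUDE_INGRESS) && fwd[i] == ingress)
			continue;
		if (ex && in_set(ex, n_ex, fwd[i]))
			continue;
		out[n_out++] = fwd[i];
	}
	return n_out;
}
```

With a forwarding group {2, 3, 4, 5} and an exclude group {4, 5}, the
sketch selects {2, 3}. With no exclude group, ingress ifindex 3 and
F_EXCLUDE_INGRESS set, it selects {2, 4, 5}.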

As both bpf_xdp_redirect_map() and this new helper use struct
bpf_redirect_info, a new field ex_map is added, and tgt_value is set to
NULL in the new helper to distinguish it from bpf_xdp_redirect_map().
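
As a rough illustration of how a NULL target value can tell the two
cases apart, consider the following userspace sketch. The struct is a
simplified, hypothetical mirror of the fields mentioned above, not the
real struct bpf_redirect_info, and the function names are made up for
this example.

```c
#include <stddef.h>

/* Simplified, hypothetical mirror of the fields discussed above. */
struct redirect_info_sketch {
	void *tgt_value;	/* single target for unicast, NULL for multicast */
	const void *map;	/* forwarding map */
	const void *ex_map;	/* exclude map, may be NULL */
};

enum redirect_kind { REDIRECT_UNICAST, REDIRECT_MULTICAST };

/* Unicast: remember the one looked-up target, no exclude map involved. */
static void set_unicast(struct redirect_info_sketch *ri,
			const void *map, void *target)
{
	ri->tgt_value = target;
	ri->map = map;
	ri->ex_map = NULL;
}

/* Multicast: leave the target NULL and record both maps. */
static void set_multicast(struct redirect_info_sketch *ri,
			  const void *map, const void *ex_map)
{
	ri->tgt_value = NULL;
	ri->map = map;
	ri->ex_map = ex_map;
}

/* The consumer only needs to test tgt_value to pick the path. */
static enum redirect_kind classify(const struct redirect_info_sketch *ri)
{
	return ri->tgt_value ? REDIRECT_UNICAST : REDIRECT_MULTICAST;
}
```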

Finally, keep the general data path in net/core/filter.c and the native
data path in kernel/bpf/devmap.c so that we can use direct calls to get
better performance.

[0] https://xdp-project.net/#Handling-multicast

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

v16-v18: no update

v15:
a) Update bpf_redirect_map_multi() helper description that ex_map must be
   keyed by ifindex.
b) remove variable last_one in dev_map_enqueue_multi() as it's pointless.
c) add a comment about why we don't use READ/WRITE_ONCE() for ex_map.

v14: no update, only rebase the code

v13:
pass xdp_prog through bq_enqueue

v12:
rebase the code based on Jesper's devmap xdp_prog patch

v11:
Fix bpf_redirect_map_multi() helper description typo.
Add loop limit for devmap_get_next_obj() and dev_map_redirect_multi().

v10:
Update helper bpf_xdp_redirect_map_multi()
- No need to check map pointer as we will do the check in verifier.

v9:
Update helper bpf_xdp_redirect_map_multi()
- Use ARG_CONST_MAP_PTR_OR_NULL for helper arg2

v8:
Update function dev_in_exclude_map():
- remove duplicate ex_map map_type check in
- lookup the element in dev map by obj dev index directly instead
  of looping all the map

v7:
a) Fix helper flag check
b) Limit the *ex_map* to use DEVMAP_HASH only and update function
   dev_in_exclude_map() to get better performance.

v6: converted helper return types from int to long

v5:
a) Check devmap_get_next_key() return value.
b) Pass through flags to __bpf_tx_xdp_map() instead of bool value.
c) In function dev_map_enqueue_multi(), consume xdpf for the last
   obj instead of the first one.
d) Update helper description and code comments to explain that we
   use NULL target value to distinguish multicast and unicast
   forwarding.
e) Update memory model, memory id and frame_sz in xdpf_clone().
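
The idea behind item c), consuming the original frame for the last
destination and cloning it for every earlier one, can be sketched in
userspace as follows. This is only an illustration under assumptions,
not the code of dev_map_enqueue_multi(): the can_forward[] array and
the enqueue_stats counters are hypothetical, and destinations that
cannot forward are simply skipped.

```c
#include <stddef.h>

struct enqueue_stats {
	size_t clones;	/* frames that had to be cloned */
	size_t sent;	/* frames handed to a destination */
};

/*
 * can_forward[i] != 0 marks a destination that can accept the frame.
 * Each usable destination is held as "pending" until we know whether
 * another one follows: if so it gets a clone, otherwise it is the last
 * one and consumes the original frame.
 */
static struct enqueue_stats enqueue_multi_sketch(const int *can_forward,
						 size_t n)
{
	struct enqueue_stats st = { 0, 0 };
	int have_pending = 0;

	for (size_t i = 0; i < n; i++) {
		if (!can_forward[i])
			continue;	/* skip this one, keep looping */
		if (have_pending) {
			/* Another destination follows: pending one gets a clone. */
			st.clones++;
			st.sent++;
		}
		have_pending = 1;
	}
	if (have_pending)
		st.sent++;	/* last destination consumes the original */
	return st;
}
```

For N usable destinations the sketch makes N - 1 clones, so a frame
that ends up with a single destination is never copied.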

v4: Fix bpf_xdp_redirect_map_multi_proto arg2_type typo

v3: Based on Toke's suggestion, do the following update
a) Update bpf_redirect_map_multi() description in bpf.h.
b) Fix exclude_ifindex checking order in dev_in_exclude_map().
c) Fix one more xdpf clone in dev_map_enqueue_multi().
d) Go find the next one in dev_map_enqueue_multi() if the interface is
   not able to forward, instead of aborting the whole loop.
e) Remove READ_ONCE/WRITE_ONCE for ex_map.

v2: Add new helper bpf_xdp_redirect_map_multi() which could accept
include/exclude maps directly.
---
 include/linux/bpf.h            |  20 ++++++
 include/linux/filter.h         |   1 +
 include/net/xdp.h              |   1 +
 include/uapi/linux/bpf.h       |  28 ++++++++
 kernel/bpf/devmap.c            | 128 +++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c          |   6 ++
 net/core/filter.c              | 124 ++++++++++++++++++++++++++++++--
 net/core/xdp.c                 |  29 ++++++++
 tools/include/uapi/linux/bpf.h |  28 ++++++++
 9 files changed, 360 insertions(+), 5 deletions(-)

Message ID	20210204140317.384296-4-liuhangbin@gmail.com (mailing list archive)
State	Superseded
Delegated to:	BPF
From	Hangbin Liu <liuhangbin@gmail.com>
Date	Thu, 4 Feb 2021 22:03:14 +0800
To	bpf@vger.kernel.org
Cc	netdev@vger.kernel.org, Toke Høiland-Jørgensen <toke@redhat.com>, Jiri Benc <jbenc@redhat.com>, Jesper Dangaard Brouer <brouer@redhat.com>, Eelco Chaudron <echaudro@redhat.com>, ast@kernel.org, Daniel Borkmann <daniel@iogearbox.net>, Lorenzo Bianconi <lorenzo.bianconi@redhat.com>, David Ahern <dsahern@gmail.com>, Andrii Nakryiko <andrii.nakryiko@gmail.com>, Alexei Starovoitov <alexei.starovoitov@gmail.com>, John Fastabend <john.fastabend@gmail.com>, Maciej Fijalkowski <maciej.fijalkowski@intel.com>, Hangbin Liu <liuhangbin@gmail.com>
Series	xdp: add a new helper for dev map multicast support
[PATCHv18,bpf-next,0/6] xdp: add a new helper for dev map multicast support
[PATCHv18,bpf-next,1/6] bpf: run devmap xdp_prog on flush instead of bulk enqueue
[PATCHv18,bpf-next,2/6] bpf: add a new bpf argument type ARG_CONST_MAP_PTR_OR_NULL
[PATCHv18,bpf-next,3/6] xdp: add a new helper for dev map multicast support
[PATCHv18,bpf-next,4/6] sample/bpf: add xdp_redirect_map_multicast test
[PATCHv18,bpf-next,5/6] selftests/bpf: Add verifier tests for bpf arg ARG_CONST_MAP_PTR_OR_NULL
[PATCHv18,bpf-next,6/6] selftests/bpf: add xdp_redirect_multi test

Context	Check	Description
netdev/cover_letter	success	Link
netdev/fixes_present	success	Link
netdev/patch_count	success	Link
netdev/tree_selection	success	Clearly marked for bpf-next
netdev/subject_prefix	success	Link
netdev/cc_maintainers	warning	9 maintainers not CCed: quentin@isovalent.com songliubraving@fb.com kafai@fb.com andrii@kernel.org kpsingh@kernel.org davem@davemloft.net kuba@kernel.org hawk@kernel.org yhs@fb.com
netdev/source_inline	success	Was 0 now: 0
netdev/verify_signedoff	success	Link
netdev/module_param	success	Was 0 now: 0
netdev/build_32bit	success	Errors and warnings before: 12238 this patch: 12238
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/verify_fixes	success	Link
netdev/checkpatch	warning	CHECK: Blank lines aren't necessary after an open brace '{' CHECK: Comparison to NULL could be written "__dev_map_hash_lookup_elem" WARNING: line length of 81 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 86 exceeds 80 columns WARNING: line length of 93 exceeds 80 columns WARNING: please, no space before tabs
netdev/build_allmodconfig_warn	success	Errors and warnings before: 12886 this patch: 12886
netdev/header_inline	success	Link
netdev/stable	success	Stable not CCed
