
[nf-next,0/8] netfilter: make nf_flowtable lifetime differ from container struct

Message ID 20231121122800.13521-1-fw@strlen.de (mailing list archive)

Message

Florian Westphal Nov. 21, 2023, 12:27 p.m. UTC
This series detaches nf_flowtable from the two existing container
structures.

Allocation and freeing are moved to the flowtable core.
Then, memory release is changed so it passes through another
synchronize_rcu() call.
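
Roughly, the intended shape is the following (a sketch only; the member
name and the function body below are illustrative, not lifted from the
patches):

  /* containers stop embedding the flowtable and keep a pointer instead */
  struct nft_flowtable {
          /* ... other members unchanged ... */
          struct nf_flowtable *data;    /* was: struct nf_flowtable data; */
  };

  /* release passes through an extra grace period before the memory is
   * given back, so lockless readers never see a freed table
   */
  void nf_flow_table_free(struct nf_flowtable *flow_table)
  {
          /* existing teardown (gc cancel, offload flush, rhashtable destroy) */
          synchronize_rcu();
          kfree(flow_table);
  }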

Next, a new nftables flowtable flag is introduced to mark a flowtable
for explicit XDP-based offload.

Such flowtables have more restrictions; in particular, if two
flowtables are tagged as 'xdp offloaded', they cannot share any net
devices.

It would be possible to avoid such a new 'xdp' flag, but I see no way
to do so without breaking backwards compatibility: at this time the same
net_device can be part of any number of flowtables, which is very
inefficient from an XDP point of view, since it would have to perform
lookups in all associated flowtables in a loop until a match is found.

This is hardly desirable.

The last two patches expose the hash table mapping and make a utility
function available for XDP.

The XDP kfunc will be added in a followup patch.
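
To illustrate the idea (the helper name below is made up for this
sketch; only flow_offload_lookup() is the existing flowtable API):
since 'xdp offloaded' flowtables may not share net devices, the
device -> flowtable mapping is unique, and the future kfunc can do a
lookup without looping over all flowtables:

  /* hypothetical helper exposed for XDP: map the ingress device to the
   * single xdp-offloaded flowtable it belongs to
   */
  struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev);

  static struct flow_offload_tuple_rhash *
  xdp_flowtable_lookup(const struct net_device *dev,
                       struct flow_offload_tuple *tuple)
  {
          struct nf_flowtable *ft = nf_flowtable_by_dev(dev);

          /* one table to search, no loop over all associated flowtables */
          return ft ? flow_offload_lookup(ft, tuple) : NULL;
  }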

Florian Westphal (8):
  netfilter: flowtable: move nf_flowtable out of container structures
  netfilter: nf_flowtable: replace init callback with a create one
  netfilter: nf_flowtable: make free a real free function
  netfilter: nf_flowtable: delay flowtable release a second time
  netfilter: nf_tables: reject flowtable hw offload for same device
  netfilter: nf_tables: add xdp offload flag
  netfilter: nf_tables: add flowtable map for xdp offload
  netfilter: nf_tables: permit duplicate flowtable mappings

 include/net/netfilter/nf_flow_table.h    |  15 ++-
 include/net/netfilter/nf_tables.h        |  15 ++-
 include/uapi/linux/netfilter/nf_tables.h |   5 +-
 net/netfilter/nf_flow_table_core.c       |  39 ++++--
 net/netfilter/nf_flow_table_inet.c       |   6 +-
 net/netfilter/nf_flow_table_offload.c    | 157 ++++++++++++++++++++++-
 net/netfilter/nf_tables_api.c            | 113 +++++++++++-----
 net/netfilter/nft_flow_offload.c         |   4 +-
 net/sched/act_ct.c                       |  37 +++---
 9 files changed, 315 insertions(+), 76 deletions(-)

Comments

Pablo Neira Ayuso Nov. 24, 2023, 9:50 a.m. UTC | #1
Hi Florian,

Sorry for taking so long.

On Tue, Nov 21, 2023 at 01:27:43PM +0100, Florian Westphal wrote:
> This series detaches nf_flowtable from the two existing container
> structures.
> 
> Allocation and freeing are moved to the flowtable core.
> Then, memory release is changed so it passes through another
> synchronize_rcu() call.
> 
> Next, a new nftables flowtable flag is introduced to mark a flowtable
> for explicit XDP-based offload.

If XDP uses the hardware offload infrastructure, then I don't see how
it would be possible to combine a software dataplane with hardware
offload, i.e. using XDP for software acceleration together with
hardware offload: it takes a while for the flowtable hw offload
workqueue to set things up, and while that happens the software
path is exercised.

> Such flowtables have more restrictions; in particular, if two
> flowtables are tagged as 'xdp offloaded', they cannot share any net
> devices.
> 
> It would be possible to avoid such a new 'xdp' flag, but I see no way
> to do so without breaking backwards compatibility: at this time the same
> net_device can be part of any number of flowtables, which is very
> inefficient from an XDP point of view, since it would have to perform
> lookups in all associated flowtables in a loop until a match is found.
> 
> This is hardly desirable.
> 
> The last two patches expose the hash table mapping and make a utility
> function available for XDP.
> 
> The XDP kfunc will be added in a followup patch.

What is the plan to support stacked devices? e.g. VLAN, or even
tunneling drivers such as VXLAN. I have (incomplete) patches that use
dev_fill_forward_path() to discover the path and then configure the
flowtable datapath forwarding.

My understanding is that XDP is all about programmability; if the user
decides to go for XDP, then why not simply fully implement the fast
path in the XDP framework? I know of software that already does so,
and they are perfectly fine with this approach.
Florian Westphal Nov. 24, 2023, 9:55 a.m. UTC | #2
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Next, a new nftables flowtable flag is introduced to mark a flowtable
> > for explicit XDP-based offload.
> 
> If XDP uses the hardware offload infrastructure, then I don't see how
> it would be possible to combine a software dataplane with hardware
> offload, i.e. using XDP for software acceleration together with
> hardware offload: it takes a while for the flowtable hw offload
> workqueue to set things up, and while that happens the software
> path is exercised.

Lorenzo adds a kfunc that gets called from the xdp program
to do a lookup in the flowtable.

This patchset prepares for the kfunc by adding a function that
returns the flowtable based on net_device pointer.

The work queue for hw offload (or ndo ops) is not used.

> > The XDP kfunc will be added in a followup patch.
> 
> What is the plan to support stacked devices? e.g. VLAN, or even
> tunneling drivers such as VXLAN. I have (incomplete) patches that use
> dev_fill_forward_path() to discover the path and then configure the
> flowtable datapath forwarding.

If the xdp program can't handle it, the packet will be pushed up the
stack, i.e. the nf ingress hook will handle it next.

> My understanding is that XDP is all about programmability; if the user
> decides to go for XDP, then why not simply fully implement the fast
> path in the XDP framework? I know of software that already does so,
> and they are perfectly fine with this approach.

I don't understand, you mean no integration at all?
Pablo Neira Ayuso Nov. 24, 2023, 10:10 a.m. UTC | #3
On Fri, Nov 24, 2023 at 10:55:12AM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > Next, a new nftables flowtable flag is introduced to mark a flowtable
> > > for explicit XDP-based offload.
> > 
> > If XDP uses the hardware offload infrastructure, then I don't see how
> > it would be possible to combine a software dataplane with hardware
> > offload, i.e. using XDP for software acceleration together with
> > hardware offload: it takes a while for the flowtable hw offload
> > workqueue to set things up, and while that happens the software
> > path is exercised.
> 
> Lorenzo adds a kfunc that gets called from the xdp program
> to do a lookup in the flowtable.
> 
> This patchset prepares for the kfunc by adding a function that
> returns the flowtable based on net_device pointer.
> 
> The work queue for hw offload (or ndo ops) is not used.

OK, but is it possible to combine this XDP approach with hardware
offload?

> > > The XDP kfunc will be added in a followup patch.
> > 
> > What is the plan to support stacked devices? e.g. VLAN, or even
> > tunneling drivers such as VXLAN. I have (incomplete) patches that use
> > dev_fill_forward_path() to discover the path and then configure the
> > flowtable datapath forwarding.
> 
> If the xdp program can't handle it, the packet will be pushed up the
> stack, i.e. the nf ingress hook will handle it next.

Then, only very simple scenarios will benefit from this acceleration.

> > My understanding is that XDP is all about programmability; if the user
> > decides to go for XDP, then why not simply fully implement the fast
> > path in the XDP framework? I know of software that already does so,
> > and they are perfectly fine with this approach.
> 
> I don't understand, you mean no integration at all?

I mean, fully implement a fastpath in XDP/BPF using the data structures
that it provides.
Florian Westphal Nov. 24, 2023, 10:16 a.m. UTC | #4
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > The work queue for hw offload (or ndo ops) is not used.
> 
> OK, but is it possible to combine this XDP approach with hardware
> offload?

Yes.  We could disallow it if you prefer.

For ingress packet processing, the ordering is:
HW -> XDP -> nf flowtable -> classic forward path

instead of the existing:

HW -> nf flowtable -> classic forward path

> > If the xdp program can't handle it, the packet will be pushed up the
> > stack, i.e. the nf ingress hook will handle it next.
> 
> Then, only very simple scenarios will benefit from this acceleration.

Yes.  I don't see a reason to worry about more complex things right now.
E.g. PPPoE encap can be added later.

Or do you think this has to be added right from the very beginning?

I hope not.

> > > My understanding is that XDP is all about programmability; if the user
> > > decides to go for XDP, then why not simply fully implement the fast
> > > path in the XDP framework? I know of software that already does so,
> > > and they are perfectly fine with this approach.
> > 
> > I don't understand, you mean no integration at all?
> 
> I mean, fully implement a fastpath in XDP/BPF using the data structures
> that it provides.

I think it's very bad for netfilter.
Toke Høiland-Jørgensen Nov. 24, 2023, 10:48 a.m. UTC | #5
> My understanding is that XDP is all about programmability; if the user
> decides to go for XDP, then why not simply fully implement the fast
> path in the XDP framework? I know of software that already does so,
> and they are perfectly fine with this approach.

Yes, you can do that. But if you're reimplementing everything anyway,
why bother with XDP at all? Might as well go with DPDK and full bypass
then.

The benefit of XDP as a data path is the integration with the kernel
infrastructure: we have robust implementations of a bunch of protocols,
a control plane API that works with a bunch of userspace utilities
(e.g., routing daemons), and lots of battle-tested data structures
for various things (e.g., the routing table FIB). With XDP, you can use
this infrastructure in a pick-and-choose manner and implement your fast
path using just the features you care about for your use case, gaining
performance while still using the kernel path for the slow path to get
full functionality.

The first example of this paradigm was the bpf_fib_lookup() helper. With
this you can accelerate the forwarding fast path and still have the
kernel stack handle neighbour lookup, etc. Adding flowtable lookup
support is a natural extension of this, adding another integration point
you can use for a more complete forwarding acceleration, while still
integrating with the rest of the stack.
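
For readers who have not used it, a minimal sketch of that
bpf_fib_lookup() fast-path pattern in an XDP program looks roughly
like this (IPv4 only; TTL/checksum handling and VLAN support omitted):

  #include <linux/bpf.h>
  #include <linux/if_ether.h>
  #include <linux/ip.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_endian.h>

  #ifndef AF_INET
  #define AF_INET 2
  #endif

  SEC("xdp")
  int xdp_fwd(struct xdp_md *ctx)
  {
          void *data_end = (void *)(long)ctx->data_end;
          void *data = (void *)(long)ctx->data;
          struct ethhdr *eth = data;
          struct iphdr *iph = (void *)(eth + 1);
          struct bpf_fib_lookup fib = {};

          if ((void *)(iph + 1) > data_end)
                  return XDP_PASS;
          if (eth->h_proto != bpf_htons(ETH_P_IP))
                  return XDP_PASS;

          fib.family      = AF_INET;
          fib.tos         = iph->tos;
          fib.l4_protocol = iph->protocol;
          fib.tot_len     = bpf_ntohs(iph->tot_len);
          fib.ipv4_src    = iph->saddr;
          fib.ipv4_dst    = iph->daddr;
          fib.ifindex     = ctx->ingress_ifindex;

          /* ask the kernel routing table for egress device and neighbour */
          if (bpf_fib_lookup(ctx, &fib, sizeof(fib), 0) !=
              BPF_FIB_LKUP_RET_SUCCESS)
                  return XDP_PASS;        /* slow path: punt to the stack */

          __builtin_memcpy(eth->h_dest, fib.dmac, ETH_ALEN);
          __builtin_memcpy(eth->h_source, fib.smac, ETH_ALEN);
          return bpf_redirect(fib.ifindex, 0);
  }

  char _license[] SEC("license") = "GPL";

A flowtable lookup kfunc slots into the same place: consult the
kernel's state, rewrite and redirect on a hit, and XDP_PASS to the
normal path on a miss.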

This was the "making XDP a magical go faster button" thing I was talking
about at Netconf (and again at Netdevconf), BTW: we should work towards
making XDP a complete (forwarding) acceleration solution, so we can
replace all the crappy hardware "fast path" and kernel bypass
implementations in the world :)

-Toke