[bpf-next,v3,00/16] bpfilter

Message ID	20221224000402.476079-1-qde@naccy.de (mailing list archive)
From: Quentin Deslandes <qde@naccy.de>
To: <qde@naccy.de>
CC: Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>,
    Andrii Nakryiko <andrii@kernel.org>, Martin KaFai Lau <martin.lau@linux.dev>,
    Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
    John Fastabend <john.fastabend@gmail.com>, KP Singh <kpsingh@kernel.org>,
    Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
    Jiri Olsa <jolsa@kernel.org>, "David S. Miller" <davem@davemloft.net>,
    Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>,
    Paolo Abeni <pabeni@redhat.com>, Mykola Lysenko <mykolal@fb.com>,
    Shuah Khan <shuah@kernel.org>, Dmitrii Banshchikov <me@ubique.spb.ru>,
    <linux-kernel@vger.kernel.org>, <bpf@vger.kernel.org>,
    <linux-kselftest@vger.kernel.org>, <netdev@vger.kernel.org>,
    Kernel Team <kernel-team@meta.com>
Subject: [PATCH bpf-next v3 00/16] bpfilter
Date: Sat, 24 Dec 2022 01:03:46 +0100

Quentin Deslandes Dec. 24, 2022, 12:03 a.m. UTC

The patchset is based on the patches from David S. Miller [1],
Daniel Borkmann [2], and Dmitrii Banshchikov [3].

Note: I've partially sent this patchset earlier due to a
mistake on my side, sorry for the noise.

The main goal of the patchset is to prepare bpfilter for
iptables' configuration blob parsing and code generation.

The patchset introduces data structures and code for matches,
targets, rules and tables. Besides that, code generation
is introduced.

The first version of the code generation supports only "inline"
mode - all chains and their rules emit instructions in a linear
fashion.
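
As a rough illustration of what "inline" generation means, here is a
minimal sketch. It does not use the series' codegen API: the toy
instruction set, struct toy_rule, toy_gen_inline() and toy_run() are all
hypothetical names invented for this example, and the instructions are
not real eBPF. Each rule emits its match and verdict directly after the
previous rule, and a mismatch jumps forward to the next rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy instruction set for illustration only; this is NOT real eBPF. */
enum toy_op { TOY_JNE_PORT, TOY_RET };
enum toy_verdict { TOY_DROP = 0, TOY_ACCEPT = 1 };

struct toy_insn {
	enum toy_op op;
	uint16_t arg; /* port to compare (TOY_JNE_PORT) or verdict (TOY_RET) */
	uint16_t off; /* instructions to skip when the port differs */
};

/* Hypothetical rule: one destination-port match plus a verdict. */
struct toy_rule {
	uint16_t dport; /* 0 means "match any packet" */
	uint16_t verdict;
};

/* Append one instruction; returns -1 when the output buffer is full. */
static int toy_emit(struct toy_insn *out, size_t cap, size_t *len,
		    struct toy_insn insn)
{
	if (*len >= cap)
		return -1;
	out[(*len)++] = insn;
	return 0;
}

/*
 * Emit every rule one after the other ("inline"): no subprograms, no
 * calls between chains. Returns the program length, or -1 if out is full.
 */
int toy_gen_inline(const struct toy_rule *rules, size_t n_rules,
		   uint16_t policy, struct toy_insn *out, size_t cap)
{
	size_t len = 0;

	assert(out != NULL);
	for (size_t i = 0; i < n_rules; i++) {
		/* Match: on mismatch, jump over the verdict to the next rule. */
		if (rules[i].dport &&
		    toy_emit(out, cap, &len,
			     (struct toy_insn){ TOY_JNE_PORT, rules[i].dport, 1 }))
			return -1;
		/* Target: return the rule's verdict. */
		if (toy_emit(out, cap, &len,
			     (struct toy_insn){ TOY_RET, rules[i].verdict, 0 }))
			return -1;
	}
	/* No rule matched: fall through to the chain policy. */
	if (toy_emit(out, cap, &len, (struct toy_insn){ TOY_RET, policy, 0 }))
		return -1;
	return (int)len;
}

/* Tiny interpreter so the generated program can be exercised. */
uint16_t toy_run(const struct toy_insn *prog, size_t len, uint16_t dport)
{
	size_t pc = 0;

	while (pc < len) {
		if (prog[pc].op == TOY_RET)
			return prog[pc].arg;
		pc += 1 + (dport != prog[pc].arg ? prog[pc].off : 0);
	}
	return TOY_DROP; /* malformed program: fail closed */
}
```

In this model the program size grows linearly with the number of rules,
which is the trade-off of inline generation: simple jumps, but no code
shared between rules.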

Things that are not implemented yet:
  1) The process of switching from the previous BPF programs to the
     new set isn't atomic.
  2) No support of device ifindex - it's hardcoded
  3) No helper subprog for counters update

Another problem is using iptables' blobs for tests and filter
table initialization. While it saves lines, something more
maintainable should be done here.

The plan for the next iteration:
  1) Add a helper program for counters update
  2) Handle ifindex

Patches 1/2 add definitions of the used types.
Patch 3 adds logging to bpfilter.
Patch 4 adds an associative map.
Patch 5 adds the runtime context structure.
Patches 6/7 add code generation infrastructure and TC code generator.
Patches 8/9/10/11/12 add code for matches, targets, rules and table.
Patch 13 adds code generation for table.
Patch 14 handles hooked setsockopt(2) calls.
Patch 15 adds the filter table.
Patch 16 uses prepared code in main().

Due to poor hardware availability on my side, I've not been able to
benchmark those changes. I plan to get some numbers for the next iteration.

FORWARD filter chain is now supported, however, it's attached to
TC INGRESS along with INPUT filter chain. This is due to XDP not supporting
multiple programs to be attached. I could generate a single program
out of both INPUT and FORWARD chains, but that would prevent another
BPF program from being attached to the interface anyway. If a solution
exists to attach both those programs to XDP while allowing for other
programs to be attached, it requires more investigation. In the meantime,
INPUT and FORWARD filtering is supported using TC.

Most of the code in this series was written by Dmitrii Banshchikov;
my changes are limited to v3. I've tried to reflect this fact in the
commits by adding 'Co-developed-by:' and 'Signed-off-by:' for Dmitrii,
please tell me if this was done the wrong way.

v2 -> v3
Chains:
  * Add support for FORWARD filter chain.
  * Add generation of BPF bytecode to assess whether a packet should be
    forwarded or not, using bpf_fib_lookup().
  * Allow for multiple programs to be attached to TC.
  * Allow for multiple TC hooks to be used.
Code generation:
  * Remove duplicated BPF bytecode generation.
  * Fix a bug regarding jump offset during generation.
  * Remove support for XDP from the series, as it's not currently
    used.
Table:
  * Add new filter_table_update_counters() virtual call. It updates
    the table's counter stored in the ipt_entry structure. This way,
    when iptables tries to fetch the values of the counters, bpfilter only
    has to copy the ipt_entry cached in the table structure.
Logging:
  * Refactor logging primitives.
Sockopts:
  * Add support for userspace counters querying.
Rule:
  * Store the rule's index inside struct rule, to ease counters'
    map usage.

v1 -> v2
Maps:
  * Use map_upsert instead of separate map_insert and map_update
Matches:
  * Add a new virtual call - gen_inline. The call is used for inline
    generating of a rule's match.
Targets:
  * Add a new virtual call - gen_inline. The call is used for inline
    generating of a rule's target.
Rules:
  * Add code generation for rules
Table:
  * Add struct table_ops
  * Add map for table_ops
  * Add filter table
  * Reorganize the way filter table is initialized
Sockopts:
  * Install/uninstall BPF programs while handling
    IPT_SO_SET_REPLACE
Code generation:
  * Add first version of the code generation
Dependencies:
  * Add libbpf

v0 -> v1
IO:
  * Use ssize_t in pvm_read, pvm_write for total_bytes
  * Move IO functions into sockopt.c and main.c
Logging:
  * Use LOGLEVEL_EMERG, LOGLEVEL_NOTICE, LOGLEVEL_DEBUG
    while logging to /dev/kmsg
  * Prepend log message with <n> where n is log level
  * Conditionally enable BFLOG_DEBUG messages
  * Merge bflog.{h,c} into context.h
Matches:
  * Reorder fields in struct match_ops for tight packing
  * Get rid of struct match_ops_map
  * Rename udp_match_ops to xt_udp
  * Use XT_ALIGN macro
  * Store payload size in match size
  * Move udp match routines into a separate file
Targets:
  * Reorder fields in struct target_ops for tight packing
  * Get rid of struct target_ops_map
  * Add comments for convert_verdict function
Rules:
  * Add validation
Tables:
  * Combine table_map and table_list into table_index
  * Add validation
Sockopts:
  * Handle IPT_SO_GET_REVISION_TARGET
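
For the logging item above ("Prepend log message with <n> where n is log
level"), a record written to /dev/kmsg could be built roughly as
follows. This is only a sketch: toy_format_kmsg(), the TOY_LOGLEVEL_*
names and the "bpfilter: " tag are assumptions made for the example, not
the helpers from logger.c. The numeric values are those of the kernel's
LOGLEVEL_EMERG, LOGLEVEL_NOTICE and LOGLEVEL_DEBUG.

```c
#include <assert.h>
#include <stdio.h>

/* Same numeric values as the kernel's LOGLEVEL_* constants. */
#define TOY_LOGLEVEL_EMERG  0
#define TOY_LOGLEVEL_NOTICE 5
#define TOY_LOGLEVEL_DEBUG  7

/*
 * Build one /dev/kmsg record: "<n>" prefix, a tag, the message, a newline.
 * Returns the record length, or -1 if it does not fit in buf.
 */
int toy_format_kmsg(char *buf, size_t size, int level, const char *msg)
{
	int n;

	assert(buf != NULL && msg != NULL);
	n = snprintf(buf, size, "<%d>bpfilter: %s\n", level, msg);
	if (n < 0 || (size_t)n >= size)
		return -1; /* output error or truncated record */
	return n;
}
```

The resulting buffer would then be passed to a single write(2) on a file
descriptor opened on /dev/kmsg, so that each record reaches the kernel
log as one line with the requested level.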

1. https://lore.kernel.org/patchwork/patch/902785/
2. https://lore.kernel.org/patchwork/patch/902783/
3. https://kernel.ubuntu.com/~cking/stress-ng/stress-ng.pdf

Quentin Deslandes (16):
  bpfilter: add types for usermode helper
  tools: add bpfilter usermode helper header
  bpfilter: add logging facility
  bpfilter: add map container
  bpfilter: add runtime context
  bpfilter: add BPF bytecode generation infrastructure
  bpfilter: add support for TC bytecode generation
  bpfilter: add match structure
  bpfilter: add support for src/dst addr and ports
  bpfilter: add target structure
  bpfilter: add rule structure
  bpfilter: add table structure
  bpfilter: add table code generation
  bpfilter: add setsockopt() support
  bpfilter: add filter table
  bpfilter: handle setsockopt() calls

 include/uapi/linux/bpfilter.h                 |  154 +++
 net/bpfilter/Makefile                         |   16 +-
 net/bpfilter/codegen.c                        | 1040 +++++++++++++++++
 net/bpfilter/codegen.h                        |  183 +++
 net/bpfilter/context.c                        |  168 +++
 net/bpfilter/context.h                        |   24 +
 net/bpfilter/filter-table.c                   |  344 ++++++
 net/bpfilter/filter-table.h                   |   18 +
 net/bpfilter/logger.c                         |   52 +
 net/bpfilter/logger.h                         |   80 ++
 net/bpfilter/main.c                           |  132 ++-
 net/bpfilter/map-common.c                     |   51 +
 net/bpfilter/map-common.h                     |   19 +
 net/bpfilter/match.c                          |   55 +
 net/bpfilter/match.h                          |   37 +
 net/bpfilter/rule.c                           |  286 +++++
 net/bpfilter/rule.h                           |   37 +
 net/bpfilter/sockopt.c                        |  533 +++++++++
 net/bpfilter/sockopt.h                        |   15 +
 net/bpfilter/table.c                          |  391 +++++++
 net/bpfilter/table.h                          |   59 +
 net/bpfilter/target.c                         |  203 ++++
 net/bpfilter/target.h                         |   57 +
 net/bpfilter/xt_udp.c                         |  111 ++
 tools/include/uapi/linux/bpfilter.h           |  175 +++
 .../testing/selftests/bpf/bpfilter/.gitignore |    8 +
 tools/testing/selftests/bpf/bpfilter/Makefile |   57 +
 .../selftests/bpf/bpfilter/bpfilter_util.h    |   80 ++
 .../selftests/bpf/bpfilter/test_codegen.c     |  338 ++++++
 .../testing/selftests/bpf/bpfilter/test_map.c |   63 +
 .../selftests/bpf/bpfilter/test_match.c       |   69 ++
 .../selftests/bpf/bpfilter/test_rule.c        |   56 +
 .../selftests/bpf/bpfilter/test_target.c      |   83 ++
 .../selftests/bpf/bpfilter/test_xt_udp.c      |   48 +
 34 files changed, 4999 insertions(+), 43 deletions(-)
 create mode 100644 net/bpfilter/codegen.c
 create mode 100644 net/bpfilter/codegen.h
 create mode 100644 net/bpfilter/context.c
 create mode 100644 net/bpfilter/context.h
 create mode 100644 net/bpfilter/filter-table.c
 create mode 100644 net/bpfilter/filter-table.h
 create mode 100644 net/bpfilter/logger.c
 create mode 100644 net/bpfilter/logger.h
 create mode 100644 net/bpfilter/map-common.c
 create mode 100644 net/bpfilter/map-common.h
 create mode 100644 net/bpfilter/match.c
 create mode 100644 net/bpfilter/match.h
 create mode 100644 net/bpfilter/rule.c
 create mode 100644 net/bpfilter/rule.h
 create mode 100644 net/bpfilter/sockopt.c
 create mode 100644 net/bpfilter/sockopt.h
 create mode 100644 net/bpfilter/table.c
 create mode 100644 net/bpfilter/table.h
 create mode 100644 net/bpfilter/target.c
 create mode 100644 net/bpfilter/target.h
 create mode 100644 net/bpfilter/xt_udp.c
 create mode 100644 tools/include/uapi/linux/bpfilter.h
 create mode 100644 tools/testing/selftests/bpf/bpfilter/.gitignore
 create mode 100644 tools/testing/selftests/bpf/bpfilter/Makefile
 create mode 100644 tools/testing/selftests/bpf/bpfilter/bpfilter_util.h
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_codegen.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_map.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_match.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_rule.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_target.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_xt_udp.c

--
2.38.1

Alexei Starovoitov Dec. 27, 2022, 6:22 p.m. UTC | #1

On Sat, Dec 24, 2022 at 01:03:46AM +0100, Quentin Deslandes wrote:
> 
> Due to poor hardware availability on my side, I've not been able to
> benchmark those changes. I plan to get some numbers for the next iteration.

Yeah. Performance numbers would be my main question :)

> FORWARD filter chain is now supported, however, it's attached to
> TC INGRESS along with INPUT filter chain. This is due to XDP not supporting
> multiple programs to be attached. I could generate a single program
> out of both INPUT and FORWARD chains, but that would prevent another
> BPF program to be attached to the interface anyway. If a solution
> exists to attach both those programs to XDP while allowing for other
> programs to be attached, it requires more investigation. In the meantime,
> INPUT and FORWARD filtering is supported using TC.

I think we can ignore XDP chaining for now assuming that Daniel's bpf_link-tc work
will be applicable to XDP as well, so we'll have a simple chaining
for XDP eventually.

As far as attaching to TC... I think it would be great to combine bpfilter
codegen and attach to Florian's bpf hooks exactly at netfilter.
See
https://git.breakpoint.cc/cgit/fw/nf-next.git/commit/?h=nf_hook_jit_bpf_29&id=0c1ec06503cb8a142d3ad9f760b72d94ea0091fa
Having nf_hook_ingress() call either into classic iptables or into bpf_prog_run_nf,
running code generated either by Florian's optimizer of nf chains or by
bpfilter, would be ideal.

Florian Westphal Jan. 3, 2023, 11:38 a.m. UTC | #2

Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> codegen and attach to Florian's bpf hooks exactly at netfilter.
> See
> https://git.breakpoint.cc/cgit/fw/nf-next.git/commit/?h=nf_hook_jit_bpf_29&id=0c1ec06503cb8a142d3ad9f760b72d94ea0091fa

FWIW I plan to submit this patchset for 6.2.

Florian Westphal Jan. 3, 2023, 11:45 a.m. UTC | #3

Quentin Deslandes <qde@naccy.de> wrote:
> The patchset is based on the patches from David S. Miller [1],
> Daniel Borkmann [2], and Dmitrii Banshchikov [3].
> 
> Note: I've partially sent this patchset earlier due to a
> mistake on my side, sorry for the noise.
> 
> The main goal of the patchset is to prepare bpfilter for
> iptables' configuration blob parsing and code generation.
> 
> The patchset introduces data structures and code for matches,
> targets, rules and tables. Beside that the code generation
> is introduced.
> 
> The first version of the code generation supports only "inline"
> mode - all chains and their rules emit instructions in linear
> approach.
> 
> Things that are not implemented yet:
>   1) The process of switching from the previous BPF programs to the
>      new set isn't atomic.

You can't make this atomic from userspace perspective, the
get/setsockopt API of iptables uses a read-modify-write model.

Tentatively I'd try to extend libnftnl and generate bpf code there,
since it's used by both iptables(-nft) and nftables we'd automatically
get support for both.

I was planning to look into "attach bpf progs to raw netfilter hooks"
in Q1 2023, once the initial nf-bpf-codegen is merged.

Quentin Deslandes Jan. 6, 2023, 2:15 p.m. UTC | #4

On 27/12/2022 at 19:22, Alexei Starovoitov wrote:
> On Sat, Dec 24, 2022 at 01:03:46AM +0100, Quentin Deslandes wrote:
>>
>> Due to poor hardware availability on my side, I've not been able to
>> benchmark those changes. I plan to get some numbers for the next iteration.
> 
> Yeah. Performance numbers would be my main question :)

Hardware is on the way! :)

>> FORWARD filter chain is now supported, however, it's attached to
>> TC INGRESS along with INPUT filter chain. This is due to XDP not supporting
>> multiple programs to be attached. I could generate a single program
>> out of both INPUT and FORWARD chains, but that would prevent another
>> BPF program to be attached to the interface anyway. If a solution
>> exists to attach both those programs to XDP while allowing for other
>> programs to be attached, it requires more investigation. In the meantime,
>> INPUT and FORWARD filtering is supported using TC.
> 
> I think we can ignore XDP chaining for now assuming that Daniel's bpf_link-tc work
> will be applicable to XDP as well, so we'll have a simple chaining
> for XDP eventually.
> 
> As far as attaching to TC... I think it would be great to combine bpfilter
> codegen and attach to Florian's bpf hooks exactly at netfilter.
> See
> https://git.breakpoint.cc/cgit/fw/nf-next.git/commit/?h=nf_hook_jit_bpf_29&id=0c1ec06503cb8a142d3ad9f760b72d94ea0091fa
> With nf_hook_ingress() calling either into classic iptable or into bpf_prog_run_nf
> which is either generated by Florian's optimizer of nf chains or into
> bpfilter generated code would be ideal.

That sounds interesting. If my understanding is correct, Florian's
work doesn't yet allow for userspace-generated programs to be attached,
which will be required for bpfilter.

Quentin Deslandes Jan. 6, 2023, 2:43 p.m. UTC | #5

On 03/01/2023 at 12:45, Florian Westphal wrote:
> Quentin Deslandes <qde@naccy.de> wrote:
>> The patchset is based on the patches from David S. Miller [1],
>> Daniel Borkmann [2], and Dmitrii Banshchikov [3].
>>
>> Note: I've partially sent this patchset earlier due to a
>> mistake on my side, sorry for the noise.
>>
>> The main goal of the patchset is to prepare bpfilter for
>> iptables' configuration blob parsing and code generation.
>>
>> The patchset introduces data structures and code for matches,
>> targets, rules and tables. Beside that the code generation
>> is introduced.
>>
>> The first version of the code generation supports only "inline"
>> mode - all chains and their rules emit instructions in linear
>> approach.
>>
>> Things that are not implemented yet:
>>    1) The process of switching from the previous BPF programs to the
>>       new set isn't atomic.
> 
> You can't make this atomic from userspace perspective, the
> get/setsockopt API of iptables uses a read-modify-write model.

This refers to updating the programs from bpfilter's side. It won't
be atomic from iptables' point of view, but currently bpfilter
removes the program associated with a table before installing the new
one. This means packets received between those two operations are
not filtered. I assume a better solution is possible.

> Tentatively I'd try to extend libnftnl and generate bpf code there,
> since it's used by both iptables(-nft) and nftables we'd automatically
> get support for both.

That's one of the options; this could also remain in the kernel
tree or in a dedicated git repository. I don't know which one would
be best, I'm open to suggestions.

> I was planning to look into "attach bpf progs to raw netfilter hooks"
> in Q1 2023, once the initial nf-bpf-codegen is merged.

Is there any plan to support non raw hooks? That's mainly out
of curiosity, I don't even know whether that would be a good thing
or not.

Florian Westphal Jan. 12, 2023, 3:03 a.m. UTC | #6

Quentin Deslandes <qde@naccy.de> wrote:
> That sounds interesting. If my understanding is correct, Florian's
> work doesn't yet allow for userspace-generated programs to be attached,
> which will be required for bpfilter.

Yes, but I started working on the attachment side.  It doesn't depend
on the nf-bpf generator patch set.

I think I can share PoC/RFC draft next week.

Florian Westphal Jan. 12, 2023, 3:17 a.m. UTC | #7

Quentin Deslandes <qde@naccy.de> wrote:
> On 03/01/2023 at 12:45, Florian Westphal wrote:
> > You can't make this atomic from userspace perspective, the
> > get/setsockopt API of iptables uses a read-modify-write model.
> 
> This refers to updating the programs from bpfilter's side. It won't
> be atomic from iptables point of view, but currently bpfilter will
> remove the program associated to a table, before installing the new
> one. This means packets received in between those operations are
> not filtered. I assume a better solution is possible.

Ah, I see, thanks.

> > Tentatively I'd try to extend libnftnl and generate bpf code there,
> > since it's used by both iptables(-nft) and nftables we'd automatically
> > get support for both.
> 
> That's one of the option, this could also remain in the kernel
> tree or in a dedicated git repository. I don't know which one would
> be the best, I'm open to suggestions.

I can imagine that this will see a flurry of activity in the early
phase so I think a 'semi test repo' makes sense.

Provided the license allows this, usable bits and pieces can then
be grafted on to libnftnl (or iptables or whatever).

> > I was planning to look into "attach bpf progs to raw netfilter hooks"
> > in Q1 2023, once the initial nf-bpf-codegen is merged.
> 
> Is there any plan to support non raw hooks? That's mainly out
> of curiosity, I don't even know whether that would be a good thing
> or not.

Not sure what 'non raw hook' is.  Idea was to expose

1. protocol family
2. hook number (prerouting, input etc)
3. priority

to userspace via bpf syscall/bpf link.

userspace would then provide the above info to kernel via
bpf(... BPF_LINK_CREATE )

which would then end up doing:
--------------
h.hook = nf_hook_run_bpf; // wrapper to call BPF_PROG_RUN
h.priv = prog; // the bpf program to run
h.pf = attr->netfilter.pf;
h.priority = attr->netfilter.priority;
h.hooknum = attr->netfilter.hooknum;

nf_register_net_hook(net, &h);
--------------

After that nf_hook_slow() calls the bpf program just like any
other of the netfilter hooks.
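
The flow described above can be modelled in plain userspace C. This is
only a sketch under stated assumptions: every toy_* name is hypothetical,
the "BPF program" is an ordinary function pointer, and the table is a
fixed array rather than the kernel's per-netns hook entries. It shows a
hook being registered with (pf, hooknum, priority) and then called in
priority order, with lower priority values running first and any
non-accept verdict ending the traversal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_NF_DROP = 0, TOY_NF_ACCEPT = 1 }; /* same values as NF_DROP/NF_ACCEPT */
enum { TOY_MAX_HOOKS = 8 };

/* Stand-in for a loaded BPF program: a callback over the packet's dport. */
struct toy_prog {
	unsigned int (*run)(const uint16_t *dport);
};

/* Loosely mirrors the fields set in the snippet above. */
struct toy_hook_ops {
	unsigned int (*hook)(void *priv, const uint16_t *dport);
	void *priv;
	uint8_t pf;
	unsigned int hooknum;
	int priority;
};

struct toy_hook_table {
	struct toy_hook_ops ops[TOY_MAX_HOOKS];
	size_t n;
};

/* Plays the role of nf_hook_run_bpf: run the program stored in priv. */
unsigned int toy_hook_run_bpf(void *priv, const uint16_t *dport)
{
	const struct toy_prog *prog = priv;

	return prog->run(dport);
}

/* Insert h, keeping the table sorted by ascending priority. */
int toy_register_hook(struct toy_hook_table *t, const struct toy_hook_ops *h)
{
	size_t i;

	assert(t != NULL && h != NULL);
	if (t->n >= TOY_MAX_HOOKS)
		return -1;
	for (i = t->n; i > 0 && t->ops[i - 1].priority > h->priority; i--)
		t->ops[i] = t->ops[i - 1];
	t->ops[i] = *h;
	t->n++;
	return 0;
}

/* Run matching hooks in priority order; the first non-accept verdict wins. */
unsigned int toy_hook_slow(const struct toy_hook_table *t, uint8_t pf,
			   unsigned int hooknum, const uint16_t *dport)
{
	for (size_t i = 0; i < t->n; i++) {
		const struct toy_hook_ops *h = &t->ops[i];
		unsigned int verdict;

		if (h->pf != pf || h->hooknum != hooknum)
			continue;
		verdict = h->hook(h->priv, dport);
		if (verdict != TOY_NF_ACCEPT)
			return verdict;
	}
	return TOY_NF_ACCEPT;
}

/* Two sample "programs" to register. */
unsigned int toy_drop_telnet(const uint16_t *dport)
{
	return *dport == 23 ? TOY_NF_DROP : TOY_NF_ACCEPT;
}

unsigned int toy_accept_all(const uint16_t *dport)
{
	(void)dport;
	return TOY_NF_ACCEPT;
}
```

Registering toy_drop_telnet at priority 100 and toy_accept_all at
priority -5 on the same (pf, hooknum) pair makes the accept-all program
run first, after which the telnet filter still gets to drop port 23.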

Does that make sense or did you have something else in mind?

Quentin Deslandes Jan. 25, 2023, 10:25 a.m. UTC | #8

On Thu, Jan 12, 2023 at 04:17:28AM +0100, Florian Westphal wrote:
> Quentin Deslandes <qde@naccy.de> wrote:
> > On 03/01/2023 at 12:45, Florian Westphal wrote:
> > > You can't make this atomic from userspace perspective, the
> > > get/setsockopt API of iptables uses a read-modify-write model.
> > 
> > This refers to updating the programs from bpfilter's side. It won't
> > be atomic from iptables point of view, but currently bpfilter will
> > remove the program associated to a table, before installing the new
> > one. This means packets received in between those operations are
> > not filtered. I assume a better solution is possible.
> 
> Ah, I see, thanks.
> 
> > > Tentatively I'd try to extend libnftnl and generate bpf code there,
> > > since it's used by both iptables(-nft) and nftables we'd automatically
> > > get support for both.
> > 
> > That's one of the option, this could also remain in the kernel
> > tree or in a dedicated git repository. I don't know which one would
> > be the best, I'm open to suggestions.
> 
> I can imagine that this will see a flurry of activity in the early
> phase so I think a 'semi test repo' makes sense.
> 
> Provided the license allows this, usable bits and pieces can then
> be grafted on to libnftnl (or iptables or whatever).
> 
> > > I was planning to look into "attach bpf progs to raw netfilter hooks"
> > > in Q1 2023, once the initial nf-bpf-codegen is merged.
> > 
> > Is there any plan to support non raw hooks? That's mainly out
> > of curiosity, I don't even know whether that would be a good thing
> > or not.
> 
> Not sure what 'non raw hook' is.  Idea was to expose
> 
> 1. protocol family
> 2. hook number (prerouting, input etc)
> 3. priority
> 
> to userspace via bpf syscall/bpf link.
> 
> userspace would then provide the above info to kernel via
> bpf(... BPF_LINK_CREATE )
> 
> which would then end up doing:
> --------------
> h.hook = nf_hook_run_bpf; // wrapper to call BPF_PROG_RUN
> h.priv = prog; // the bpf program to run
> h.pf = attr->netfilter.pf;
> h.priority = attr->netfilter.priority;
> h.hooknum = attr->netfilter.hooknum;
> 
> nf_register_net_hook(net, &h);
> --------------
> 
> After that nf_hook_slow() calls the bpf program just like any
> other of the netfilter hooks.
> 
> Does that make sense or did you have something else in mind?

Sounds good to me. I thought you were referring to hooks available for
the RAW table (as in `iptables --table raw...`).

Thanks,
Quentin
