Message ID | 20230505-bpf-add-tbid-fib-lookup-v1-1-fd99f7162e76@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Series | bpf: utilize table ID in bpf_fib_lookup helper |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch, async |
bpf/vmtest-bpf-next-PR | success | PR summary |
bpf/vmtest-bpf-next-VM_Test-1 | success | Logs for ShellCheck |
bpf/vmtest-bpf-next-VM_Test-6 | success | Logs for set-matrix |
bpf/vmtest-bpf-next-VM_Test-2 | success | Logs for build for aarch64 with gcc |
bpf/vmtest-bpf-next-VM_Test-4 | success | Logs for build for x86_64 with gcc |
bpf/vmtest-bpf-next-VM_Test-5 | success | Logs for build for x86_64 with llvm-16 |
bpf/vmtest-bpf-next-VM_Test-3 | success | Logs for build for s390x with gcc |
bpf/vmtest-bpf-next-VM_Test-10 | success | Logs for test_maps on x86_64 with llvm-16 |
bpf/vmtest-bpf-next-VM_Test-19 | success | Logs for test_progs_no_alu32_parallel on aarch64 with gcc |
bpf/vmtest-bpf-next-VM_Test-20 | success | Logs for test_progs_no_alu32_parallel on x86_64 with gcc |
bpf/vmtest-bpf-next-VM_Test-21 | success | Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16 |
bpf/vmtest-bpf-next-VM_Test-24 | success | Logs for test_progs_parallel on x86_64 with llvm-16 |
bpf/vmtest-bpf-next-VM_Test-25 | success | Logs for test_verifier on aarch64 with gcc |
bpf/vmtest-bpf-next-VM_Test-27 | success | Logs for test_verifier on x86_64 with gcc |
bpf/vmtest-bpf-next-VM_Test-28 | success | Logs for test_verifier on x86_64 with llvm-16 |
bpf/vmtest-bpf-next-VM_Test-29 | success | Logs for veristat |
bpf/vmtest-bpf-next-VM_Test-7 | success | Logs for test_maps on aarch64 with gcc |
bpf/vmtest-bpf-next-VM_Test-9 | success | Logs for test_maps on x86_64 with gcc |
bpf/vmtest-bpf-next-VM_Test-11 | success | Logs for test_progs on aarch64 with gcc |
bpf/vmtest-bpf-next-VM_Test-13 | success | Logs for test_progs on x86_64 with gcc |
bpf/vmtest-bpf-next-VM_Test-14 | success | Logs for test_progs on x86_64 with llvm-16 |
bpf/vmtest-bpf-next-VM_Test-15 | success | Logs for test_progs_no_alu32 on aarch64 with gcc |
bpf/vmtest-bpf-next-VM_Test-17 | success | Logs for test_progs_no_alu32 on x86_64 with gcc |
bpf/vmtest-bpf-next-VM_Test-18 | success | Logs for test_progs_no_alu32 on x86_64 with llvm-16 |
bpf/vmtest-bpf-next-VM_Test-22 | success | Logs for test_progs_parallel on aarch64 with gcc |
bpf/vmtest-bpf-next-VM_Test-23 | success | Logs for test_progs_parallel on x86_64 with gcc |
bpf/vmtest-bpf-next-VM_Test-26 | success | Logs for test_verifier on s390x with gcc |
bpf/vmtest-bpf-next-VM_Test-16 | success | Logs for test_progs_no_alu32 on s390x with gcc |
bpf/vmtest-bpf-next-VM_Test-12 | success | Logs for test_progs on s390x with gcc |
bpf/vmtest-bpf-next-VM_Test-8 | success | Logs for test_maps on s390x with gcc |
On 5/25/23 7:27 AM, Louis DeLosSantos wrote:
> Add ability to specify routing table ID to the `bpf_fib_lookup` BPF
> helper.
>
> A new field `tbid` is added to `struct bpf_fib_lookup` used as
> parameters to the `bpf_fib_lookup` BPF helper.
>
> When the helper is called with the `BPF_FIB_LOOKUP_DIRECT` flag and the
> `tbid` field in `struct bpf_fib_lookup` is greater then 0, the `tbid`
> field will be used as the table ID for the fib lookup.

I think table id 0 is legal in the kernel, right?
It is probably okay to consider table id 0 not supported to
simplify the user interface. But it would be great to
add some explanations in the commit message.

>
> If the `tbid` does not exist the fib lookup will fail with
> `BPF_FIB_LKUP_RET_NOT_FWDED`.
>
> The `tbid` field becomes a union over the vlan related output fields in
> `struct bpf_fib_lookup` and will be zeroed immediately after usage.
>
> This functionality is useful in containerized environments.
>
> For instance, if a CNI wants to dictate the next-hop for traffic leaving
> a container it can create a container-specific routing table and perform
> a fib lookup against this table in a "host-net-namespace-side" TC program.
>
> This functionality also allows `ip rule` like functionality at the TC
> layer, allowing an eBPF program to pick a routing table based on some
> aspect of the sk_buff.
>
> As a concrete use case, this feature will be used in Cilium's SRv6 L3VPN
> datapath.
>
> When egress traffic leaves a Pod an eBPF program attached by Cilium will
> determine which VRF the egress traffic should target, and then perform a
> FIB lookup in a specific table representing this VRF's FIB.
> > Signed-off-by: Louis DeLosSantos <louis.delos.devel@gmail.com> > --- > include/uapi/linux/bpf.h | 17 ++++++++++++++--- > net/core/filter.c | 12 ++++++++++++ > tools/include/uapi/linux/bpf.h | 17 ++++++++++++++--- > 3 files changed, 40 insertions(+), 6 deletions(-) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 1bb11a6ee6676..2096fbb328a9b 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -3167,6 +3167,8 @@ union bpf_attr { > * **BPF_FIB_LOOKUP_DIRECT** > * Do a direct table lookup vs full lookup using FIB > * rules. > + * If *params*->tbid is non-zero, this value designates > + * a routing table ID to perform the lookup against. > * **BPF_FIB_LOOKUP_OUTPUT** > * Perform lookup from an egress perspective (default is > * ingress). > @@ -6881,9 +6883,18 @@ struct bpf_fib_lookup { > __u32 ipv6_dst[4]; /* in6_addr; network order */ > }; > > - /* output */ > - __be16 h_vlan_proto; > - __be16 h_vlan_TCI; > + union { > + struct { > + /* output */ > + __be16 h_vlan_proto; > + __be16 h_vlan_TCI; > + }; > + /* input: when accompanied with the 'BPF_FIB_LOOKUP_DIRECT` flag, a > + * specific routing table to use for the fib lookup. > + */ > + __u32 tbid; > + }; > + > __u8 smac[6]; /* ETH_ALEN */ > __u8 dmac[6]; /* ETH_ALEN */ > }; > diff --git a/net/core/filter.c b/net/core/filter.c > index 451b0ec7f2421..6f710aa0a54b3 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -5803,6 +5803,12 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params, > u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN; > struct fib_table *tb; > > + if (params->tbid) { > + tbid = params->tbid; > + /* zero out for vlan output */ > + params->tbid = 0; > + } > + > tb = fib_get_table(net, tbid); > if (unlikely(!tb)) > return BPF_FIB_LKUP_RET_NOT_FWDED; > @@ -5936,6 +5942,12 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params, > u32 tbid = l3mdev_fib_table_rcu(dev) ? 
: RT_TABLE_MAIN; > struct fib6_table *tb; > > + if (params->tbid) { > + tbid = params->tbid; > + /* zero out for vlan output */ > + params->tbid = 0; > + } > + > tb = ipv6_stub->fib6_get_table(net, tbid); > if (unlikely(!tb)) > return BPF_FIB_LKUP_RET_NOT_FWDED; > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > index 1bb11a6ee6676..2096fbb328a9b 100644 > --- a/tools/include/uapi/linux/bpf.h > +++ b/tools/include/uapi/linux/bpf.h > @@ -3167,6 +3167,8 @@ union bpf_attr { > * **BPF_FIB_LOOKUP_DIRECT** > * Do a direct table lookup vs full lookup using FIB > * rules. > + * If *params*->tbid is non-zero, this value designates > + * a routing table ID to perform the lookup against. > * **BPF_FIB_LOOKUP_OUTPUT** > * Perform lookup from an egress perspective (default is > * ingress). > @@ -6881,9 +6883,18 @@ struct bpf_fib_lookup { > __u32 ipv6_dst[4]; /* in6_addr; network order */ > }; > > - /* output */ > - __be16 h_vlan_proto; > - __be16 h_vlan_TCI; > + union { > + struct { > + /* output */ > + __be16 h_vlan_proto; > + __be16 h_vlan_TCI; > + }; > + /* input: when accompanied with the 'BPF_FIB_LOOKUP_DIRECT` flag, a > + * specific routing table to use for the fib lookup. > + */ > + __u32 tbid; > + }; > + > __u8 smac[6]; /* ETH_ALEN */ > __u8 dmac[6]; /* ETH_ALEN */ > }; >
Louis DeLosSantos wrote:
> Add ability to specify routing table ID to the `bpf_fib_lookup` BPF
> helper.
>
> A new field `tbid` is added to `struct bpf_fib_lookup` used as
> parameters to the `bpf_fib_lookup` BPF helper.
>
> When the helper is called with the `BPF_FIB_LOOKUP_DIRECT` flag and the
> `tbid` field in `struct bpf_fib_lookup` is greater then 0, the `tbid`
> field will be used as the table ID for the fib lookup.
>
> If the `tbid` does not exist the fib lookup will fail with
> `BPF_FIB_LKUP_RET_NOT_FWDED`.
>
> The `tbid` field becomes a union over the vlan related output fields in
> `struct bpf_fib_lookup` and will be zeroed immediately after usage.
>
> This functionality is useful in containerized environments.
>
> For instance, if a CNI wants to dictate the next-hop for traffic leaving
> a container it can create a container-specific routing table and perform
> a fib lookup against this table in a "host-net-namespace-side" TC program.
>
> This functionality also allows `ip rule` like functionality at the TC
> layer, allowing an eBPF program to pick a routing table based on some
> aspect of the sk_buff.
>
> As a concrete use case, this feature will be used in Cilium's SRv6 L3VPN
> datapath.
>
> When egress traffic leaves a Pod an eBPF program attached by Cilium will
> determine which VRF the egress traffic should target, and then perform a
> FIB lookup in a specific table representing this VRF's FIB.
>
> Signed-off-by: Louis DeLosSantos <louis.delos.devel@gmail.com>
> ---
>  include/uapi/linux/bpf.h       | 17 ++++++++++++++---
>  net/core/filter.c              | 12 ++++++++++++
>  tools/include/uapi/linux/bpf.h | 17 ++++++++++++++---
>  3 files changed, 40 insertions(+), 6 deletions(-)
>

Looks good, one question. Should we hide tbid behind a flag? We have
lots of room. Is there any concern a user could feed a bpf_fib_lookup
into the helper without clearing the vlan fields? Perhaps by
pulling the struct from a map or something where it had been
previously used.

Thanks,
John
On Thu, May 25, 2023 at 11:48:12PM -0700, John Fastabend wrote: > Louis DeLosSantos wrote: > > Add ability to specify routing table ID to the `bpf_fib_lookup` BPF > > helper. > > > > A new field `tbid` is added to `struct bpf_fib_lookup` used as > > parameters to the `bpf_fib_lookup` BPF helper. > > > > When the helper is called with the `BPF_FIB_LOOKUP_DIRECT` flag and the > > `tbid` field in `struct bpf_fib_lookup` is greater then 0, the `tbid` > > field will be used as the table ID for the fib lookup. > > > > If the `tbid` does not exist the fib lookup will fail with > > `BPF_FIB_LKUP_RET_NOT_FWDED`. > > > > The `tbid` field becomes a union over the vlan related output fields in > > `struct bpf_fib_lookup` and will be zeroed immediately after usage. > > > > This functionality is useful in containerized environments. > > > > For instance, if a CNI wants to dictate the next-hop for traffic leaving > > a container it can create a container-specific routing table and perform > > a fib lookup against this table in a "host-net-namespace-side" TC program. > > > > This functionality also allows `ip rule` like functionality at the TC > > layer, allowing an eBPF program to pick a routing table based on some > > aspect of the sk_buff. > > > > As a concrete use case, this feature will be used in Cilium's SRv6 L3VPN > > datapath. > > > > When egress traffic leaves a Pod an eBPF program attached by Cilium will > > determine which VRF the egress traffic should target, and then perform a > > FIB lookup in a specific table representing this VRF's FIB. > > > > Signed-off-by: Louis DeLosSantos <louis.delos.devel@gmail.com> > > --- > > include/uapi/linux/bpf.h | 17 ++++++++++++++--- > > net/core/filter.c | 12 ++++++++++++ > > tools/include/uapi/linux/bpf.h | 17 ++++++++++++++--- > > 3 files changed, 40 insertions(+), 6 deletions(-) > > > > Looks good one question. Should we hide tbid behind a flag we have > lots of room. 
> Is there any concern a user could feed a bpf_fib_lookup
> into the helper without clearing the vlan fields? Perhaps by
> pulling the struct from a map or something where it had been
> previously used.
>
> Thanks,
> John

This is a fair point.

I could imagine a scenario where an individual is caching bpf_fib_lookup structs,
pulls in a kernel with this change, and is now accidentally feeding the stale vlan
fields as table IDs, since their code is using `BPF_FIB_LOOKUP_DIRECT` with
the old semantics.

Guarding with a new flag like this (just a quick example, not a full diff)...

```
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2096fbb328a9b..22095ccaaa64d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6823,6 +6823,7 @@ enum {
 	BPF_FIB_LOOKUP_DIRECT  = (1U << 0),
 	BPF_FIB_LOOKUP_OUTPUT  = (1U << 1),
 	BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2),
+	BPF_FIB_LOOKUP_TBID    = (1U << 3),
 };
 
 enum {
diff --git a/net/core/filter.c b/net/core/filter.c
index 6f710aa0a54b3..9b78460e39af2 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5803,7 +5803,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN;
 	struct fib_table *tb;
 
-	if (params->tbid) {
+	if (flags & BPF_FIB_LOOKUP_TBID) {
 		tbid = params->tbid;
 		/* zero out for vlan output */
 		params->tbid = 0;
```

Maybe a bit safer, you're right.

In this case the semantics around `BPF_FIB_LOOKUP_DIRECT` remain exactly the same,
and only if we do `flags = BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_TBID` will
the `tbid` field in the incoming params be considered.

If I squint at this, it technically also allows us to consider `tbid=0` as a
valid table id, since the caller now explicitly opts into it, where previously
table id 0 was not selectable, though I don't know if there's a *real* use case
for selecting the `all` table.

I'm happy to make this change, what are your thoughts?
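The difference between the two guards discussed in this reply can be sketched in plain userspace C. This is only an illustration under stated assumptions: the `SKETCH_*` macros, `struct fib_params_sketch`, and both `pick_table_*` functions are hypothetical names, the flag values are copied from the example diff in the email, and the `dflt` argument stands in for the helper's existing `l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN` default.

```c
#include <stdint.h>

/* Hypothetical names. Flag values mirror the example diff in the email. */
#define SKETCH_LOOKUP_DIRECT (1U << 0)
#define SKETCH_LOOKUP_TBID   (1U << 3)
#define SKETCH_TABLE_MAIN    254u /* value of RT_TABLE_MAIN in the UAPI */

/* Models only the union added by the patch, with stdint types. */
struct fib_params_sketch {
	union {
		struct {
			uint16_t h_vlan_proto; /* output of an earlier call */
			uint16_t h_vlan_TCI;   /* output of an earlier call */
		};
		uint32_t tbid;                 /* input */
	};
};

/* v1 guard: any non-zero tbid overrides the default table. */
uint32_t pick_table_v1(struct fib_params_sketch *p, uint32_t dflt)
{
	uint32_t tbid = dflt;

	if (p->tbid) {
		tbid = p->tbid;
		p->tbid = 0; /* zero out for vlan output */
	}
	return tbid;
}

/* Flag guard: tbid is read only when the caller opts in. */
uint32_t pick_table_flagged(struct fib_params_sketch *p, uint32_t flags,
			    uint32_t dflt)
{
	uint32_t tbid = dflt;

	if (flags & SKETCH_LOOKUP_TBID) {
		tbid = p->tbid;
		p->tbid = 0; /* zero out for vlan output */
	}
	return tbid;
}
```

With a reused struct whose `h_vlan_proto` still holds a value from an earlier lookup, `pick_table_v1` returns that stale value as a table ID, while `pick_table_flagged` called with only `SKETCH_LOOKUP_DIRECT` keeps the default and leaves the struct untouched. Passing `SKETCH_LOOKUP_DIRECT | SKETCH_LOOKUP_TBID` with `tbid == 0` selects table 0 in the flagged variant, which the v1 guard cannot express.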
On Thu, May 25, 2023 at 11:01:34PM -0700, Yonghong Song wrote: > > > On 5/25/23 7:27 AM, Louis DeLosSantos wrote: > > Add ability to specify routing table ID to the `bpf_fib_lookup` BPF > > helper. > > > > A new field `tbid` is added to `struct bpf_fib_lookup` used as > > parameters to the `bpf_fib_lookup` BPF helper. > > > > When the helper is called with the `BPF_FIB_LOOKUP_DIRECT` flag and the > > `tbid` field in `struct bpf_fib_lookup` is greater then 0, the `tbid` > > field will be used as the table ID for the fib lookup. > > I think table id 0 is legal in the kernel, right? > It is probably okay to consider table id 0 not supported to > simplify the user interface. But it would be great to > add some explanations in the commit message. > > > > > If the `tbid` does not exist the fib lookup will fail with > > `BPF_FIB_LKUP_RET_NOT_FWDED`. > > > > The `tbid` field becomes a union over the vlan related output fields in > > `struct bpf_fib_lookup` and will be zeroed immediately after usage. > > > > This functionality is useful in containerized environments. > > > > For instance, if a CNI wants to dictate the next-hop for traffic leaving > > a container it can create a container-specific routing table and perform > > a fib lookup against this table in a "host-net-namespace-side" TC program. > > > > This functionality also allows `ip rule` like functionality at the TC > > layer, allowing an eBPF program to pick a routing table based on some > > aspect of the sk_buff. > > > > As a concrete use case, this feature will be used in Cilium's SRv6 L3VPN > > datapath. > > > > When egress traffic leaves a Pod an eBPF program attached by Cilium will > > determine which VRF the egress traffic should target, and then perform a > > FIB lookup in a specific table representing this VRF's FIB. 
> > > > Signed-off-by: Louis DeLosSantos <louis.delos.devel@gmail.com> > > --- > > include/uapi/linux/bpf.h | 17 ++++++++++++++--- > > net/core/filter.c | 12 ++++++++++++ > > tools/include/uapi/linux/bpf.h | 17 ++++++++++++++--- > > 3 files changed, 40 insertions(+), 6 deletions(-) > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > index 1bb11a6ee6676..2096fbb328a9b 100644 > > --- a/include/uapi/linux/bpf.h > > +++ b/include/uapi/linux/bpf.h > > @@ -3167,6 +3167,8 @@ union bpf_attr { > > * **BPF_FIB_LOOKUP_DIRECT** > > * Do a direct table lookup vs full lookup using FIB > > * rules. > > + * If *params*->tbid is non-zero, this value designates > > + * a routing table ID to perform the lookup against. > > * **BPF_FIB_LOOKUP_OUTPUT** > > * Perform lookup from an egress perspective (default is > > * ingress). > > @@ -6881,9 +6883,18 @@ struct bpf_fib_lookup { > > __u32 ipv6_dst[4]; /* in6_addr; network order */ > > }; > > - /* output */ > > - __be16 h_vlan_proto; > > - __be16 h_vlan_TCI; > > + union { > > + struct { > > + /* output */ > > + __be16 h_vlan_proto; > > + __be16 h_vlan_TCI; > > + }; > > + /* input: when accompanied with the 'BPF_FIB_LOOKUP_DIRECT` flag, a > > + * specific routing table to use for the fib lookup. > > + */ > > + __u32 tbid; > > + }; > > + > > __u8 smac[6]; /* ETH_ALEN */ > > __u8 dmac[6]; /* ETH_ALEN */ > > }; > > diff --git a/net/core/filter.c b/net/core/filter.c > > index 451b0ec7f2421..6f710aa0a54b3 100644 > > --- a/net/core/filter.c > > +++ b/net/core/filter.c > > @@ -5803,6 +5803,12 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params, > > u32 tbid = l3mdev_fib_table_rcu(dev) ? 
: RT_TABLE_MAIN; > > struct fib_table *tb; > > + if (params->tbid) { > > + tbid = params->tbid; > > + /* zero out for vlan output */ > > + params->tbid = 0; > > + } > > + > > tb = fib_get_table(net, tbid); > > if (unlikely(!tb)) > > return BPF_FIB_LKUP_RET_NOT_FWDED; > > @@ -5936,6 +5942,12 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params, > > u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN; > > struct fib6_table *tb; > > + if (params->tbid) { > > + tbid = params->tbid; > > + /* zero out for vlan output */ > > + params->tbid = 0; > > + } > > + > > tb = ipv6_stub->fib6_get_table(net, tbid); > > if (unlikely(!tb)) > > return BPF_FIB_LKUP_RET_NOT_FWDED; > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > > index 1bb11a6ee6676..2096fbb328a9b 100644 > > --- a/tools/include/uapi/linux/bpf.h > > +++ b/tools/include/uapi/linux/bpf.h > > @@ -3167,6 +3167,8 @@ union bpf_attr { > > * **BPF_FIB_LOOKUP_DIRECT** > > * Do a direct table lookup vs full lookup using FIB > > * rules. > > + * If *params*->tbid is non-zero, this value designates > > + * a routing table ID to perform the lookup against. > > * **BPF_FIB_LOOKUP_OUTPUT** > > * Perform lookup from an egress perspective (default is > > * ingress). > > @@ -6881,9 +6883,18 @@ struct bpf_fib_lookup { > > __u32 ipv6_dst[4]; /* in6_addr; network order */ > > }; > > - /* output */ > > - __be16 h_vlan_proto; > > - __be16 h_vlan_TCI; > > + union { > > + struct { > > + /* output */ > > + __be16 h_vlan_proto; > > + __be16 h_vlan_TCI; > > + }; > > + /* input: when accompanied with the 'BPF_FIB_LOOKUP_DIRECT` flag, a > > + * specific routing table to use for the fib lookup. > > + */ > > + __u32 tbid; > > + }; > > + > > __u8 smac[6]; /* ETH_ALEN */ > > __u8 dmac[6]; /* ETH_ALEN */ > > }; > > > I think table id 0 is legal in the kernel, right? > It is probably okay to consider table id 0 not supported to > simplify the user interface. 
> But it would be great to
> add some explanations in the commit message.

Agreed.

My initial feeling was that there is no real use case for querying the
kernel's `all` table.

John's response will dictate whether this remains the case, since I think his
suggestion of using a new flag bit makes this issue moot.

If it stays, though, I will definitely add details about this to the commit
message in the next revision.
On 5/26/23 7:07 AM, Louis DeLosSantos wrote: > On Thu, May 25, 2023 at 11:48:12PM -0700, John Fastabend wrote: >> Louis DeLosSantos wrote: >>> Add ability to specify routing table ID to the `bpf_fib_lookup` BPF >>> helper. >>> >>> A new field `tbid` is added to `struct bpf_fib_lookup` used as >>> parameters to the `bpf_fib_lookup` BPF helper. >>> >>> When the helper is called with the `BPF_FIB_LOOKUP_DIRECT` flag and the >>> `tbid` field in `struct bpf_fib_lookup` is greater then 0, the `tbid` >>> field will be used as the table ID for the fib lookup. >>> >>> If the `tbid` does not exist the fib lookup will fail with >>> `BPF_FIB_LKUP_RET_NOT_FWDED`. >>> >>> The `tbid` field becomes a union over the vlan related output fields in >>> `struct bpf_fib_lookup` and will be zeroed immediately after usage. >>> >>> This functionality is useful in containerized environments. >>> >>> For instance, if a CNI wants to dictate the next-hop for traffic leaving >>> a container it can create a container-specific routing table and perform >>> a fib lookup against this table in a "host-net-namespace-side" TC program. >>> >>> This functionality also allows `ip rule` like functionality at the TC >>> layer, allowing an eBPF program to pick a routing table based on some >>> aspect of the sk_buff. >>> >>> As a concrete use case, this feature will be used in Cilium's SRv6 L3VPN >>> datapath. >>> >>> When egress traffic leaves a Pod an eBPF program attached by Cilium will >>> determine which VRF the egress traffic should target, and then perform a >>> FIB lookup in a specific table representing this VRF's FIB. >>> >>> Signed-off-by: Louis DeLosSantos <louis.delos.devel@gmail.com> >>> --- >>> include/uapi/linux/bpf.h | 17 ++++++++++++++--- >>> net/core/filter.c | 12 ++++++++++++ >>> tools/include/uapi/linux/bpf.h | 17 ++++++++++++++--- >>> 3 files changed, 40 insertions(+), 6 deletions(-) >>> >> >> Looks good one question. Should we hide tbid behind a flag we have >> lots of room. 
Is there any concern a user could feed a bpf_fib_lookup >> into the helper without clearing the vlan fields? Perhaps by >> pulling the struct from a map or something where it had been >> previously used. >> >> Thanks, >> John > > This is a fair point. > > I could imagine a scenario where an individual is caching bpf_fib_lookup structs, > pulls in a kernel with this change, and is now accidentally feeding the stale vlan > fields as table ID's, since their code is using `BPF_FIB_LOOKUP_DIRECT` with > the old semantics. > > Guarding with a new flag like this (just a quick example, not a full diff)... > > ``` > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 2096fbb328a9b..22095ccaaa64d 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -6823,6 +6823,7 @@ enum { > BPF_FIB_LOOKUP_DIRECT = (1U << 0), > BPF_FIB_LOOKUP_OUTPUT = (1U << 1), > BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), > + BPF_FIB_LOOKUP_TBID = (1U << 3), > }; > > enum { > diff --git a/net/core/filter.c b/net/core/filter.c > index 6f710aa0a54b3..9b78460e39af2 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -5803,7 +5803,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params, > u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN; > struct fib_table *tb; > > - if (params->tbid) { > + if (flags & BPF_FIB_LOOKUP_TBID) { > tbid = params->tbid; > /* zero out for vlan output */ > params->tbid = 0; > ``` > > Maybe a bit safer, you're right. > > In this case the semantics around `BPF_FIB_LOOKUP_DIRECT` remain exactly the same, > and if we do `flags = BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_TBID`, only then will > the `tbid` field in the incoming params wil be considered. 
>
> If I squint at this, it technically also allows us to consider `tbid=0` as a
> valid table id, since the caller now explicitly opts into it, where previously
> table id 0 was not selectable, tho I don't know if there's a *real* use case
> for selecting the `all` table.
>
> I'm happy to make this change, what are your thoughts?

Sounds good to me, so we won't reject a legal table id.
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1bb11a6ee6676..2096fbb328a9b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3167,6 +3167,8 @@ union bpf_attr {
  *		**BPF_FIB_LOOKUP_DIRECT**
  *			Do a direct table lookup vs full lookup using FIB
  *			rules.
+ *			If *params*->tbid is non-zero, this value designates
+ *			a routing table ID to perform the lookup against.
  *		**BPF_FIB_LOOKUP_OUTPUT**
  *			Perform lookup from an egress perspective (default is
  *			ingress).
@@ -6881,9 +6883,18 @@ struct bpf_fib_lookup {
 		__u32		ipv6_dst[4];  /* in6_addr; network order */
 	};
 
-	/* output */
-	__be16	h_vlan_proto;
-	__be16	h_vlan_TCI;
+	union {
+		struct {
+			/* output */
+			__be16	h_vlan_proto;
+			__be16	h_vlan_TCI;
+		};
+		/* input: when accompanied with the 'BPF_FIB_LOOKUP_DIRECT` flag, a
+		 * specific routing table to use for the fib lookup.
+		 */
+		__u32	tbid;
+	};
+
 	__u8	smac[6];     /* ETH_ALEN */
 	__u8	dmac[6];     /* ETH_ALEN */
 };
diff --git a/net/core/filter.c b/net/core/filter.c
index 451b0ec7f2421..6f710aa0a54b3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5803,6 +5803,12 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 		u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN;
 		struct fib_table *tb;
 
+		if (params->tbid) {
+			tbid = params->tbid;
+			/* zero out for vlan output */
+			params->tbid = 0;
+		}
+
 		tb = fib_get_table(net, tbid);
 		if (unlikely(!tb))
 			return BPF_FIB_LKUP_RET_NOT_FWDED;
@@ -5936,6 +5942,12 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 		u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN;
 		struct fib6_table *tb;
 
+		if (params->tbid) {
+			tbid = params->tbid;
+			/* zero out for vlan output */
+			params->tbid = 0;
+		}
+
 		tb = ipv6_stub->fib6_get_table(net, tbid);
 		if (unlikely(!tb))
 			return BPF_FIB_LKUP_RET_NOT_FWDED;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1bb11a6ee6676..2096fbb328a9b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3167,6 +3167,8 @@ union bpf_attr {
  *		**BPF_FIB_LOOKUP_DIRECT**
  *			Do a direct table lookup vs full lookup using FIB
  *			rules.
+ *			If *params*->tbid is non-zero, this value designates
+ *			a routing table ID to perform the lookup against.
  *		**BPF_FIB_LOOKUP_OUTPUT**
  *			Perform lookup from an egress perspective (default is
  *			ingress).
@@ -6881,9 +6883,18 @@ struct bpf_fib_lookup {
 		__u32		ipv6_dst[4];  /* in6_addr; network order */
 	};
 
-	/* output */
-	__be16	h_vlan_proto;
-	__be16	h_vlan_TCI;
+	union {
+		struct {
+			/* output */
+			__be16	h_vlan_proto;
+			__be16	h_vlan_TCI;
+		};
+		/* input: when accompanied with the 'BPF_FIB_LOOKUP_DIRECT` flag, a
+		 * specific routing table to use for the fib lookup.
+		 */
+		__u32	tbid;
+	};
+
 	__u8	smac[6];     /* ETH_ALEN */
 	__u8	dmac[6];     /* ETH_ALEN */
 };
Add ability to specify routing table ID to the `bpf_fib_lookup` BPF
helper.

A new field `tbid` is added to `struct bpf_fib_lookup` used as
parameters to the `bpf_fib_lookup` BPF helper.

When the helper is called with the `BPF_FIB_LOOKUP_DIRECT` flag and the
`tbid` field in `struct bpf_fib_lookup` is greater than 0, the `tbid`
field will be used as the table ID for the fib lookup.

If the `tbid` does not exist the fib lookup will fail with
`BPF_FIB_LKUP_RET_NOT_FWDED`.

The `tbid` field becomes a union over the vlan related output fields in
`struct bpf_fib_lookup` and will be zeroed immediately after usage.

This functionality is useful in containerized environments.

For instance, if a CNI wants to dictate the next-hop for traffic leaving
a container it can create a container-specific routing table and perform
a fib lookup against this table in a "host-net-namespace-side" TC program.

This functionality also allows `ip rule` like functionality at the TC
layer, allowing an eBPF program to pick a routing table based on some
aspect of the sk_buff.

As a concrete use case, this feature will be used in Cilium's SRv6 L3VPN
datapath.

When egress traffic leaves a Pod an eBPF program attached by Cilium will
determine which VRF the egress traffic should target, and then perform a
FIB lookup in a specific table representing this VRF's FIB.

Signed-off-by: Louis DeLosSantos <louis.delos.devel@gmail.com>
---
 include/uapi/linux/bpf.h       | 17 ++++++++++++++---
 net/core/filter.c              | 12 ++++++++++++
 tools/include/uapi/linux/bpf.h | 17 ++++++++++++++---
 3 files changed, 40 insertions(+), 6 deletions(-)
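To make the "union over the vlan related output fields" statement concrete, here is a minimal userspace sketch of the layout. It is not the kernel definition: `struct fib_tail_old`, `struct fib_tail_new`, and `consume_tbid` are hypothetical names, only the tail of `struct bpf_fib_lookup` is modeled, and `<stdint.h>` types stand in for the kernel's `__be16`, `__u32`, and `__u8`.

```c
#include <stdint.h>

/* Hypothetical model of the struct tail before the patch. */
struct fib_tail_old {
	uint16_t h_vlan_proto; /* output */
	uint16_t h_vlan_TCI;   /* output */
	uint8_t  smac[6];
	uint8_t  dmac[6];
};

/* Hypothetical model of the struct tail after the patch. */
struct fib_tail_new {
	union {
		struct {
			uint16_t h_vlan_proto; /* output */
			uint16_t h_vlan_TCI;   /* output */
		};
		uint32_t tbid;                 /* input */
	};
	uint8_t smac[6];
	uint8_t dmac[6];
};

/* In this model the union reuses the 4 bytes of the two vlan fields,
 * so the size of the tail does not change. */
_Static_assert(sizeof(struct fib_tail_new) == sizeof(struct fib_tail_old),
	       "union must not grow the struct");

/* Mirrors the "read, then zero" step from the patch: clearing tbid
 * also clears both vlan output fields because they share storage. */
uint32_t consume_tbid(struct fib_tail_new *p)
{
	uint32_t tbid = p->tbid;

	p->tbid = 0; /* zero out for vlan output */
	return tbid;
}
```

Calling `consume_tbid` on a struct with `tbid` set to 100 returns 100 and leaves `h_vlan_proto` and `h_vlan_TCI` at zero, which is why the helper can later fill them in as outputs without a separate reset.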