Message ID | 20210119155013.154808-2-bjorn.topel@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | Introduce bpf_redirect_xsk() helper | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for bpf-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 7 maintainers not CCed: songliubraving@fb.com andrii@kernel.org rostedt@goodmis.org mingo@redhat.com kpsingh@kernel.org kafai@fb.com yhs@fb.com |
netdev/source_inline | success | Was 1 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 4577 this patch: 4577 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 370 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 5005 this patch: 5005 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
Björn Töpel <bjorn.topel@gmail.com> writes: > From: Björn Töpel <bjorn.topel@intel.com> > > The XDP_REDIRECT implementations for maps and non-maps are fairly > similar, but obviously need to take different code paths depending on > if the target is using a map or not. Today, the redirect targets for > XDP either uses a map, or is based on ifindex. > > Future commits will introduce yet another redirect target via the a > new helper, bpf_redirect_xsk(). To pave the way for that, we introduce > an explicit redirect type to bpf_redirect_info. This makes the code > easier to follow, and makes it easier to add new redirect targets. > > Further, using an explicit type in bpf_redirect_info has a slight > positive performance impact by avoiding a pointer indirection for the > map type lookup, and instead use the hot cacheline for > bpf_redirect_info. > > The bpf_redirect_info flags member is not used by XDP, and not > read/written any more. The map member is only written to when > required/used, and not unconditionally. I like the simplification. However, the handling of map clearing becomes a bit murky with this change: You're not changing anything in bpf_clear_redirect_map(), and you're removing most of the reads and writes of ri->map. Instead, bpf_xdp_redirect_map() will store the bpf_dtab_netdev pointer in ri->tgt_value, which xdp_do_redirect() will just read and use without checking. But if the map element (or the entire map) has been freed in the meantime that will be a dangling pointer. I *think* the RCU callback in dev_map_delete_elem() and the rcu_barrier() in dev_map_free() protects against this, but that is by no means obvious. So confirming this, and explaining it in a comment would be good. Also, as far as I can tell after this, ri->map is only used for the tracepoint. So how about just storing the map ID and getting rid of the READ/WRITE_ONCE() entirely? (Oh, and related to this I think this patch set will conflict with Hangbin's multi-redirect series, so maybe you two ought to coordinate? :)) -Toke
On 2021-01-20 13:44, Toke Høiland-Jørgensen wrote: > Björn Töpel <bjorn.topel@gmail.com> writes: > >> From: Björn Töpel <bjorn.topel@intel.com> >> >> The XDP_REDIRECT implementations for maps and non-maps are fairly >> similar, but obviously need to take different code paths depending on >> if the target is using a map or not. Today, the redirect targets for >> XDP either uses a map, or is based on ifindex. >> >> Future commits will introduce yet another redirect target via the a >> new helper, bpf_redirect_xsk(). To pave the way for that, we introduce >> an explicit redirect type to bpf_redirect_info. This makes the code >> easier to follow, and makes it easier to add new redirect targets. >> >> Further, using an explicit type in bpf_redirect_info has a slight >> positive performance impact by avoiding a pointer indirection for the >> map type lookup, and instead use the hot cacheline for >> bpf_redirect_info. >> >> The bpf_redirect_info flags member is not used by XDP, and not >> read/written any more. The map member is only written to when >> required/used, and not unconditionally. > > I like the simplification. However, the handling of map clearing becomes > a bit murky with this change: > > You're not changing anything in bpf_clear_redirect_map(), and you're > removing most of the reads and writes of ri->map. Instead, > bpf_xdp_redirect_map() will store the bpf_dtab_netdev pointer in > ri->tgt_value, which xdp_do_redirect() will just read and use without > checking. But if the map element (or the entire map) has been freed in > the meantime that will be a dangling pointer. I *think* the RCU callback > in dev_map_delete_elem() and the rcu_barrier() in dev_map_free() > protects against this, but that is by no means obvious. So confirming > this, and explaining it in a comment would be good. > Yes, *most* of the READ_ONCE(ri->map) are removed, it's pretty much only the bpf_redirect_map(), and as you write, the tracepoints. The content/element of the map is RCU protected, and actually even the map will be around until the XDP processing is complete. Note the synchronize_rcu() followed after all bpf_clear_redirect_map() calls. I'll try to make it clearer in the commit message! Thanks for pointing that out! > Also, as far as I can tell after this, ri->map is only used for the > tracepoint. So how about just storing the map ID and getting rid of the > READ/WRITE_ONCE() entirely? > ...and the bpf_redirect_map() helper. Don't you think the current READ_ONCE(ri->map) scheme is more obvious/clear? > (Oh, and related to this I think this patch set will conflict with > Hangbin's multi-redirect series, so maybe you two ought to coordinate? :)) > Yeah, good idea! I would guess Hangbin's would go in before this, so I would need to adapt. Thanks for taking of look at the series, Toke! Much appreciated! Björn
Björn Töpel <bjorn.topel@intel.com> writes: > On 2021-01-20 13:44, Toke Høiland-Jørgensen wrote: >> Björn Töpel <bjorn.topel@gmail.com> writes: >> >>> From: Björn Töpel <bjorn.topel@intel.com> >>> >>> The XDP_REDIRECT implementations for maps and non-maps are fairly >>> similar, but obviously need to take different code paths depending on >>> if the target is using a map or not. Today, the redirect targets for >>> XDP either uses a map, or is based on ifindex. >>> >>> Future commits will introduce yet another redirect target via the a >>> new helper, bpf_redirect_xsk(). To pave the way for that, we introduce >>> an explicit redirect type to bpf_redirect_info. This makes the code >>> easier to follow, and makes it easier to add new redirect targets. >>> >>> Further, using an explicit type in bpf_redirect_info has a slight >>> positive performance impact by avoiding a pointer indirection for the >>> map type lookup, and instead use the hot cacheline for >>> bpf_redirect_info. >>> >>> The bpf_redirect_info flags member is not used by XDP, and not >>> read/written any more. The map member is only written to when >>> required/used, and not unconditionally. >> >> I like the simplification. However, the handling of map clearing becomes >> a bit murky with this change: >> >> You're not changing anything in bpf_clear_redirect_map(), and you're >> removing most of the reads and writes of ri->map. Instead, >> bpf_xdp_redirect_map() will store the bpf_dtab_netdev pointer in >> ri->tgt_value, which xdp_do_redirect() will just read and use without >> checking. But if the map element (or the entire map) has been freed in >> the meantime that will be a dangling pointer. I *think* the RCU callback >> in dev_map_delete_elem() and the rcu_barrier() in dev_map_free() >> protects against this, but that is by no means obvious. So confirming >> this, and explaining it in a comment would be good. >> > > Yes, *most* of the READ_ONCE(ri->map) are removed, it's pretty much only > the bpf_redirect_map(), and as you write, the tracepoints. > > The content/element of the map is RCU protected, and actually even the > map will be around until the XDP processing is complete. Note the > synchronize_rcu() followed after all bpf_clear_redirect_map() calls. > > I'll try to make it clearer in the commit message! Thanks for pointing > that out! > >> Also, as far as I can tell after this, ri->map is only used for the >> tracepoint. So how about just storing the map ID and getting rid of the >> READ/WRITE_ONCE() entirely? >> > > ...and the bpf_redirect_map() helper. Don't you think the current > READ_ONCE(ri->map) scheme is more obvious/clear? Yeah, after your patch we WRITE_ONCE() the pointer in bpf_redirect_map(), but the only place it is actually *read* is in the tracepoint. So the only purpose of bpf_clear_redirect_map() is to ensure that an invalid pointer is not read in the tracepoint function. Which seems a bit excessive when we could just store the map ID for direct use in the tracepoint and get rid of bpf_clear_redirect_map() entirely, no? Besides, from a UX point of view, having the tracepoint display the map ID even if that map ID is no longer valid seems to me like it makes more sense than just displaying a map ID of 0 and leaving it up to the user to figure out that this is because the map was cleared. I mean, at the time the redirect was made, that *was* the map ID that was used... Oh, and as you say due to the synchronize_rcu() call in dev_map_free() I think this whole discussion is superfluous anyway, since it can't actually happen that the map gets freed between the setting and reading of ri->map, no? >> (Oh, and related to this I think this patch set will conflict with >> Hangbin's multi-redirect series, so maybe you two ought to coordinate? :)) >> > > Yeah, good idea! I would guess Hangbin's would go in before this, so I > would need to adapt. > > > Thanks for taking of look at the series, Toke! Much appreciated! You're welcome :) -Toke
On 2021-01-20 15:52, Toke Høiland-Jørgensen wrote: > Björn Töpel <bjorn.topel@intel.com> writes: > >> On 2021-01-20 13:44, Toke Høiland-Jørgensen wrote: >>> Björn Töpel <bjorn.topel@gmail.com> writes: >>> >>>> From: Björn Töpel <bjorn.topel@intel.com> >>>> >>>> The XDP_REDIRECT implementations for maps and non-maps are fairly >>>> similar, but obviously need to take different code paths depending on >>>> if the target is using a map or not. Today, the redirect targets for >>>> XDP either uses a map, or is based on ifindex. >>>> >>>> Future commits will introduce yet another redirect target via the a >>>> new helper, bpf_redirect_xsk(). To pave the way for that, we introduce >>>> an explicit redirect type to bpf_redirect_info. This makes the code >>>> easier to follow, and makes it easier to add new redirect targets. >>>> >>>> Further, using an explicit type in bpf_redirect_info has a slight >>>> positive performance impact by avoiding a pointer indirection for the >>>> map type lookup, and instead use the hot cacheline for >>>> bpf_redirect_info. >>>> >>>> The bpf_redirect_info flags member is not used by XDP, and not >>>> read/written any more. The map member is only written to when >>>> required/used, and not unconditionally. >>> >>> I like the simplification. However, the handling of map clearing becomes >>> a bit murky with this change: >>> >>> You're not changing anything in bpf_clear_redirect_map(), and you're >>> removing most of the reads and writes of ri->map. Instead, >>> bpf_xdp_redirect_map() will store the bpf_dtab_netdev pointer in >>> ri->tgt_value, which xdp_do_redirect() will just read and use without >>> checking. But if the map element (or the entire map) has been freed in >>> the meantime that will be a dangling pointer. I *think* the RCU callback >>> in dev_map_delete_elem() and the rcu_barrier() in dev_map_free() >>> protects against this, but that is by no means obvious. So confirming >>> this, and explaining it in a comment would be good. >>> >> >> Yes, *most* of the READ_ONCE(ri->map) are removed, it's pretty much only >> the bpf_redirect_map(), and as you write, the tracepoints. >> >> The content/element of the map is RCU protected, and actually even the >> map will be around until the XDP processing is complete. Note the >> synchronize_rcu() followed after all bpf_clear_redirect_map() calls. >> >> I'll try to make it clearer in the commit message! Thanks for pointing >> that out! >> >>> Also, as far as I can tell after this, ri->map is only used for the >>> tracepoint. So how about just storing the map ID and getting rid of the >>> READ/WRITE_ONCE() entirely? >>> >> >> ...and the bpf_redirect_map() helper. Don't you think the current >> READ_ONCE(ri->map) scheme is more obvious/clear? > > Yeah, after your patch we WRITE_ONCE() the pointer in > bpf_redirect_map(), but the only place it is actually *read* is in the > tracepoint. So the only purpose of bpf_clear_redirect_map() is to ensure > that an invalid pointer is not read in the tracepoint function. Which > seems a bit excessive when we could just store the map ID for direct use > in the tracepoint and get rid of bpf_clear_redirect_map() entirely, no? > > Besides, from a UX point of view, having the tracepoint display the map > ID even if that map ID is no longer valid seems to me like it makes more > sense than just displaying a map ID of 0 and leaving it up to the user > to figure out that this is because the map was cleared. I mean, at the > time the redirect was made, that *was* the map ID that was used... > Convinced! Getting rid of bpf_clear_redirect_map() would be good! I'll take a stab at this for v3! > Oh, and as you say due to the synchronize_rcu() call in dev_map_free() I > think this whole discussion is superfluous anyway, since it can't > actually happen that the map gets freed between the setting and reading > of ri->map, no? > It can't be free'd but, ri->map can be cleared via bpf_clear_redirect_map(). So, between the helper (setting) and the tracepoint in xdp_do_redirect() it can be cleared (say if the XDP program is swapped out, prior running xdp_do_redirect()). Moving to the scheme you suggested, does make the discussion superfluous. :-) Thanks for the input! Björn
Björn Töpel <bjorn.topel@intel.com> writes: > On 2021-01-20 15:52, Toke Høiland-Jørgensen wrote: >> Björn Töpel <bjorn.topel@intel.com> writes: >> >>> On 2021-01-20 13:44, Toke Høiland-Jørgensen wrote: >>>> Björn Töpel <bjorn.topel@gmail.com> writes: >>>> >>>>> From: Björn Töpel <bjorn.topel@intel.com> >>>>> >>>>> The XDP_REDIRECT implementations for maps and non-maps are fairly >>>>> similar, but obviously need to take different code paths depending on >>>>> if the target is using a map or not. Today, the redirect targets for >>>>> XDP either uses a map, or is based on ifindex. >>>>> >>>>> Future commits will introduce yet another redirect target via the a >>>>> new helper, bpf_redirect_xsk(). To pave the way for that, we introduce >>>>> an explicit redirect type to bpf_redirect_info. This makes the code >>>>> easier to follow, and makes it easier to add new redirect targets. >>>>> >>>>> Further, using an explicit type in bpf_redirect_info has a slight >>>>> positive performance impact by avoiding a pointer indirection for the >>>>> map type lookup, and instead use the hot cacheline for >>>>> bpf_redirect_info. >>>>> >>>>> The bpf_redirect_info flags member is not used by XDP, and not >>>>> read/written any more. The map member is only written to when >>>>> required/used, and not unconditionally. >>>> >>>> I like the simplification. However, the handling of map clearing becomes >>>> a bit murky with this change: >>>> >>>> You're not changing anything in bpf_clear_redirect_map(), and you're >>>> removing most of the reads and writes of ri->map. Instead, >>>> bpf_xdp_redirect_map() will store the bpf_dtab_netdev pointer in >>>> ri->tgt_value, which xdp_do_redirect() will just read and use without >>>> checking. But if the map element (or the entire map) has been freed in >>>> the meantime that will be a dangling pointer. I *think* the RCU callback >>>> in dev_map_delete_elem() and the rcu_barrier() in dev_map_free() >>>> protects against this, but that is by no means obvious. So confirming >>>> this, and explaining it in a comment would be good. >>>> >>> >>> Yes, *most* of the READ_ONCE(ri->map) are removed, it's pretty much only >>> the bpf_redirect_map(), and as you write, the tracepoints. >>> >>> The content/element of the map is RCU protected, and actually even the >>> map will be around until the XDP processing is complete. Note the >>> synchronize_rcu() followed after all bpf_clear_redirect_map() calls. >>> >>> I'll try to make it clearer in the commit message! Thanks for pointing >>> that out! >>> >>>> Also, as far as I can tell after this, ri->map is only used for the >>>> tracepoint. So how about just storing the map ID and getting rid of the >>>> READ/WRITE_ONCE() entirely? >>>> >>> >>> ...and the bpf_redirect_map() helper. Don't you think the current >>> READ_ONCE(ri->map) scheme is more obvious/clear? >> >> Yeah, after your patch we WRITE_ONCE() the pointer in >> bpf_redirect_map(), but the only place it is actually *read* is in the >> tracepoint. So the only purpose of bpf_clear_redirect_map() is to ensure >> that an invalid pointer is not read in the tracepoint function. Which >> seems a bit excessive when we could just store the map ID for direct use >> in the tracepoint and get rid of bpf_clear_redirect_map() entirely, no? >> >> Besides, from a UX point of view, having the tracepoint display the map >> ID even if that map ID is no longer valid seems to me like it makes more >> sense than just displaying a map ID of 0 and leaving it up to the user >> to figure out that this is because the map was cleared. I mean, at the >> time the redirect was made, that *was* the map ID that was used... >> > > Convinced! Getting rid of bpf_clear_redirect_map() would be good! I'll > take a stab at this for v3! Cool! >> Oh, and as you say due to the synchronize_rcu() call in dev_map_free() I >> think this whole discussion is superfluous anyway, since it can't >> actually happen that the map gets freed between the setting and reading >> of ri->map, no? >> > > It can't be free'd but, ri->map can be cleared via > bpf_clear_redirect_map(). So, between the helper (setting) and the > tracepoint in xdp_do_redirect() it can be cleared (say if the XDP > program is swapped out, prior running xdp_do_redirect()). But xdp_do_redirect() should be called on driver flush before exiting the NAPI cycle, so how can the XDP program be swapped out? > Moving to the scheme you suggested, does make the discussion > superfluous. :-) Yup, awesome :) -Toke
On 2021-01-20 17:30, Toke Høiland-Jørgensen wrote: > Björn Töpel <bjorn.topel@intel.com> writes: > [...] >> >> It can't be free'd but, ri->map can be cleared via >> bpf_clear_redirect_map(). So, between the helper (setting) and the >> tracepoint in xdp_do_redirect() it can be cleared (say if the XDP >> program is swapped out, prior running xdp_do_redirect()). > > But xdp_do_redirect() should be called on driver flush before exiting > the NAPI cycle, so how can the XDP program be swapped out? > To clarify; xdp_do_redirect() is called for each packet in the NAPI poll loop. Yeah, you're right. The xdp_do_redirect() is within the RCU scope, so the program wont be destroyed (but can be swapped though!). Hmm, so IOW the bpf_clear_redirect_map() is not needed anymore... Björn [...]
diff --git a/include/linux/filter.h b/include/linux/filter.h index 7fdce5407214..5fc336a271c2 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -637,10 +637,19 @@ struct bpf_redirect_info { u32 tgt_index; void *tgt_value; struct bpf_map *map; + u32 tgt_type; u32 kern_flags; struct bpf_nh_params nh; }; +enum xdp_redirect_type { + XDP_REDIR_UNSET, + XDP_REDIR_DEV_IFINDEX, + XDP_REDIR_DEV_MAP, + XDP_REDIR_CPU_MAP, + XDP_REDIR_XSK_MAP, +}; + DECLARE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info); /* flags for bpf_redirect_info kern_flags */ diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h index 76a97176ab81..0e17b9a74f28 100644 --- a/include/trace/events/xdp.h +++ b/include/trace/events/xdp.h @@ -96,9 +96,10 @@ DECLARE_EVENT_CLASS(xdp_redirect_template, TP_PROTO(const struct net_device *dev, const struct bpf_prog *xdp, const void *tgt, int err, - const struct bpf_map *map, u32 index), + enum xdp_redirect_type type, + const struct bpf_redirect_info *ri), - TP_ARGS(dev, xdp, tgt, err, map, index), + TP_ARGS(dev, xdp, tgt, err, type, ri), TP_STRUCT__entry( __field(int, prog_id) @@ -111,12 +112,19 @@ DECLARE_EVENT_CLASS(xdp_redirect_template, ), TP_fast_assign( + struct bpf_map *map = NULL; + u32 index = ri->tgt_index; + + if (type == XDP_REDIR_DEV_MAP || type == XDP_REDIR_CPU_MAP || + type == XDP_REDIR_XSK_MAP) + map = READ_ONCE(ri->map); + __entry->prog_id = xdp->aux->id; __entry->act = XDP_REDIRECT; __entry->ifindex = dev->ifindex; __entry->err = err; __entry->to_ifindex = map ? devmap_ifindex(tgt, map) : - index; + (u32)(long)tgt; __entry->map_id = map ? map->id : 0; __entry->map_index = map ? index : 0; ), @@ -133,45 +141,49 @@ DEFINE_EVENT(xdp_redirect_template, xdp_redirect, TP_PROTO(const struct net_device *dev, const struct bpf_prog *xdp, const void *tgt, int err, - const struct bpf_map *map, u32 index), - TP_ARGS(dev, xdp, tgt, err, map, index) + enum xdp_redirect_type type, + const struct bpf_redirect_info *ri), + TP_ARGS(dev, xdp, tgt, err, type, ri) ); DEFINE_EVENT(xdp_redirect_template, xdp_redirect_err, TP_PROTO(const struct net_device *dev, const struct bpf_prog *xdp, const void *tgt, int err, - const struct bpf_map *map, u32 index), - TP_ARGS(dev, xdp, tgt, err, map, index) + enum xdp_redirect_type type, + const struct bpf_redirect_info *ri), + TP_ARGS(dev, xdp, tgt, err, type, ri) ); #define _trace_xdp_redirect(dev, xdp, to) \ - trace_xdp_redirect(dev, xdp, NULL, 0, NULL, to) + trace_xdp_redirect(dev, xdp, NULL, 0, XDP_REDIR_DEV_IFINDEX, NULL) #define _trace_xdp_redirect_err(dev, xdp, to, err) \ - trace_xdp_redirect_err(dev, xdp, NULL, err, NULL, to) + trace_xdp_redirect_err(dev, xdp, NULL, err, XDP_REDIR_DEV_IFINDEX, NULL) -#define _trace_xdp_redirect_map(dev, xdp, to, map, index) \ - trace_xdp_redirect(dev, xdp, to, 0, map, index) +#define _trace_xdp_redirect_map(dev, xdp, to, type, ri) \ + trace_xdp_redirect(dev, xdp, to, 0, type, ri) -#define _trace_xdp_redirect_map_err(dev, xdp, to, map, index, err) \ - trace_xdp_redirect_err(dev, xdp, to, err, map, index) +#define _trace_xdp_redirect_map_err(dev, xdp, to, type, ri, err) \ + trace_xdp_redirect_err(dev, xdp, to, err, type, ri) /* not used anymore, but kept around so as not to break old programs */ DEFINE_EVENT(xdp_redirect_template, xdp_redirect_map, TP_PROTO(const struct net_device *dev, const struct bpf_prog *xdp, const void *tgt, int err, - const struct bpf_map *map, u32 index), - TP_ARGS(dev, xdp, tgt, err, map, index) + enum xdp_redirect_type type, + const struct bpf_redirect_info *ri), + TP_ARGS(dev, xdp, tgt, err, type, ri) ); DEFINE_EVENT(xdp_redirect_template, xdp_redirect_map_err, TP_PROTO(const struct net_device *dev, const struct bpf_prog *xdp, const void *tgt, int err, - const struct bpf_map *map, u32 index), - TP_ARGS(dev, xdp, tgt, err, map, index) + enum xdp_redirect_type type, + const struct bpf_redirect_info *ri), + TP_ARGS(dev, xdp, tgt, err, type, ri) ); TRACE_EVENT(xdp_cpumap_kthread, diff --git a/net/core/filter.c b/net/core/filter.c index 9ab94e90d660..5f31e21be531 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -3923,23 +3923,6 @@ static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = { .arg2_type = ARG_ANYTHING, }; -static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd, - struct bpf_map *map, struct xdp_buff *xdp) -{ - switch (map->map_type) { - case BPF_MAP_TYPE_DEVMAP: - case BPF_MAP_TYPE_DEVMAP_HASH: - return dev_map_enqueue(fwd, xdp, dev_rx); - case BPF_MAP_TYPE_CPUMAP: - return cpu_map_enqueue(fwd, xdp, dev_rx); - case BPF_MAP_TYPE_XSKMAP: - return __xsk_map_redirect(fwd, xdp); - default: - return -EBADRQC; - } - return 0; -} - void xdp_do_flush(void) { __dev_flush(); @@ -3948,22 +3931,6 @@ void xdp_do_flush(void) } EXPORT_SYMBOL_GPL(xdp_do_flush); -static inline void *__xdp_map_lookup_elem(struct bpf_map *map, u32 index) -{ - switch (map->map_type) { - case BPF_MAP_TYPE_DEVMAP: - return __dev_map_lookup_elem(map, index); - case BPF_MAP_TYPE_DEVMAP_HASH: - return __dev_map_hash_lookup_elem(map, index); - case BPF_MAP_TYPE_CPUMAP: - return __cpu_map_lookup_elem(map, index); - case BPF_MAP_TYPE_XSKMAP: - return __xsk_map_lookup_elem(map, index); - default: - return NULL; - } -} - void bpf_clear_redirect_map(struct bpf_map *map) { struct bpf_redirect_info *ri; @@ -3985,34 +3952,42 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); - struct bpf_map *map = READ_ONCE(ri->map); - u32 index = ri->tgt_index; + enum xdp_redirect_type type = ri->tgt_type; void *fwd = ri->tgt_value; int err; - ri->tgt_index = 0; + ri->tgt_type = XDP_REDIR_UNSET; ri->tgt_value = NULL; - WRITE_ONCE(ri->map, NULL); - if (unlikely(!map)) { - fwd = dev_get_by_index_rcu(dev_net(dev), index); + switch (type) { + case XDP_REDIR_DEV_IFINDEX: + fwd = dev_get_by_index_rcu(dev_net(dev), (u32)(long)fwd); if (unlikely(!fwd)) { err = -EINVAL; - goto err; + break; } - err = dev_xdp_enqueue(fwd, xdp, dev); - } else { - err = __bpf_tx_xdp_map(dev, fwd, map, xdp); + break; + case XDP_REDIR_DEV_MAP: + err = dev_map_enqueue(fwd, xdp, dev); + break; + case XDP_REDIR_CPU_MAP: + err = cpu_map_enqueue(fwd, xdp, dev); + break; + case XDP_REDIR_XSK_MAP: + err = __xsk_map_redirect(fwd, xdp); + break; + default: + err = -EBADRQC; } if (unlikely(err)) goto err; - _trace_xdp_redirect_map(dev, xdp_prog, fwd, map, index); + _trace_xdp_redirect_map(dev, xdp_prog, fwd, type, ri); return 0; err: - _trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map, index, err); + _trace_xdp_redirect_map_err(dev, xdp_prog, fwd, type, ri, err); return err; } EXPORT_SYMBOL_GPL(xdp_do_redirect); @@ -4021,41 +3996,40 @@ static int xdp_do_generic_redirect_map(struct net_device *dev, struct sk_buff *skb, struct xdp_buff *xdp, struct bpf_prog *xdp_prog, - struct bpf_map *map) + void *fwd, + enum xdp_redirect_type type) { struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); - u32 index = ri->tgt_index; - void *fwd = ri->tgt_value; - int err = 0; - - ri->tgt_index = 0; - ri->tgt_value = NULL; - WRITE_ONCE(ri->map, NULL); + int err; - if (map->map_type == BPF_MAP_TYPE_DEVMAP || - map->map_type == BPF_MAP_TYPE_DEVMAP_HASH) { + switch (type) { + case XDP_REDIR_DEV_MAP: { struct bpf_dtab_netdev *dst = fwd; err = dev_map_generic_redirect(dst, skb, xdp_prog); if (unlikely(err)) goto err; - } else if (map->map_type == BPF_MAP_TYPE_XSKMAP) { + break; + } + case XDP_REDIR_XSK_MAP: { struct xdp_sock *xs = fwd; err = xsk_generic_rcv(xs, xdp); if (err) goto err; consume_skb(skb); - } else { + break; + } + default: /* TODO: Handle BPF_MAP_TYPE_CPUMAP */ err = -EBADRQC; goto err; } - _trace_xdp_redirect_map(dev, xdp_prog, fwd, map, index); + _trace_xdp_redirect_map(dev, xdp_prog, fwd, type, ri); return 0; err: - _trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map, index, err); + _trace_xdp_redirect_map_err(dev, xdp_prog, fwd, type, ri, err); return err; } @@ -4063,29 +4037,31 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); - struct bpf_map *map = READ_ONCE(ri->map); - u32 index = ri->tgt_index; - struct net_device *fwd; + enum xdp_redirect_type type = ri->tgt_type; + void *fwd = ri->tgt_value; int err = 0; - if (map) - return xdp_do_generic_redirect_map(dev, skb, xdp, xdp_prog, - map); - ri->tgt_index = 0; - fwd = dev_get_by_index_rcu(dev_net(dev), index); - if (unlikely(!fwd)) { - err = -EINVAL; - goto err; - } + ri->tgt_type = XDP_REDIR_UNSET; + ri->tgt_value = NULL; - err = xdp_ok_fwd_dev(fwd, skb->len); - if (unlikely(err)) - goto err; + if (type == XDP_REDIR_DEV_IFINDEX) { + fwd = dev_get_by_index_rcu(dev_net(dev), (u32)(long)fwd); + if (unlikely(!fwd)) { + err = -EINVAL; + goto err; + } - skb->dev = fwd; - _trace_xdp_redirect(dev, xdp_prog, index); - generic_xdp_tx(skb, xdp_prog); - return 0; + err = xdp_ok_fwd_dev(fwd, skb->len); + if (unlikely(err)) + goto err; + + skb->dev = fwd; + _trace_xdp_redirect(dev, xdp_prog, index); + generic_xdp_tx(skb, xdp_prog); + return 0; + } + + return xdp_do_generic_redirect_map(dev, skb, xdp, xdp_prog, fwd, type); err: _trace_xdp_redirect_err(dev, xdp_prog, index, err); return err; @@ -4098,10 +4074,9 @@ BPF_CALL_2(bpf_xdp_redirect, u32, ifindex, u64, flags) if (unlikely(flags)) return XDP_ABORTED; - ri->flags = flags; - ri->tgt_index = ifindex; - ri->tgt_value = NULL; - WRITE_ONCE(ri->map, NULL); + ri->tgt_type = XDP_REDIR_DEV_IFINDEX; + ri->tgt_index = 0; + ri->tgt_value = (void *)(long)ifindex; return XDP_REDIRECT; } @@ -4123,18 +4098,37 @@ BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex, if (unlikely(flags > XDP_TX)) return XDP_ABORTED; - ri->tgt_value = __xdp_map_lookup_elem(map, ifindex); + switch (map->map_type) { + case BPF_MAP_TYPE_DEVMAP: + ri->tgt_value = __dev_map_lookup_elem(map, ifindex); + ri->tgt_type = XDP_REDIR_DEV_MAP; + break; + case BPF_MAP_TYPE_DEVMAP_HASH: + ri->tgt_value = __dev_map_hash_lookup_elem(map, ifindex); + ri->tgt_type = XDP_REDIR_DEV_MAP; + break; + case BPF_MAP_TYPE_CPUMAP: + ri->tgt_value = __cpu_map_lookup_elem(map, ifindex); + ri->tgt_type = XDP_REDIR_CPU_MAP; + break; + case BPF_MAP_TYPE_XSKMAP: + ri->tgt_value = __xsk_map_lookup_elem(map, ifindex); + ri->tgt_type = XDP_REDIR_XSK_MAP; + break; + default: + ri->tgt_value = NULL; + } + if (unlikely(!ri->tgt_value)) { /* If the lookup fails we want to clear out the state in the * redirect_info struct completely, so that if an eBPF program * performs multiple lookups, the last one always takes * precedence. */ - WRITE_ONCE(ri->map, NULL); + ri->tgt_type = XDP_REDIR_UNSET; return flags; } - ri->flags = flags; ri->tgt_index = ifindex; WRITE_ONCE(ri->map, map);