diff mbox series

[bpf-next,1/3] libbpf: xsk: use bpf_link

Message ID 20210215154638.4627-2-maciej.fijalkowski@intel.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series Introduce bpf_link in libbpf's xsk | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 10 maintainers not CCed: jonathan.lemon@gmail.com kuba@kernel.org hawk@kernel.org yhs@fb.com bjorn@kernel.org davem@davemloft.net john.fastabend@gmail.com kpsingh@kernel.org songliubraving@fb.com kafai@fb.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

Fijalkowski, Maciej Feb. 15, 2021, 3:46 p.m. UTC
Currently, if there are multiple xdpsock instances running on a single
interface and in case one of the instances is terminated, the rest of
them are left in an inoperable state due to the fact of unloaded XDP
prog from interface.

To address that, step away from setting bpf prog in favour of bpf_link.
This means that refcounting of BPF resources will be done automatically
by bpf_link itself.

When setting up BPF resources during xsk socket creation, check whether
bpf_link for a given ifindex already exists via set of calls to
bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
and comparing the ifindexes from bpf_link and xsk socket.

If there's no bpf_link yet, create one for a given XDP prog and unload
explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set.

If bpf_link is already at a given ifindex and underlying program is not
AF-XDP one, bail out or update the bpf_link's prog given the presence of
XDP_FLAGS_UPDATE_IF_NOEXIST.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 tools/lib/bpf/xsk.c | 143 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 122 insertions(+), 21 deletions(-)

Comments

Toke Høiland-Jørgensen Feb. 15, 2021, 5:07 p.m. UTC | #1
Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:

> Currently, if there are multiple xdpsock instances running on a single
> interface and in case one of the instances is terminated, the rest of
> them are left in an inoperable state due to the fact of unloaded XDP
> prog from interface.
>
> To address that, step away from setting bpf prog in favour of bpf_link.
> This means that refcounting of BPF resources will be done automatically
> by bpf_link itself.
>
> When setting up BPF resources during xsk socket creation, check whether
> bpf_link for a given ifindex already exists via set of calls to
> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
> and comparing the ifindexes from bpf_link and xsk socket.

One consideration here is that bpf_link_get_fd_by_id() is a privileged
operation (privileged as in CAP_SYS_ADMIN), so this has the side effect
of making AF_XDP privileged as well. Is that the intention?

Another is that the AF_XDP code is in the process of moving to libxdp
(see in-progress PR [0]), and this approach won't carry over as-is to
that model, because libxdp has to pin the bpf_link fds.

However, in libxdp we can solve the original problem in a different way,
and in fact I already suggested to Magnus that we should do this (see
[1]); so one way forward could be to address it during the merge in
libxdp? It should be possible to address the original issue (two
instances of xdpsock breaking each other when they exit), but
applications will still need to do an explicit unload operation before
exiting (i.e., the automatic detach on bpf_link fd closure will take
more work, and likely require extending the bpf_link kernel support)...

-Toke

[0] https://github.com/xdp-project/xdp-tools/pull/92
[1] https://github.com/xdp-project/xdp-tools/pull/92#discussion_r576204719
Björn Töpel Feb. 15, 2021, 5:38 p.m. UTC | #2
On 2021-02-15 18:07, Toke Høiland-Jørgensen wrote:
> Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:
> 
>> Currently, if there are multiple xdpsock instances running on a single
>> interface and in case one of the instances is terminated, the rest of
>> them are left in an inoperable state due to the fact of unloaded XDP
>> prog from interface.
>>
>> To address that, step away from setting bpf prog in favour of bpf_link.
>> This means that refcounting of BPF resources will be done automatically
>> by bpf_link itself.
>>
>> When setting up BPF resources during xsk socket creation, check whether
>> bpf_link for a given ifindex already exists via set of calls to
>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
>> and comparing the ifindexes from bpf_link and xsk socket.
> 
> One consideration here is that bpf_link_get_fd_by_id() is a privileged
> operation (privileged as in CAP_SYS_ADMIN), so this has the side effect
> of making AF_XDP privileged as well. Is that the intention?
>

We're already using, e.g., bpf_map_get_fd_by_id() which has that
as well. So we're assuming that for XDP setup already!

> Another is that the AF_XDP code is in the process of moving to libxdp
> (see in-progress PR [0]), and this approach won't carry over as-is to
> that model, because libxdp has to pin the bpf_link fds.
>

I was assuming there were two modes of operations for AF_XDP in libxdp.
One which is with the multi-program support (which AFAIK is why the
pinning is required), and one "like the current libbpf" one. For the
latter Maciej's series would be a good fit, no?

> However, in libxdp we can solve the original problem in a different way,
> and in fact I already suggested to Magnus that we should do this (see
> [1]); so one way forward could be to address it during the merge in
> libxdp? It should be possible to address the original issue (two
> instances of xdpsock breaking each other when they exit), but
> applications will still need to do an explicit unload operation before
> exiting (i.e., the automatic detach on bpf_link fd closure will take
> more work, and likely require extending the bpf_link kernel support)...
>

I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If
we're months ahead, then I'd really like to see this in libbpf until the
merge. However, I'll leave that for Magnus/you to decide!

Bottom line; I'd *really* like bpf_link behavior (process scoped) for
AF_XDP sooner than later! ;-)


Thanks for the input!
Björn


> -Toke
> 
> [0] https://github.com/xdp-project/xdp-tools/pull/92
> [1] https://github.com/xdp-project/xdp-tools/pull/92#discussion_r576204719
>
Toke Høiland-Jørgensen Feb. 15, 2021, 7:35 p.m. UTC | #3
Björn Töpel <bjorn.topel@intel.com> writes:

> On 2021-02-15 18:07, Toke Høiland-Jørgensen wrote:
>> Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:
>> 
>>> Currently, if there are multiple xdpsock instances running on a single
>>> interface and in case one of the instances is terminated, the rest of
>>> them are left in an inoperable state due to the fact of unloaded XDP
>>> prog from interface.
>>>
>>> To address that, step away from setting bpf prog in favour of bpf_link.
>>> This means that refcounting of BPF resources will be done automatically
>>> by bpf_link itself.
>>>
>>> When setting up BPF resources during xsk socket creation, check whether
>>> bpf_link for a given ifindex already exists via set of calls to
>>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
>>> and comparing the ifindexes from bpf_link and xsk socket.
>> 
>> One consideration here is that bpf_link_get_fd_by_id() is a privileged
>> operation (privileged as in CAP_SYS_ADMIN), so this has the side effect
>> of making AF_XDP privileged as well. Is that the intention?
>>
>
> We're already using, e.g., bpf_map_get_fd_by_id() which has that
> as well. So we're assuming that for XDP setup already!

Ah, right, didn't realise that one is CAP_SYS_ADMIN as well; I
remembered this as being specific to the bpf_link operation.

>> Another is that the AF_XDP code is in the process of moving to libxdp
>> (see in-progress PR [0]), and this approach won't carry over as-is to
>> that model, because libxdp has to pin the bpf_link fds.
>>
>
> I was assuming there were two modes of operations for AF_XDP in libxdp.
> One which is with the multi-program support (which AFAIK is why the
> pinning is required), and one "like the current libbpf" one. For the
> latter Maciej's series would be a good fit, no?

We haven't added an explicit mode switch for now; libxdp will fall back
to regular interface attach if the kernel doesn't support the needed
features for multi-attach, but if it's possible to just have libxdp
transparently do the right thing I'd much prefer that. So we're still
exploring that (part of which is that Magnus has promised to run some
performance tests to see if there's a difference).

However, even if there's an explicit mode switch I'd like to avoid
different *semantics* between the two modes if possible, to keep the two
as compatible as possible. And since we can't currently do "auto-detach
on bpf_link fd close" when using multi-prog, introducing this now would
lead to just such a semantic difference. So my preference would be to do
it differently... :)

>> However, in libxdp we can solve the original problem in a different way,
>> and in fact I already suggested to Magnus that we should do this (see
>> [1]); so one way forward could be to address it during the merge in
>> libxdp? It should be possible to address the original issue (two
>> instances of xdpsock breaking each other when they exit), but
>> applications will still need to do an explicit unload operation before
>> exiting (i.e., the automatic detach on bpf_link fd closure will take
>> more work, and likely require extending the bpf_link kernel support)...
>>
>
> I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If
> we're months ahead, then I'd really like to see this in libbpf until the
> merge. However, I'll leave that for Magnus/you to decide!

Well, as far as libxdp support goes, the PR I linked is pretty close to
being mergeable. One of the few outstanding issues is whether we should
solve just this issue before merging, actually :)

Not sure exactly which timeframe Andrii is envisioning for libbpf 1.0,
but last I heard he'll announce something next week.

> Bottom line; I'd *really* like bpf_link behavior (process scoped) for
> AF_XDP sooner than later! ;-)

Totally agree that we should solve the multi-process coexistence
problem! And as I said, I think we can do so in libxdp by using the same
synchronisation mechanism we use for setting up the multi-prog
dispatcher. So it doesn't *have* to hold things up :)

-Toke
John Fastabend Feb. 15, 2021, 8:22 p.m. UTC | #4
Björn Töpel wrote:
> On 2021-02-15 18:07, Toke Høiland-Jørgensen wrote:
> > Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:
> > 
> >> Currently, if there are multiple xdpsock instances running on a single
> >> interface and in case one of the instances is terminated, the rest of
> >> them are left in an inoperable state due to the fact of unloaded XDP
> >> prog from interface.

I'm a bit confused by the above. This is only the case if the instance
that terminated is the one that loaded the XDP program and it also didn't
pin the program correct? If so lets make the commit message a bit more
clear about the exact case we are solving.

> >>
> >> To address that, step away from setting bpf prog in favour of bpf_link.
> >> This means that refcounting of BPF resources will be done automatically
> >> by bpf_link itself.

+1

> >>
> >> When setting up BPF resources during xsk socket creation, check whether
> >> bpf_link for a given ifindex already exists via set of calls to
> >> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
> >> and comparing the ifindexes from bpf_link and xsk socket.
> > 
> > One consideration here is that bpf_link_get_fd_by_id() is a privileged
> > operation (privileged as in CAP_SYS_ADMIN), so this has the side effect
> > of making AF_XDP privileged as well. Is that the intention?
> >
> 
> We're already using, e.g., bpf_map_get_fd_by_id() which has that
> as well. So we're assuming that for XDP setup already!
> 
> > Another is that the AF_XDP code is in the process of moving to libxdp
> > (see in-progress PR [0]), and this approach won't carry over as-is to
> > that model, because libxdp has to pin the bpf_link fds.
> >
> 
> I was assuming there were two modes of operations for AF_XDP in libxdp.
> One which is with the multi-program support (which AFAIK is why the
> pinning is required), and one "like the current libbpf" one. For the
> latter Maciej's series would be a good fit, no?
> 
> > However, in libxdp we can solve the original problem in a different way,
> > and in fact I already suggested to Magnus that we should do this (see
> > [1]); so one way forward could be to address it during the merge in
> > libxdp? It should be possible to address the original issue (two
> > instances of xdpsock breaking each other when they exit), but
> > applications will still need to do an explicit unload operation before
> > exiting (i.e., the automatic detach on bpf_link fd closure will take
> > more work, and likely require extending the bpf_link kernel support)...
> >
> 
> I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If
> we're months ahead, then I'd really like to see this in libbpf until the
> merge. However, I'll leave that for Magnus/you to decide!

Did I miss some thread? What does this mean libbpf 1.0/libxdp merge? From
my side libbpf should support the basic operations: load, attach, pin,
and link for all my BPF objects. I view libxdp as providing 'extra'
goodness on top of that. Everyone agree?

> 
> Bottom line; I'd *really* like bpf_link behavior (process scoped) for
> AF_XDP sooner than later! ;-)

Because I use libbpf as my base management for BPF objects I want it
to support the basic ops for all objects so link ops should land there.

> 
> 
> Thanks for the input!
> Björn
> 
> 
> > -Toke
> > 
> > [0] https://github.com/xdp-project/xdp-tools/pull/92
> > [1] https://github.com/xdp-project/xdp-tools/pull/92#discussion_r576204719
> >
John Fastabend Feb. 15, 2021, 8:49 p.m. UTC | #5
Maciej Fijalkowski wrote:
> Currently, if there are multiple xdpsock instances running on a single
> interface and in case one of the instances is terminated, the rest of
> them are left in an inoperable state due to the fact of unloaded XDP
> prog from interface.
> 
> To address that, step away from setting bpf prog in favour of bpf_link.
> This means that refcounting of BPF resources will be done automatically
> by bpf_link itself.
> 
> When setting up BPF resources during xsk socket creation, check whether
> bpf_link for a given ifindex already exists via set of calls to
> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
> and comparing the ifindexes from bpf_link and xsk socket.
> 
> If there's no bpf_link yet, create one for a given XDP prog and unload
> explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set.
> 
> If bpf_link is already at a given ifindex and underlying program is not
> AF-XDP one, bail out or update the bpf_link's prog given the presence of
> XDP_FLAGS_UPDATE_IF_NOEXIST.
> 
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>  tools/lib/bpf/xsk.c | 143 +++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 122 insertions(+), 21 deletions(-)

[...]

> +static int xsk_create_bpf_link(struct xsk_socket *xsk)
> +{
> +	/* bpf_link only accepts XDP_FLAGS_MODES, but xsk->config.xdp_flags
> +	 * might have set XDP_FLAGS_UPDATE_IF_NOEXIST
> +	 */
> +	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts,
> +			    .flags = (xsk->config.xdp_flags & XDP_FLAGS_MODES));
> +	struct xsk_ctx *ctx = xsk->ctx;
> +	__u32 prog_id;
> +	int link_fd;
> +	int err;
> +
> +	/* for !XDP_FLAGS_UPDATE_IF_NOEXIST, unload the program first, if any,
> +	 * so that bpf_link can be attached
> +	 */
> +	if (!(xsk->config.xdp_flags & XDP_FLAGS_UPDATE_IF_NOEXIST)) {
> +		err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags);
> +		if (err) {
> +			pr_warn("getting XDP prog id failed\n");
> +			return err;
> +		}
> +		if (prog_id) {
> +			err = bpf_set_link_xdp_fd(ctx->ifindex, -1, 0);
> +			if (err < 0) {
> +				pr_warn("detaching XDP prog failed\n");
> +				return err;
> +			}
> +		}
>  	}
>  
> -	ctx->prog_fd = prog_fd;
> +	link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts);
> +	if (link_fd < 0) {
> +		pr_warn("bpf_link_create failed: %s\n", strerror(errno));
> +		return link_fd;
> +	}
> +

This can leave the system in a bad state where it unloaded the XDP program
above, but then failed to create the link. So we should somehow fix that
if possible or at minimum put a note somewhere so users can't claim they
shouldn't know this.

Also related, its not good for real systems to let XDP program go missing
for some period of time. I didn't check but we should make
XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already.

> +	ctx->link_fd = link_fd;
>  	return 0;
>  }
>  

[...]

> +static int xsk_link_lookup(struct xsk_ctx *ctx, __u32 *prog_id)
> +{
> +	__u32 link_len = sizeof(struct bpf_link_info);
> +	struct bpf_link_info link_info;
> +	__u32 id = 0;
> +	int err;
> +	int fd;
> +
> +	while (true) {
> +		err = bpf_link_get_next_id(id, &id);
> +		if (err) {
> +			if (errno == ENOENT)
> +				break;
> +			pr_warn("can't get next link: %s\n", strerror(errno));
> +			break;
> +		}
> +
> +		fd = bpf_link_get_fd_by_id(id);
> +		if (fd < 0) {
> +			if (errno == ENOENT)
> +				continue;
> +			pr_warn("can't get link by id (%u): %s\n", id, strerror(errno));
> +			break;
> +		}
> +
> +		memset(&link_info, 0, link_len);
> +		err = bpf_obj_get_info_by_fd(fd, &link_info, &link_len);
> +		if (err) {
> +			pr_warn("can't get link info: %s\n", strerror(errno));
> +			close(fd);
> +			break;
> +		}
> +		if (link_info.xdp.ifindex == ctx->ifindex) {
> +			ctx->link_fd = fd;
> +			*prog_id = link_info.prog_id;
> +			break;
> +		}
> +		close(fd);
> +	}
> +
> +	return errno == ENOENT ? 0 : err;

But, err wont be set in fd < 0 case? I guess we don't want to return 0 if
bpf_link_get_fd_by_id fails. Although I really don't like the construct
here that much. I think just `return err` and ensuring err is set correctly
would be more clear. At least the fd error case needs to be handled
though.

> +}
> +
Toke Høiland-Jørgensen Feb. 15, 2021, 9:38 p.m. UTC | #6
John Fastabend <john.fastabend@gmail.com> writes:

>> > However, in libxdp we can solve the original problem in a different way,
>> > and in fact I already suggested to Magnus that we should do this (see
>> > [1]); so one way forward could be to address it during the merge in
>> > libxdp? It should be possible to address the original issue (two
>> > instances of xdpsock breaking each other when they exit), but
>> > applications will still need to do an explicit unload operation before
>> > exiting (i.e., the automatic detach on bpf_link fd closure will take
>> > more work, and likely require extending the bpf_link kernel support)...
>> >
>> 
>> I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If
>> we're months ahead, then I'd really like to see this in libbpf until the
>> merge. However, I'll leave that for Magnus/you to decide!
>
> Did I miss some thread? What does this mean libbpf 1.0/libxdp merge?

The idea is to keep libbpf focused on bpf, and move the AF_XDP stuff to
libxdp (so the socket stuff in xsk.h). We're adding the existing code
wholesale, and keeping API compatibility during the move, so all that's
needed is adding -lxdp when compiling. And obviously the existing libbpf
code isn't going anywhere until such a time as there's a general
backwards compatibility-breaking deprecation in libbpf (which I believe
Andrii is planning to do in an upcoming and as-of-yet unannounced v1.0
release).

While integrating the XSK code into libxdp we're trying to make it
compatible with the rest of the library (i.e., multi-prog). Hence my
preference to avoid introducing something that makes this harder :)

-Toke
John Fastabend Feb. 16, 2021, 12:18 a.m. UTC | #7
Toke Høiland-Jørgensen wrote:
> John Fastabend <john.fastabend@gmail.com> writes:
> 
> >> > However, in libxdp we can solve the original problem in a different way,
> >> > and in fact I already suggested to Magnus that we should do this (see
> >> > [1]); so one way forward could be to address it during the merge in
> >> > libxdp? It should be possible to address the original issue (two
> >> > instances of xdpsock breaking each other when they exit), but
> >> > applications will still need to do an explicit unload operation before
> >> > exiting (i.e., the automatic detach on bpf_link fd closure will take
> >> > more work, and likely require extending the bpf_link kernel support)...
> >> >
> >> 
> >> I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If
> >> we're months ahead, then I'd really like to see this in libbpf until the
> >> merge. However, I'll leave that for Magnus/you to decide!
> >
> > Did I miss some thread? What does this mean libbpf 1.0/libxdp merge?
> 
> The idea is to keep libbpf focused on bpf, and move the AF_XDP stuff to
> libxdp (so the socket stuff in xsk.h). We're adding the existing code
> wholesale, and keeping API compatibility during the move, so all that's
> needed is adding -lxdp when compiling. And obviously the existing libbpf
> code isn't going anywhere until such a time as there's a general
> backwards compatibility-breaking deprecation in libbpf (which I believe
> Andrii is planning to do in an upcoming and as-of-yet unannounced v1.0
> release).

OK, I would like to keep the basic XDP pieces in libbpf though. For example
bpf_program__attach_xdp(). This way we don't have one lib to attach
everything, but XDP.

> 
> While integrating the XSK code into libxdp we're trying to make it
> compatible with the rest of the library (i.e., multi-prog). Hence my
> preference to avoid introducing something that makes this harder :)
> 
> -Toke
> 

OK that makes sense to me thanks. But, I'm missing something (maybe its
obvious to everyone else?).

When you load an XDP program you should get a reference to it. And then
XDP program should never be unloaded until that id is removed right? It
may or may not have an xsk map. Why does adding/removing programs from
an associated map have any impact on the XDP program? That seems like
the buggy part to me. No other map behaves this way as far as I can
tell. Now if the program with the XDP reference closes without pinning
the map or otherwise doing something with it, sure the map gets destroyed
and any xsk sockets are lost.

Thanks!
John
Fijalkowski, Maciej Feb. 16, 2021, 2:01 a.m. UTC | #8
On Mon, Feb 15, 2021 at 08:35:29PM +0100, Toke Høiland-Jørgensen wrote:
> Björn Töpel <bjorn.topel@intel.com> writes:
> 
> > On 2021-02-15 18:07, Toke Høiland-Jørgensen wrote:
> >> Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:
> >> 
> >>> Currently, if there are multiple xdpsock instances running on a single
> >>> interface and in case one of the instances is terminated, the rest of
> >>> them are left in an inoperable state due to the fact of unloaded XDP
> >>> prog from interface.
> >>>
> >>> To address that, step away from setting bpf prog in favour of bpf_link.
> >>> This means that refcounting of BPF resources will be done automatically
> >>> by bpf_link itself.
> >>>
> >>> When setting up BPF resources during xsk socket creation, check whether
> >>> bpf_link for a given ifindex already exists via set of calls to
> >>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
> >>> and comparing the ifindexes from bpf_link and xsk socket.
> >> 
> >> One consideration here is that bpf_link_get_fd_by_id() is a privileged
> >> operation (privileged as in CAP_SYS_ADMIN), so this has the side effect
> >> of making AF_XDP privileged as well. Is that the intention?
> >>
> >
> > We're already using, e.g., bpf_map_get_fd_by_id() which has that
> > as well. So we're assuming that for XDP setup already!
> 
> Ah, right, didn't realise that one is CAP_SYS_ADMIN as well; I
> remembered this as being specific to the bpf_link operation.
> 
> >> Another is that the AF_XDP code is in the process of moving to libxdp
> >> (see in-progress PR [0]), and this approach won't carry over as-is to
> >> that model, because libxdp has to pin the bpf_link fds.
> >>
> >
> > I was assuming there were two modes of operations for AF_XDP in libxdp.
> > One which is with the multi-program support (which AFAIK is why the
> > pinning is required), and one "like the current libbpf" one. For the
> > latter Maciej's series would be a good fit, no?
> 
> We haven't added an explicit mode switch for now; libxdp will fall back
> to regular interface attach if the kernel doesn't support the needed
> features for multi-attach, but if it's possible to just have libxdp
> transparently do the right thing I'd much prefer that. So we're still
> exploring that (part of which is that Magnus has promised to run some
> performance tests to see if there's a difference).
> 
> However, even if there's an explicit mode switch I'd like to avoid
> different *semantics* between the two modes if possible, to keep the two
> as compatible as possible. And since we can't currently do "auto-detach
> on bpf_link fd close" when using multi-prog, introducing this now would
> lead to just such a semantic difference. So my preference would be to do
> it differently... :)
> 
> >> However, in libxdp we can solve the original problem in a different way,
> >> and in fact I already suggested to Magnus that we should do this (see
> >> [1]); so one way forward could be to address it during the merge in
> >> libxdp? It should be possible to address the original issue (two
> >> instances of xdpsock breaking each other when they exit), but
> >> applications will still need to do an explicit unload operation before
> >> exiting (i.e., the automatic detach on bpf_link fd closure will take
> >> more work, and likely require extending the bpf_link kernel support)...
> >>
> >
> > I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If
> > we're months ahead, then I'd really like to see this in libbpf until the
> > merge. However, I'll leave that for Magnus/you to decide!

WDYM by libbpf 1.0/libxdp merge? I glanced through thread and I saw that
John was also not aware of that. Not sure where it was discussed?

If you're saying 'merge', then is libxdp going to be a part of kernel or
as an AF-XDP related guy I would be forced to include yet another
repository in the BPF developer toolchain? :<

> 
> Well, as far as libxdp support goes, the PR I linked is pretty close to
> being mergeable. One of the few outstanding issues is whether we should
> solve just this issue before merging, actually :)
> 
> Not sure exactly which timeframe Andrii is envisioning for libbpf 1.0,
> but last I heard he'll announce something next week.
> 
> > Bottom line; I'd *really* like bpf_link behavior (process scoped) for
> > AF_XDP sooner than later! ;-)
> 
> Totally agree that we should solve the multi-process coexistence
> problem! And as I said, I think we can do so in libxdp by using the same
> synchronisation mechanism we use for setting up the multi-prog
> dispatcher. So it doesn't *have* to hold things up :)

Am I reading this right or you're trying to reject the fix of the long
standing issue due to a PR that is not ready yet on a standalone
project/library? :P

Once libxdp is the standard way of playing with AF-XDP and there are
actual users of that, then I'm happy to work/help on that issue.

Spawning a few xdpsock instances on the same interface has been a
standard/easiest way of measuring the scalability of AF-XDP ZC
implementations. This has been a real PITA for quite a long time. So, I
second Bjorn's statement - the sooner we have this fixed, the better.

Thanks! I hope I haven't sounded harsh, not my intent at all,
Maciej

> 
> -Toke
>
Fijalkowski, Maciej Feb. 16, 2021, 2:10 a.m. UTC | #9
On Mon, Feb 15, 2021 at 12:22:36PM -0800, John Fastabend wrote:
> Björn Töpel wrote:
> > On 2021-02-15 18:07, Toke Høiland-Jørgensen wrote:
> > > Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:
> > > 
> > >> Currently, if there are multiple xdpsock instances running on a single
> > >> interface and in case one of the instances is terminated, the rest of
> > >> them are left in an inoperable state due to the fact of unloaded XDP
> > >> prog from interface.
> 
> I'm a bit confused by the above. This is only the case if the instance
> that terminated is the one that loaded the XDP program and it also didn't
> pin the program correct? If so lets make the commit message a bit more
> clear about the exact case we are solving.

I can include the following from the cover letter:

<quote>
$ sudo ./xdpsock -i ens801f0 -t -q 10 // load xdp prog and xskmap and
                                      // add entry to xskmap at idx 10

$ sudo ./xdpsock -i ens801f0 -t -q 11 // add entry to xskmap at idx 11

terminate one of the processes and another one is unable to work due to
the fact that the XDP prog was unloaded from interface.
</quote>

We don't do the prog pinning and it doesn't really matter which of the
instances you terminate.

> 
> > >>
> > >> To address that, step away from setting bpf prog in favour of bpf_link.
> > >> This means that refcounting of BPF resources will be done automatically
> > >> by bpf_link itself.
> 
> +1
> 
> > >>
> > >> When setting up BPF resources during xsk socket creation, check whether
> > >> bpf_link for a given ifindex already exists via set of calls to
> > >> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
> > >> and comparing the ifindexes from bpf_link and xsk socket.
> > > 
> > > One consideration here is that bpf_link_get_fd_by_id() is a privileged
> > > operation (privileged as in CAP_SYS_ADMIN), so this has the side effect
> > > of making AF_XDP privileged as well. Is that the intention?
> > >
> > 
> > We're already using, e.g., bpf_map_get_fd_by_id() which has that
> > as well. So we're assuming that for XDP setup already!
> > 
> > > Another is that the AF_XDP code is in the process of moving to libxdp
> > > (see in-progress PR [0]), and this approach won't carry over as-is to
> > > that model, because libxdp has to pin the bpf_link fds.
> > >
> > 
> > I was assuming there were two modes of operations for AF_XDP in libxdp.
> > One which is with the multi-program support (which AFAIK is why the
> > pinning is required), and one "like the current libbpf" one. For the
> > latter Maciej's series would be a good fit, no?
> > 
> > > However, in libxdp we can solve the original problem in a different way,
> > > and in fact I already suggested to Magnus that we should do this (see
> > > [1]); so one way forward could be to address it during the merge in
> > > libxdp? It should be possible to address the original issue (two
> > > instances of xdpsock breaking each other when they exit), but
> > > applications will still need to do an explicit unload operation before
> > > exiting (i.e., the automatic detach on bpf_link fd closure will take
> > > more work, and likely require extending the bpf_link kernel support)...
> > >
> > 
> > I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If
> > we're months ahead, then I'd really like to see this in libbpf until the
> > merge. However, I'll leave that for Magnus/you to decide!
> 
> Did I miss some thread? What does this mean libbpf 1.0/libxdp merge? From
> my side libbpf should support the basic operations: load, attach, pin,
> and link for all my BPF objects. I view libxdp as providing 'extra'
> goodness on top of that. Everyone agree?
> 
> > 
> > Bottom line; I'd *really* like bpf_link behavior (process scoped) for
> > AF_XDP sooner than later! ;-)
> 
> Because I use libbpf as my base management for BPF objects I want it
> to support the basic ops for all objects so link ops should land there.
> 
> > 
> > 
> > Thanks for the input!
> > Björn
> > 
> > 
> > > -Toke
> > > 
> > > [0] https://github.com/xdp-project/xdp-tools/pull/92
> > > [1] https://github.com/xdp-project/xdp-tools/pull/92#discussion_r576204719
> > > 
> 
>
Fijalkowski, Maciej Feb. 16, 2021, 2:23 a.m. UTC | #10
On Mon, Feb 15, 2021 at 04:18:28PM -0800, John Fastabend wrote:
> Toke Høiland-Jørgensen wrote:
> > John Fastabend <john.fastabend@gmail.com> writes:
> > 
> > >> > However, in libxdp we can solve the original problem in a different way,
> > >> > and in fact I already suggested to Magnus that we should do this (see
> > >> > [1]); so one way forward could be to address it during the merge in
> > >> > libxdp? It should be possible to address the original issue (two
> > >> > instances of xdpsock breaking each other when they exit), but
> > >> > applications will still need to do an explicit unload operation before
> > >> > exiting (i.e., the automatic detach on bpf_link fd closure will take
> > >> > more work, and likely require extending the bpf_link kernel support)...
> > >> >
> > >> 
> > >> I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If
> > >> we're months ahead, then I'd really like to see this in libbpf until the
> > >> merge. However, I'll leave that for Magnus/you to decide!
> > >
> > > Did I miss some thread? What does this mean libbpf 1.0/libxdp merge?
> > 
> > The idea is to keep libbpf focused on bpf, and move the AF_XDP stuff to
> > libxdp (so the socket stuff in xsk.h). We're adding the existing code
> > wholesale, and keeping API compatibility during the move, so all that's
> > needed is adding -lxdp when compiling. And obviously the existing libbpf
> > code isn't going anywhere until such a time as there's a general
> > backwards compatibility-breaking deprecation in libbpf (which I believe
> > Andrii is planning to do in an upcoming and as-of-yet unannounced v1.0
> > release).

Once again, is libxdp going to land in th kernel? Still not clear to me.

> 
> OK, I would like to keep the basic XDP pieces in libbpf though. For example
> bpf_program__attach_xdp(). This way we don't have one lib to attach
> everything, but XDP.
> 
> > 
> > While integrating the XSK code into libxdp we're trying to make it
> > compatible with the rest of the library (i.e., multi-prog). Hence my
> > preference to avoid introducing something that makes this harder :)

Do you see the issue with solving it one way in libbpf currently given
that we can't really tell when the merge of libs is going to happen and
the other way within the libxdp itself?

> > 
> > -Toke
> > 
> 
> OK that makes sense to me thanks. But, I'm missing something (maybe its
> obvious to everyone else?).
> 
> When you load an XDP program you should get a reference to it. And then
> XDP program should never be unloaded until that id is removed right? It

WDYM by 'that id is removed' ?

> may or may not have an xsk map. Why does adding/removing programs from

you meant adding/removing 'sockets'?

> an associated map have any impact on the XDP program? That seems like
> the buggy part to me. No other map behaves this way as far as I can
> tell. Now if the program with the XDP reference closes without pinning
> the map or otherwise doing something with it, sure the map gets destroyed
> and any xsk sockets are lost.

It's the XDP program that is getting lost, not XSKMAP. XDP prog presence
determines whether your driver works in ZC or not. If XDP prog is gone
then xdpsock bails out (or any other AF-XDP app won't be able to work).

> 
> Thanks!
> John
Fijalkowski, Maciej Feb. 16, 2021, 2:38 a.m. UTC | #11
On Mon, Feb 15, 2021 at 12:49:27PM -0800, John Fastabend wrote:
> Maciej Fijalkowski wrote:
> > Currently, if there are multiple xdpsock instances running on a single
> > interface and in case one of the instances is terminated, the rest of
> > them are left in an inoperable state due to the fact of unloaded XDP
> > prog from interface.
> > 
> > To address that, step away from setting bpf prog in favour of bpf_link.
> > This means that refcounting of BPF resources will be done automatically
> > by bpf_link itself.
> > 
> > When setting up BPF resources during xsk socket creation, check whether
> > bpf_link for a given ifindex already exists via set of calls to
> > bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
> > and comparing the ifindexes from bpf_link and xsk socket.
> > 
> > If there's no bpf_link yet, create one for a given XDP prog and unload
> > explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set.
> > 
> > If bpf_link is already at a given ifindex and underlying program is not
> > AF-XDP one, bail out or update the bpf_link's prog given the presence of
> > XDP_FLAGS_UPDATE_IF_NOEXIST.
> > 
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> >  tools/lib/bpf/xsk.c | 143 +++++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 122 insertions(+), 21 deletions(-)
> 
> [...]
> 
> > +static int xsk_create_bpf_link(struct xsk_socket *xsk)
> > +{
> > +	/* bpf_link only accepts XDP_FLAGS_MODES, but xsk->config.xdp_flags
> > +	 * might have set XDP_FLAGS_UPDATE_IF_NOEXIST
> > +	 */
> > +	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts,
> > +			    .flags = (xsk->config.xdp_flags & XDP_FLAGS_MODES));
> > +	struct xsk_ctx *ctx = xsk->ctx;
> > +	__u32 prog_id;
> > +	int link_fd;
> > +	int err;
> > +
> > +	/* for !XDP_FLAGS_UPDATE_IF_NOEXIST, unload the program first, if any,
> > +	 * so that bpf_link can be attached
> > +	 */
> > +	if (!(xsk->config.xdp_flags & XDP_FLAGS_UPDATE_IF_NOEXIST)) {
> > +		err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags);
> > +		if (err) {
> > +			pr_warn("getting XDP prog id failed\n");
> > +			return err;
> > +		}
> > +		if (prog_id) {
> > +			err = bpf_set_link_xdp_fd(ctx->ifindex, -1, 0);
> > +			if (err < 0) {
> > +				pr_warn("detaching XDP prog failed\n");
> > +				return err;
> > +			}
> > +		}
> >  	}
> >  
> > -	ctx->prog_fd = prog_fd;
> > +	link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts);
> > +	if (link_fd < 0) {
> > +		pr_warn("bpf_link_create failed: %s\n", strerror(errno));
> > +		return link_fd;
> > +	}
> > +
> 
> This can leave the system in a bad state where it unloaded the XDP program
> above, but then failed to create the link. So we should somehow fix that
> if possible or at minimum put a note somewhere so users can't claim they
> shouldn't know this.
> 
> Also related, its not good for real systems to let XDP program go missing
> for some period of time. I didn't check but we should make
> XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already.

Old way of attaching prog is mutual exclusive with bpf_link, right?
What I'm saying is in order to use one of the two, you need to wipe out
the current one in favour of the second that you would like to load.

> 
> > +	ctx->link_fd = link_fd;
> >  	return 0;
> >  }
> >  
> 
> [...]
> 
> > +static int xsk_link_lookup(struct xsk_ctx *ctx, __u32 *prog_id)
> > +{
> > +	__u32 link_len = sizeof(struct bpf_link_info);
> > +	struct bpf_link_info link_info;
> > +	__u32 id = 0;
> > +	int err;
> > +	int fd;
> > +
> > +	while (true) {
> > +		err = bpf_link_get_next_id(id, &id);
> > +		if (err) {
> > +			if (errno == ENOENT)
> > +				break;
> > +			pr_warn("can't get next link: %s\n", strerror(errno));
> > +			break;
> > +		}
> > +
> > +		fd = bpf_link_get_fd_by_id(id);
> > +		if (fd < 0) {
> > +			if (errno == ENOENT)
> > +				continue;
> > +			pr_warn("can't get link by id (%u): %s\n", id, strerror(errno));
> > +			break;
> > +		}
> > +
> > +		memset(&link_info, 0, link_len);
> > +		err = bpf_obj_get_info_by_fd(fd, &link_info, &link_len);
> > +		if (err) {
> > +			pr_warn("can't get link info: %s\n", strerror(errno));
> > +			close(fd);
> > +			break;
> > +		}
> > +		if (link_info.xdp.ifindex == ctx->ifindex) {
> > +			ctx->link_fd = fd;
> > +			*prog_id = link_info.prog_id;
> > +			break;
> > +		}
> > +		close(fd);
> > +	}
> > +
> > +	return errno == ENOENT ? 0 : err;
> 
> But, err wont be set in fd < 0 case? I guess we don't want to return 0 if
> bpf_link_get_fd_by_id fails.

Good catch!

> Although I really don't like the construct
> here that much. I think just `return err` and ensuring err is set correctly
> would be more clear. At least the fd error case needs to be handled
> though.

FWIW, this was inspired by tools/bpf/bpftool/link.c:do_show()

> 
> > +}
> > +
Björn Töpel Feb. 16, 2021, 9:15 a.m. UTC | #12
On 2021-02-16 03:01, Maciej Fijalkowski wrote:
> On Mon, Feb 15, 2021 at 08:35:29PM +0100, Toke Høiland-Jørgensen wrote:
>> Björn Töpel <bjorn.topel@intel.com> writes:

[...]

>>>
>>> I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If
>>> we're months ahead, then I'd really like to see this in libbpf until the
>>> merge. However, I'll leave that for Magnus/you to decide!
> 
> WDYM by libbpf 1.0/libxdp merge? I glanced through thread and I saw that
> John was also not aware of that. Not sure where it was discussed?
>

Oh, right. Yeah, we've had some offlist discussions about moving the
AF_XDP functionality from libbpf to libxdp in the libbpf 1.0 timeframe.

> If you're saying 'merge', then is libxdp going to be a part of kernel or
> as an AF-XDP related guy I would be forced to include yet another
> repository in the BPF developer toolchain? :<
>

The AF_XDP functionality of libbpf will be part of libxdp, which is not
in the kernel tree. libxdp depend on libbpf, which includes the core BPF
functionality. For AF_XDP this is a good thing IMO. libxdp includes more
higher lever abstractions than libbpf, which is more aligned to AF_XDP.

Yes, that would mean that you would get another dependency for AF_XDP,
and one that is not in the kernel tree. For most *users* this is not a
problem, in fact it might be easier to consume and to contribute for
most users. We can't optimize just for the kernel hackers. ;-)


Björn

[...]
Björn Töpel Feb. 16, 2021, 9:20 a.m. UTC | #13
On 2021-02-15 21:49, John Fastabend wrote:
> Maciej Fijalkowski wrote:
>> Currently, if there are multiple xdpsock instances running on a single
>> interface and in case one of the instances is terminated, the rest of
>> them are left in an inoperable state due to the fact of unloaded XDP
>> prog from interface.
>>
>> To address that, step away from setting bpf prog in favour of bpf_link.
>> This means that refcounting of BPF resources will be done automatically
>> by bpf_link itself.
>>
>> When setting up BPF resources during xsk socket creation, check whether
>> bpf_link for a given ifindex already exists via set of calls to
>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
>> and comparing the ifindexes from bpf_link and xsk socket.
>>
>> If there's no bpf_link yet, create one for a given XDP prog and unload
>> explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set.
>>
>> If bpf_link is already at a given ifindex and underlying program is not
>> AF-XDP one, bail out or update the bpf_link's prog given the presence of
>> XDP_FLAGS_UPDATE_IF_NOEXIST.
>>
>> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
>> ---
>>   tools/lib/bpf/xsk.c | 143 +++++++++++++++++++++++++++++++++++++-------
>>   1 file changed, 122 insertions(+), 21 deletions(-)
> 
> [...]
> 
>> +static int xsk_create_bpf_link(struct xsk_socket *xsk)
>> +{
>> +	/* bpf_link only accepts XDP_FLAGS_MODES, but xsk->config.xdp_flags
>> +	 * might have set XDP_FLAGS_UPDATE_IF_NOEXIST
>> +	 */
>> +	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts,
>> +			    .flags = (xsk->config.xdp_flags & XDP_FLAGS_MODES));
>> +	struct xsk_ctx *ctx = xsk->ctx;
>> +	__u32 prog_id;
>> +	int link_fd;
>> +	int err;
>> +
>> +	/* for !XDP_FLAGS_UPDATE_IF_NOEXIST, unload the program first, if any,
>> +	 * so that bpf_link can be attached
>> +	 */
>> +	if (!(xsk->config.xdp_flags & XDP_FLAGS_UPDATE_IF_NOEXIST)) {
>> +		err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags);
>> +		if (err) {
>> +			pr_warn("getting XDP prog id failed\n");
>> +			return err;
>> +		}
>> +		if (prog_id) {
>> +			err = bpf_set_link_xdp_fd(ctx->ifindex, -1, 0);
>> +			if (err < 0) {
>> +				pr_warn("detaching XDP prog failed\n");
>> +				return err;
>> +			}
>> +		}
>>   	}
>>   
>> -	ctx->prog_fd = prog_fd;
>> +	link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts);
>> +	if (link_fd < 0) {
>> +		pr_warn("bpf_link_create failed: %s\n", strerror(errno));
>> +		return link_fd;
>> +	}
>> +
> 
> This can leave the system in a bad state where it unloaded the XDP program
> above, but then failed to create the link. So we should somehow fix that
> if possible or at minimum put a note somewhere so users can't claim they
> shouldn't know this.
> 
> Also related, its not good for real systems to let XDP program go missing
> for some period of time. I didn't check but we should make
> XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already.
>

This is the default for XDP sockets library. The
"bpf_set_link_xdp_fd(...-1)" way is only when a user sets it explicitly.
One could maybe argue that the "force remove" would be out of scope for
AF_XDP; Meaning that if an XDP program is running, attached via netlink,
the AF_XDP library simply cannot remove it. The user would need to rely
on some other mechanism.


Björn


[...]
Björn Töpel Feb. 16, 2021, 9:23 a.m. UTC | #14
On 2021-02-16 03:23, Maciej Fijalkowski wrote:
> On Mon, Feb 15, 2021 at 04:18:28PM -0800, John Fastabend wrote:

[...]

> 
> Once again, is libxdp going to land in th kernel? Still not clear to me.
>

No, libxdp does not live in the kernel tree.


Björn

[...]
Toke Høiland-Jørgensen Feb. 16, 2021, 10:27 a.m. UTC | #15
Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:

> On Mon, Feb 15, 2021 at 08:35:29PM +0100, Toke Høiland-Jørgensen wrote:
>> Björn Töpel <bjorn.topel@intel.com> writes:
>> 
>> > On 2021-02-15 18:07, Toke Høiland-Jørgensen wrote:
>> >> Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:
>> >> 
>> >>> Currently, if there are multiple xdpsock instances running on a single
>> >>> interface and in case one of the instances is terminated, the rest of
>> >>> them are left in an inoperable state due to the fact of unloaded XDP
>> >>> prog from interface.
>> >>>
>> >>> To address that, step away from setting bpf prog in favour of bpf_link.
>> >>> This means that refcounting of BPF resources will be done automatically
>> >>> by bpf_link itself.
>> >>>
>> >>> When setting up BPF resources during xsk socket creation, check whether
>> >>> bpf_link for a given ifindex already exists via set of calls to
>> >>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
>> >>> and comparing the ifindexes from bpf_link and xsk socket.
>> >> 
>> >> One consideration here is that bpf_link_get_fd_by_id() is a privileged
>> >> operation (privileged as in CAP_SYS_ADMIN), so this has the side effect
>> >> of making AF_XDP privileged as well. Is that the intention?
>> >>
>> >
>> > We're already using, e.g., bpf_map_get_fd_by_id() which has that
>> > as well. So we're assuming that for XDP setup already!
>> 
>> Ah, right, didn't realise that one is CAP_SYS_ADMIN as well; I
>> remembered this as being specific to the bpf_link operation.
>> 
>> >> Another is that the AF_XDP code is in the process of moving to libxdp
>> >> (see in-progress PR [0]), and this approach won't carry over as-is to
>> >> that model, because libxdp has to pin the bpf_link fds.
>> >>
>> >
>> > I was assuming there were two modes of operations for AF_XDP in libxdp.
>> > One which is with the multi-program support (which AFAIK is why the
>> > pinning is required), and one "like the current libbpf" one. For the
>> > latter Maciej's series would be a good fit, no?
>> 
>> We haven't added an explicit mode switch for now; libxdp will fall back
>> to regular interface attach if the kernel doesn't support the needed
>> features for multi-attach, but if it's possible to just have libxdp
>> transparently do the right thing I'd much prefer that. So we're still
>> exploring that (part of which is that Magnus has promised to run some
>> performance tests to see if there's a difference).
>> 
>> However, even if there's an explicit mode switch I'd like to avoid
>> different *semantics* between the two modes if possible, to keep the two
>> as compatible as possible. And since we can't currently do "auto-detach
>> on bpf_link fd close" when using multi-prog, introducing this now would
>> lead to just such a semantic difference. So my preference would be to do
>> it differently... :)
>> 
>> >> However, in libxdp we can solve the original problem in a different way,
>> >> and in fact I already suggested to Magnus that we should do this (see
>> >> [1]); so one way forward could be to address it during the merge in
>> >> libxdp? It should be possible to address the original issue (two
>> >> instances of xdpsock breaking each other when they exit), but
>> >> applications will still need to do an explicit unload operation before
>> >> exiting (i.e., the automatic detach on bpf_link fd closure will take
>> >> more work, and likely require extending the bpf_link kernel support)...
>> >>
>> >
>> > I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If
>> > we're months ahead, then I'd really like to see this in libbpf until the
>> > merge. However, I'll leave that for Magnus/you to decide!
>
> WDYM by libbpf 1.0/libxdp merge? I glanced through thread and I saw that
> John was also not aware of that. Not sure where it was discussed?
>
> If you're saying 'merge', then is libxdp going to be a part of kernel or
> as an AF-XDP related guy I would be forced to include yet another
> repository in the BPF developer toolchain? :<

As I replied to John, we're trying to do this in as compatible and
painless a way as possible. In particular, we'll maintain (source) API
compatibility with the code currently in libbpf. And yeah, currently
libxdp lives outside the kernel tree. Not sure whether we'll end up
moving it into the kernel tree, as Björn noted up-thread there are
arguments in both directions :)

>> Well, as far as libxdp support goes, the PR I linked is pretty close to
>> being mergeable. One of the few outstanding issues is whether we should
>> solve just this issue before merging, actually :)
>> 
>> Not sure exactly which timeframe Andrii is envisioning for libbpf 1.0,
>> but last I heard he'll announce something next week.
>> 
>> > Bottom line; I'd *really* like bpf_link behavior (process scoped) for
>> > AF_XDP sooner than later! ;-)
>> 
>> Totally agree that we should solve the multi-process coexistence
>> problem! And as I said, I think we can do so in libxdp by using the same
>> synchronisation mechanism we use for setting up the multi-prog
>> dispatcher. So it doesn't *have* to hold things up :)
>
> Am I reading this right or you're trying to reject the fix of the long
> standing issue due to a PR that is not ready yet on a standalone
> project/library? :P

Haha, no, that is not what I'm saying. As I said up-thread I agree that
this is something we should fix, obviously. I'm just suggesting we fix
it in a way that will also be compatible with libxdp and multiprog so we
won't have to introduce yet-another-flag that users will have to decide
on.

However, now that I'm looking at your patch again I think my fears may
have been overblown. I got hung up on the bit in the commit message
where you said "refcounting of BPF resources will be done automatically
by bpf_link itself", and wrongly assumed that you were exposing the
bpf_link fd to the application. But I see now that it's kept in the
private xsk_ctx structure, and applications still just call
xsk_socket__delete(). So in libxdp we can just implement the same API
with a different synchronisation mechanism; that's fine. Apologies for
jumping to conclusions :/

> Once libxdp is the standard way of playing with AF-XDP and there are
> actual users of that, then I'm happy to work/help on that issue.

That is good to know, thanks! I opened an issue in the xdp-tools
repository to track this for the libxdp side (Magnus and I agreed that
we'll merge the existing code first, then fix this on top):
https://github.com/xdp-project/xdp-tools/issues/93

As noted above, the mechanism may have to change, but I think it's
possible to achieve the same thing in a libxdp context :)

-Toke
Toke Høiland-Jørgensen Feb. 16, 2021, 10:36 a.m. UTC | #16
John Fastabend <john.fastabend@gmail.com> writes:

> Toke Høiland-Jørgensen wrote:
>> John Fastabend <john.fastabend@gmail.com> writes:
>> 
>> >> > However, in libxdp we can solve the original problem in a different way,
>> >> > and in fact I already suggested to Magnus that we should do this (see
>> >> > [1]); so one way forward could be to address it during the merge in
>> >> > libxdp? It should be possible to address the original issue (two
>> >> > instances of xdpsock breaking each other when they exit), but
>> >> > applications will still need to do an explicit unload operation before
>> >> > exiting (i.e., the automatic detach on bpf_link fd closure will take
>> >> > more work, and likely require extending the bpf_link kernel support)...
>> >> >
>> >> 
>> >> I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If
>> >> we're months ahead, then I'd really like to see this in libbpf until the
>> >> merge. However, I'll leave that for Magnus/you to decide!
>> >
>> > Did I miss some thread? What does this mean libbpf 1.0/libxdp merge?
>> 
>> The idea is to keep libbpf focused on bpf, and move the AF_XDP stuff to
>> libxdp (so the socket stuff in xsk.h). We're adding the existing code
>> wholesale, and keeping API compatibility during the move, so all that's
>> needed is adding -lxdp when compiling. And obviously the existing libbpf
>> code isn't going anywhere until such a time as there's a general
>> backwards compatibility-breaking deprecation in libbpf (which I believe
>> Andrii is planning to do in an upcoming and as-of-yet unannounced v1.0
>> release).
>
> OK, I would like to keep the basic XDP pieces in libbpf though. For example
> bpf_program__attach_xdp(). This way we don't have one lib to attach
> everything, but XDP.

The details are still TDB; for now, we're just merging in the XSK code
to the libxdp repo. I expect Andrii to announce his plans for the rest
soonish. I wouldn't expect basic things like that to go away, though :)

>> 
>> While integrating the XSK code into libxdp we're trying to make it
>> compatible with the rest of the library (i.e., multi-prog). Hence my
>> preference to avoid introducing something that makes this harder :)
>> 
>> -Toke
>> 
>
> OK that makes sense to me thanks. But, I'm missing something (maybe its
> obvious to everyone else?).
>
> When you load an XDP program you should get a reference to it. And then
> XDP program should never be unloaded until that id is removed right? It
> may or may not have an xsk map. Why does adding/removing programs from
> an associated map have any impact on the XDP program? That seems like
> the buggy part to me. No other map behaves this way as far as I can
> tell. Now if the program with the XDP reference closes without pinning
> the map or otherwise doing something with it, sure the map gets destroyed
> and any xsk sockets are lost.

The original bug comes from the XSK code abstracting away the fact that
an AF_XDP socket needs an XDP program on the interface to work; so if
none exists, the library will just load a program that redirects into
the socket. Which breaks since the xdpsock example application is trying
to be nice and clean up after itself, by removing the XDP program when
it's done with the socket, thus breaking any other programs using that
XDP program. So this patch introduces proper synchronisation on both add
and remove of the XDP program...

-Toke
Toke Høiland-Jørgensen Feb. 16, 2021, 10:39 a.m. UTC | #17
Björn Töpel <bjorn.topel@intel.com> writes:

> On 2021-02-15 21:49, John Fastabend wrote:
>> Maciej Fijalkowski wrote:
>>> Currently, if there are multiple xdpsock instances running on a single
>>> interface and in case one of the instances is terminated, the rest of
>>> them are left in an inoperable state due to the fact of unloaded XDP
>>> prog from interface.
>>>
>>> To address that, step away from setting bpf prog in favour of bpf_link.
>>> This means that refcounting of BPF resources will be done automatically
>>> by bpf_link itself.
>>>
>>> When setting up BPF resources during xsk socket creation, check whether
>>> bpf_link for a given ifindex already exists via set of calls to
>>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
>>> and comparing the ifindexes from bpf_link and xsk socket.
>>>
>>> If there's no bpf_link yet, create one for a given XDP prog and unload
>>> explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set.
>>>
>>> If bpf_link is already at a given ifindex and underlying program is not
>>> AF-XDP one, bail out or update the bpf_link's prog given the presence of
>>> XDP_FLAGS_UPDATE_IF_NOEXIST.
>>>
>>> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
>>> ---
>>>   tools/lib/bpf/xsk.c | 143 +++++++++++++++++++++++++++++++++++++-------
>>>   1 file changed, 122 insertions(+), 21 deletions(-)
>> 
>> [...]
>> 
>>> +static int xsk_create_bpf_link(struct xsk_socket *xsk)
>>> +{
>>> +	/* bpf_link only accepts XDP_FLAGS_MODES, but xsk->config.xdp_flags
>>> +	 * might have set XDP_FLAGS_UPDATE_IF_NOEXIST
>>> +	 */
>>> +	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts,
>>> +			    .flags = (xsk->config.xdp_flags & XDP_FLAGS_MODES));
>>> +	struct xsk_ctx *ctx = xsk->ctx;
>>> +	__u32 prog_id;
>>> +	int link_fd;
>>> +	int err;
>>> +
>>> +	/* for !XDP_FLAGS_UPDATE_IF_NOEXIST, unload the program first, if any,
>>> +	 * so that bpf_link can be attached
>>> +	 */
>>> +	if (!(xsk->config.xdp_flags & XDP_FLAGS_UPDATE_IF_NOEXIST)) {
>>> +		err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags);
>>> +		if (err) {
>>> +			pr_warn("getting XDP prog id failed\n");
>>> +			return err;
>>> +		}
>>> +		if (prog_id) {
>>> +			err = bpf_set_link_xdp_fd(ctx->ifindex, -1, 0);
>>> +			if (err < 0) {
>>> +				pr_warn("detaching XDP prog failed\n");
>>> +				return err;
>>> +			}
>>> +		}
>>>   	}
>>>   
>>> -	ctx->prog_fd = prog_fd;
>>> +	link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts);
>>> +	if (link_fd < 0) {
>>> +		pr_warn("bpf_link_create failed: %s\n", strerror(errno));
>>> +		return link_fd;
>>> +	}
>>> +
>> 
>> This can leave the system in a bad state where it unloaded the XDP program
>> above, but then failed to create the link. So we should somehow fix that
>> if possible or at minimum put a note somewhere so users can't claim they
>> shouldn't know this.
>> 
>> Also related, its not good for real systems to let XDP program go missing
>> for some period of time. I didn't check but we should make
>> XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already.
>>
>
> This is the default for XDP sockets library. The
> "bpf_set_link_xdp_fd(...-1)" way is only when a user sets it explicitly.
> One could maybe argue that the "force remove" would be out of scope for
> AF_XDP; Meaning that if an XDP program is running, attached via netlink,
> the AF_XDP library simply cannot remove it. The user would need to rely
> on some other mechanism.

Yeah, I'd tend to agree with that. In general, I think the proliferation
of "just force-remove (or override) the running program" in code and
instructions has been a mistake; and application should only really be
adding and removing its own program...

-Toke
John Fastabend Feb. 16, 2021, 6:19 p.m. UTC | #18
Maciej Fijalkowski wrote:
> On Mon, Feb 15, 2021 at 12:49:27PM -0800, John Fastabend wrote:
> > Maciej Fijalkowski wrote:
> > > Currently, if there are multiple xdpsock instances running on a single
> > > interface and in case one of the instances is terminated, the rest of
> > > them are left in an inoperable state due to the fact of unloaded XDP
> > > prog from interface.
> > > 
> > > To address that, step away from setting bpf prog in favour of bpf_link.
> > > This means that refcounting of BPF resources will be done automatically
> > > by bpf_link itself.
> > > 
> > > When setting up BPF resources during xsk socket creation, check whether
> > > bpf_link for a given ifindex already exists via set of calls to
> > > bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
> > > and comparing the ifindexes from bpf_link and xsk socket.
> > > 
> > > If there's no bpf_link yet, create one for a given XDP prog and unload
> > > explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set.
> > > 
> > > If bpf_link is already at a given ifindex and underlying program is not
> > > AF-XDP one, bail out or update the bpf_link's prog given the presence of
> > > XDP_FLAGS_UPDATE_IF_NOEXIST.
> > > 
> > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > ---
> > >  tools/lib/bpf/xsk.c | 143 +++++++++++++++++++++++++++++++++++++-------
> > >  1 file changed, 122 insertions(+), 21 deletions(-)
> > 
> > [...]
> > 
> > > +static int xsk_create_bpf_link(struct xsk_socket *xsk)
> > > +{
> > > +	/* bpf_link only accepts XDP_FLAGS_MODES, but xsk->config.xdp_flags
> > > +	 * might have set XDP_FLAGS_UPDATE_IF_NOEXIST
> > > +	 */
> > > +	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts,
> > > +			    .flags = (xsk->config.xdp_flags & XDP_FLAGS_MODES));
> > > +	struct xsk_ctx *ctx = xsk->ctx;
> > > +	__u32 prog_id;
> > > +	int link_fd;
> > > +	int err;
> > > +
> > > +	/* for !XDP_FLAGS_UPDATE_IF_NOEXIST, unload the program first, if any,
> > > +	 * so that bpf_link can be attached
> > > +	 */
> > > +	if (!(xsk->config.xdp_flags & XDP_FLAGS_UPDATE_IF_NOEXIST)) {
> > > +		err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags);
> > > +		if (err) {
> > > +			pr_warn("getting XDP prog id failed\n");
> > > +			return err;
> > > +		}
> > > +		if (prog_id) {
> > > +			err = bpf_set_link_xdp_fd(ctx->ifindex, -1, 0);
> > > +			if (err < 0) {
> > > +				pr_warn("detaching XDP prog failed\n");
> > > +				return err;
> > > +			}
> > > +		}
> > >  	}
> > >  
> > > -	ctx->prog_fd = prog_fd;
> > > +	link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts);
> > > +	if (link_fd < 0) {
> > > +		pr_warn("bpf_link_create failed: %s\n", strerror(errno));
> > > +		return link_fd;
> > > +	}
> > > +
> > 
> > This can leave the system in a bad state where it unloaded the XDP program
> > above, but then failed to create the link. So we should somehow fix that
> > if possible or at minimum put a note somewhere so users can't claim they
> > shouldn't know this.
> > 
> > Also related, its not good for real systems to let XDP program go missing
> > for some period of time. I didn't check but we should make
> > XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already.
> 
> Old way of attaching prog is mutual exclusive with bpf_link, right?
> What I'm saying is in order to use one of the two, you need to wipe out
> the current one in favour of the second that you would like to load.

Personally, if I were using above I want the operation to error
if a XDP program is already attached. Then user is forced to remove the
XDP program directly if thats even safe to do.

Reusing UPDATE_IF_NOEXIST flag above seems like an abuse of that flag.
The kernel side does an atomic program swap (or at least it should imo) 
of the programs when it is set. Atomic here is not exactly right though
because driver might reset or do other things, but the point is no
packets are missed without policy. In above some N packets will pass
through the device without policy being applied. This is going to be
subtle and buggy if used in real production systems.

The API needs to do a replace operation not a delete/create and if it
can't do that it needs to error out so the user can figure out what
to do about it.

Do you really need this automatic behavior for something? It clutters
up the API with more flags and I can't see how its useful. If it
errors out just delete the prog using the existing interfaces from the
API user side.

> 
> > 
> > > +	ctx->link_fd = link_fd;
> > >  	return 0;
> > >  }
> > >  
> > 
> > [...]
> > 
> > > +static int xsk_link_lookup(struct xsk_ctx *ctx, __u32 *prog_id)
> > > +{
> > > +	__u32 link_len = sizeof(struct bpf_link_info);
> > > +	struct bpf_link_info link_info;
> > > +	__u32 id = 0;
> > > +	int err;
> > > +	int fd;
> > > +
> > > +	while (true) {
> > > +		err = bpf_link_get_next_id(id, &id);
> > > +		if (err) {
> > > +			if (errno == ENOENT)
> > > +				break;
> > > +			pr_warn("can't get next link: %s\n", strerror(errno));
> > > +			break;
> > > +		}
> > > +
> > > +		fd = bpf_link_get_fd_by_id(id);
> > > +		if (fd < 0) {
> > > +			if (errno == ENOENT)
> > > +				continue;
> > > +			pr_warn("can't get link by id (%u): %s\n", id, strerror(errno));
> > > +			break;
> > > +		}
> > > +
> > > +		memset(&link_info, 0, link_len);
> > > +		err = bpf_obj_get_info_by_fd(fd, &link_info, &link_len);
> > > +		if (err) {
> > > +			pr_warn("can't get link info: %s\n", strerror(errno));
> > > +			close(fd);
> > > +			break;
> > > +		}
> > > +		if (link_info.xdp.ifindex == ctx->ifindex) {
> > > +			ctx->link_fd = fd;
> > > +			*prog_id = link_info.prog_id;
> > > +			break;
> > > +		}
> > > +		close(fd);
> > > +	}
> > > +
> > > +	return errno == ENOENT ? 0 : err;
> > 
> > But, err wont be set in fd < 0 case? I guess we don't want to return 0 if
> > bpf_link_get_fd_by_id fails.
> 
> Good catch!
> 
> > Although I really don't like the construct
> > here that much. I think just `return err` and ensuring err is set correctly
> > would be more clear. At least the fd error case needs to be handled
> > though.
> 
> FWIW, this was inspired by tools/bpf/bpftool/link.c:do_show()

Sure its not my preference, but as long as the bug is resolved I
wont complain. If I hadn't seen the bug I wouldn't have said
anything.

> 
> > 
> > > +}
> > > +
John Fastabend Feb. 16, 2021, 7:15 p.m. UTC | #19
Toke Høiland-Jørgensen wrote:
> Björn Töpel <bjorn.topel@intel.com> writes:
> 
> > On 2021-02-15 21:49, John Fastabend wrote:
> >> Maciej Fijalkowski wrote:
> >>> Currently, if there are multiple xdpsock instances running on a single
> >>> interface and in case one of the instances is terminated, the rest of
> >>> them are left in an inoperable state due to the fact of unloaded XDP
> >>> prog from interface.
> >>>
> >>> To address that, step away from setting bpf prog in favour of bpf_link.
> >>> This means that refcounting of BPF resources will be done automatically
> >>> by bpf_link itself.
> >>>
> >>> When setting up BPF resources during xsk socket creation, check whether
> >>> bpf_link for a given ifindex already exists via set of calls to
> >>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
> >>> and comparing the ifindexes from bpf_link and xsk socket.
> >>>
> >>> If there's no bpf_link yet, create one for a given XDP prog and unload
> >>> explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set.
> >>>
> >>> If bpf_link is already at a given ifindex and underlying program is not
> >>> AF-XDP one, bail out or update the bpf_link's prog given the presence of
> >>> XDP_FLAGS_UPDATE_IF_NOEXIST.
> >>>
> >>> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> >>> ---

[...]

> >>> -	ctx->prog_fd = prog_fd;
> >>> +	link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts);
> >>> +	if (link_fd < 0) {
> >>> +		pr_warn("bpf_link_create failed: %s\n", strerror(errno));
> >>> +		return link_fd;
> >>> +	}
> >>> +
> >> 
> >> This can leave the system in a bad state where it unloaded the XDP program
> >> above, but then failed to create the link. So we should somehow fix that
> >> if possible or at minimum put a note somewhere so users can't claim they
> >> shouldn't know this.
> >> 
> >> Also related, its not good for real systems to let XDP program go missing
> >> for some period of time. I didn't check but we should make
> >> XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already.
> >>
> >
> > This is the default for XDP sockets library. The
> > "bpf_set_link_xdp_fd(...-1)" way is only when a user sets it explicitly.
> > One could maybe argue that the "force remove" would be out of scope for
> > AF_XDP; Meaning that if an XDP program is running, attached via netlink,
> > the AF_XDP library simply cannot remove it. The user would need to rely
> > on some other mechanism.
> 
> Yeah, I'd tend to agree with that. In general, I think the proliferation
> of "just force-remove (or override) the running program" in code and
> instructions has been a mistake; and application should only really be
> adding and removing its own program...
> 
> -Toke
> 

I'll try to consolidate some of my opinions from a couple threads here. It
looks to me many of these issues are self-inflicted by the API. We built
the API with out the right abstractions or at least different abstractions
from the rest of the BPF APIs. Not too surprising seeing the kernel side
and user side were all being built up at once.

For example this specific issue where the xsk API also deletes the XDP
program seems to be due to merging the xsk with the creation of the XDP
programs.

I'm not a real user of AF_XDP (yet.), but here is how I would expect it
to work based on how the sockmap pieces work, which are somewhat similar
given they also deal with sockets.

Program 
(1) load and pin an XDP BPF program
    - obj = bpf_object__open(prog);
    - bpf_object__load_xattr(&attr);
    - bpf_program__pin()
(2) pin the map, find map_xsk using any of the map APIs
    - bpf_map__pin(map_xsk, path_to_pin)
(3) attach to XDP
    - link = bpf_program__attach_xdp()
    - bpf_link__pin()

At this point you have a BPF program loaded, a xsk map, and a link all
pinned and ready. And we can add socks using the process found in
`enter_xsks_into_map` in the sample. This can be the same program that
loaded/pinned the XDP program or some other program it doesn't really
matter.

 - create xsk fd
      . xsk_umem__create()
      . xsk_socket__create
 - open map @ pinned path
 - bpf_map_update_elem(xsks_map, &key, &fd, 0);

Then it looks like we don't have any conflicts? The XDP program is pinned
and exists in its normal scope. The xsk elements can be added/deleted
as normal. If the XDP program is removed and the map referencing (using
normal ref rules) reaches zero its also deleted. Above is more or less
the same flow we use for any BPF program so looks good to me.

The trouble seems to pop up when using the higher abstraction APIs
xsk_setup_xdp_prog and friends I guess? I just see above as already
fairly easy to use and we have good helpers to create the sockets it looks
like. Maybe I missed some design considerations. IMO higher level
abstractions should go in new libxdp and above should stay in libbpf.

/rant off ;)

Thanks,
John
Fijalkowski, Maciej Feb. 16, 2021, 8:10 p.m. UTC | #20
On Tue, Feb 16, 2021 at 10:19:17AM -0800, John Fastabend wrote:
> Maciej Fijalkowski wrote:
> > On Mon, Feb 15, 2021 at 12:49:27PM -0800, John Fastabend wrote:
> > > Maciej Fijalkowski wrote:
> > > > Currently, if there are multiple xdpsock instances running on a single
> > > > interface and in case one of the instances is terminated, the rest of
> > > > them are left in an inoperable state due to the fact of unloaded XDP
> > > > prog from interface.
> > > > 
> > > > To address that, step away from setting bpf prog in favour of bpf_link.
> > > > This means that refcounting of BPF resources will be done automatically
> > > > by bpf_link itself.
> > > > 
> > > > When setting up BPF resources during xsk socket creation, check whether
> > > > bpf_link for a given ifindex already exists via set of calls to
> > > > bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
> > > > and comparing the ifindexes from bpf_link and xsk socket.
> > > > 
> > > > If there's no bpf_link yet, create one for a given XDP prog and unload
> > > > explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set.
> > > > 
> > > > If bpf_link is already at a given ifindex and underlying program is not
> > > > AF-XDP one, bail out or update the bpf_link's prog given the presence of
> > > > XDP_FLAGS_UPDATE_IF_NOEXIST.
> > > > 
> > > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > > ---
> > > >  tools/lib/bpf/xsk.c | 143 +++++++++++++++++++++++++++++++++++++-------
> > > >  1 file changed, 122 insertions(+), 21 deletions(-)
> > > 
> > > [...]
> > > 
> > > > +static int xsk_create_bpf_link(struct xsk_socket *xsk)
> > > > +{
> > > > +	/* bpf_link only accepts XDP_FLAGS_MODES, but xsk->config.xdp_flags
> > > > +	 * might have set XDP_FLAGS_UPDATE_IF_NOEXIST
> > > > +	 */
> > > > +	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts,
> > > > +			    .flags = (xsk->config.xdp_flags & XDP_FLAGS_MODES));
> > > > +	struct xsk_ctx *ctx = xsk->ctx;
> > > > +	__u32 prog_id;
> > > > +	int link_fd;
> > > > +	int err;
> > > > +
> > > > +	/* for !XDP_FLAGS_UPDATE_IF_NOEXIST, unload the program first, if any,
> > > > +	 * so that bpf_link can be attached
> > > > +	 */
> > > > +	if (!(xsk->config.xdp_flags & XDP_FLAGS_UPDATE_IF_NOEXIST)) {
> > > > +		err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags);
> > > > +		if (err) {
> > > > +			pr_warn("getting XDP prog id failed\n");
> > > > +			return err;
> > > > +		}
> > > > +		if (prog_id) {
> > > > +			err = bpf_set_link_xdp_fd(ctx->ifindex, -1, 0);
> > > > +			if (err < 0) {
> > > > +				pr_warn("detaching XDP prog failed\n");
> > > > +				return err;
> > > > +			}
> > > > +		}
> > > >  	}
> > > >  
> > > > -	ctx->prog_fd = prog_fd;
> > > > +	link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts);
> > > > +	if (link_fd < 0) {
> > > > +		pr_warn("bpf_link_create failed: %s\n", strerror(errno));
> > > > +		return link_fd;
> > > > +	}
> > > > +
> > > 
> > > This can leave the system in a bad state where it unloaded the XDP program
> > > above, but then failed to create the link. So we should somehow fix that
> > > if possible or at minimum put a note somewhere so users can't claim they
> > > shouldn't know this.
> > > 
> > > Also related, its not good for real systems to let XDP program go missing
> > > for some period of time. I didn't check but we should make
> > > XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already.
> > 
> > Old way of attaching prog is mutual exclusive with bpf_link, right?
> > What I'm saying is in order to use one of the two, you need to wipe out
> > the current one in favour of the second that you would like to load.
> 
> Personally, if I were using above I want the operation to error
> if a XDP program is already attached. Then user is forced to remove the
> XDP program directly if thats even safe to do.
> 
> Reusing UPDATE_IF_NOEXIST flag above seems like an abuse of that flag.
> The kernel side does an atomic program swap (or at least it should imo) 
> of the programs when it is set. Atomic here is not exactly right though
> because driver might reset or do other things, but the point is no
> packets are missed without policy. In above some N packets will pass
> through the device without policy being applied. This is going to be
> subtle and buggy if used in real production systems.
> 
> The API needs to do a replace operation not a delete/create and if it
> can't do that it needs to error out so the user can figure out what
> to do about it.
> 
> Do you really need this automatic behavior for something? It clutters
> up the API with more flags and I can't see how its useful. If it
> errors out just delete the prog using the existing interfaces from the
> API user side.

Fair argument, I went too far with --force flag. Given what you said, I'll
drop that wipe out of netlink-based XDP prog, but I think we can keep the
logic around updating the bpf_link if it already exists (in case --force
flag was set). This provides the atomic xchg and we won't expose systems
to a time frame without XDP policy as you point out.

> 
> > 
> > > 
> > > > +	ctx->link_fd = link_fd;
> > > >  	return 0;
> > > >  }
> > > >  
> > > 
> > > [...]
> > > 
> > > > +static int xsk_link_lookup(struct xsk_ctx *ctx, __u32 *prog_id)
> > > > +{
> > > > +	__u32 link_len = sizeof(struct bpf_link_info);
> > > > +	struct bpf_link_info link_info;
> > > > +	__u32 id = 0;
> > > > +	int err;
> > > > +	int fd;
> > > > +
> > > > +	while (true) {
> > > > +		err = bpf_link_get_next_id(id, &id);
> > > > +		if (err) {
> > > > +			if (errno == ENOENT)
> > > > +				break;
> > > > +			pr_warn("can't get next link: %s\n", strerror(errno));
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		fd = bpf_link_get_fd_by_id(id);
> > > > +		if (fd < 0) {
> > > > +			if (errno == ENOENT)
> > > > +				continue;
> > > > +			pr_warn("can't get link by id (%u): %s\n", id, strerror(errno));
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		memset(&link_info, 0, link_len);
> > > > +		err = bpf_obj_get_info_by_fd(fd, &link_info, &link_len);
> > > > +		if (err) {
> > > > +			pr_warn("can't get link info: %s\n", strerror(errno));
> > > > +			close(fd);
> > > > +			break;
> > > > +		}
> > > > +		if (link_info.xdp.ifindex == ctx->ifindex) {
> > > > +			ctx->link_fd = fd;
> > > > +			*prog_id = link_info.prog_id;
> > > > +			break;
> > > > +		}
> > > > +		close(fd);
> > > > +	}
> > > > +
> > > > +	return errno == ENOENT ? 0 : err;
> > > 
> > > But, err wont be set in fd < 0 case? I guess we don't want to return 0 if
> > > bpf_link_get_fd_by_id fails.
> > 
> > Good catch!
> > 
> > > Although I really don't like the construct
> > > here that much. I think just `return err` and ensuring err is set correctly
> > > would be more clear. At least the fd error case needs to be handled
> > > though.
> > 
> > FWIW, this was inspired by tools/bpf/bpftool/link.c:do_show()
> 
> Sure its not my preference, but as long as the bug is resolved I
> wont complain. If I hadn't seen the bug I wouldn't have said
> anything.
> 
> > 
> > > 
> > > > +}
> > > > +
> 
>
Fijalkowski, Maciej Feb. 16, 2021, 8:15 p.m. UTC | #21
On Tue, Feb 16, 2021 at 11:27:55AM +0100, Toke Høiland-Jørgensen wrote:
> Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:
> >
> > Am I reading this right or you're trying to reject the fix of the long
> > standing issue due to a PR that is not ready yet on a standalone
> > project/library? :P
> 
> Haha, no, that is not what I'm saying. As I said up-thread I agree that
> this is something we should fix, obviously. I'm just suggesting we fix
> it in a way that will also be compatible with libxdp and multiprog so we
> won't have to introduce yet-another-flag that users will have to decide
> on.
> 
> However, now that I'm looking at your patch again I think my fears may
> have been overblown. I got hung up on the bit in the commit message
> where you said "refcounting of BPF resources will be done automatically
> by bpf_link itself", and wrongly assumed that you were exposing the
> bpf_link fd to the application. But I see now that it's kept in the
> private xsk_ctx structure, and applications still just call
> xsk_socket__delete(). So in libxdp we can just implement the same API
> with a different synchronisation mechanism; that's fine. Apologies for
> jumping to conclusions :/

No worries, this shows how important a thorough commit message is, seems
that I failed on that part.

> 
> > Once libxdp is the standard way of playing with AF-XDP and there are
> > actual users of that, then I'm happy to work/help on that issue.
> 
> That is good to know, thanks! I opened an issue in the xdp-tools
> repository to track this for the libxdp side (Magnus and I agreed that
> we'll merge the existing code first, then fix this on top):
> https://github.com/xdp-project/xdp-tools/issues/93

Thanks! And good to hear that there's some sort of agreement.

> 
> As noted above, the mechanism may have to change, but I think it's
> possible to achieve the same thing in a libxdp context :)
> 
> -Toke
>
Fijalkowski, Maciej Feb. 16, 2021, 8:50 p.m. UTC | #22
On Tue, Feb 16, 2021 at 11:15:41AM -0800, John Fastabend wrote:
> Toke Høiland-Jørgensen wrote:
> > Björn Töpel <bjorn.topel@intel.com> writes:
> > 
> > > On 2021-02-15 21:49, John Fastabend wrote:
> > >> Maciej Fijalkowski wrote:
> > >>> Currently, if there are multiple xdpsock instances running on a single
> > >>> interface and in case one of the instances is terminated, the rest of
> > >>> them are left in an inoperable state due to the fact of unloaded XDP
> > >>> prog from interface.
> > >>>
> > >>> To address that, step away from setting bpf prog in favour of bpf_link.
> > >>> This means that refcounting of BPF resources will be done automatically
> > >>> by bpf_link itself.
> > >>>
> > >>> When setting up BPF resources during xsk socket creation, check whether
> > >>> bpf_link for a given ifindex already exists via set of calls to
> > >>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
> > >>> and comparing the ifindexes from bpf_link and xsk socket.
> > >>>
> > >>> If there's no bpf_link yet, create one for a given XDP prog and unload
> > >>> explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set.
> > >>>
> > >>> If bpf_link is already at a given ifindex and underlying program is not
> > >>> AF-XDP one, bail out or update the bpf_link's prog given the presence of
> > >>> XDP_FLAGS_UPDATE_IF_NOEXIST.
> > >>>
> > >>> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > >>> ---
> 
> [...]
> 
> > >>> -	ctx->prog_fd = prog_fd;
> > >>> +	link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts);
> > >>> +	if (link_fd < 0) {
> > >>> +		pr_warn("bpf_link_create failed: %s\n", strerror(errno));
> > >>> +		return link_fd;
> > >>> +	}
> > >>> +
> > >> 
> > >> This can leave the system in a bad state where it unloaded the XDP program
> > >> above, but then failed to create the link. So we should somehow fix that
> > >> if possible or at minimum put a note somewhere so users can't claim they
> > >> shouldn't know this.
> > >> 
> > >> Also related, its not good for real systems to let XDP program go missing
> > >> for some period of time. I didn't check but we should make
> > >> XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already.
> > >>
> > >
> > > This is the default for XDP sockets library. The
> > > "bpf_set_link_xdp_fd(...-1)" way is only when a user sets it explicitly.
> > > One could maybe argue that the "force remove" would be out of scope for
> > > AF_XDP; Meaning that if an XDP program is running, attached via netlink,
> > > the AF_XDP library simply cannot remove it. The user would need to rely
> > > on some other mechanism.
> > 
> > Yeah, I'd tend to agree with that. In general, I think the proliferation
> > of "just force-remove (or override) the running program" in code and
> > instructions has been a mistake; and application should only really be
> > adding and removing its own program...
> > 
> > -Toke
> > 
> 
> I'll try to consolidate some of my opinions from a couple threads here. It
> looks to me many of these issues are self-inflicted by the API. We built
> the API with out the right abstractions or at least different abstractions
> from the rest of the BPF APIs. Not too surprising seeing the kernel side
> and user side were all being built up at once.
> 
> For example this specific issue where the xsk API also deletes the XDP
> program seems to be due to merging the xsk with the creation of the XDP
> programs.

IMHO the issue I fixed and that is the topic of those discussions is the
fact that handling of XDP program for xsk case was not balanced, IOW
libbpf was creating the prog if it wasn't present whereas app side was
blindly removing the XDP prog, even if it wasn't aware whether it has
loaded the prog among with XSKMAP or only attached the xsk socket to
XSKMAP that already existed.

> 
> I'm not a real user of AF_XDP (yet.), but here is how I would expect it
> to work based on how the sockmap pieces work, which are somewhat similar
> given they also deal with sockets.

Don't want to be picky, but I suppose sockmap won't play with ndo_bpf()
and that's what was bothering us.

> 
> Program 
> (1) load and pin an XDP BPF program
>     - obj = bpf_object__open(prog);
>     - bpf_object__load_xattr(&attr);
>     - bpf_program__pin()
> (2) pin the map, find map_xsk using any of the map APIs
>     - bpf_map__pin(map_xsk, path_to_pin)
> (3) attach to XDP
>     - link = bpf_program__attach_xdp()
>     - bpf_link__pin()
> 
> At this point you have a BPF program loaded, a xsk map, and a link all
> pinned and ready. And we can add socks using the process found in
> `enter_xsks_into_map` in the sample. This can be the same program that
> loaded/pinned the XDP program or some other program it doesn't really
> matter.
> 
>  - create xsk fd
>       . xsk_umem__create()
>       . xsk_socket__create
>  - open map @ pinned path
>  - bpf_map_update_elem(xsks_map, &key, &fd, 0);
> 
> Then it looks like we don't have any conflicts? The XDP program is pinned
> and exists in its normal scope. The xsk elements can be added/deleted
> as normal.

The only difference from what you wrote up is the resource pinning, when
compared to what we currently have + the set I am proposing.

So, if you're saying it looks like we don't have any conflicts and I am
saying that the flow is what we have, then...? :)

You would have to ask Magnus what was behind the decision to avoid API
from tools/lib/bpf/libbpf.c but rather directly call underlying functions
from tools/lib/bpf/bpf.c. Nevertheless, it doesn't really make a big
difference to me.

> If the XDP program is removed and the map referencing (using
> normal ref rules) reaches zero its also deleted. Above is more or less
> the same flow we use for any BPF program so looks good to me.

We have to be a bit more specific around the "XDP program is removed".
Is it removed from the network interface? That's the thing that we want to
avoid when there are other xsk sockets active on a given interface.

With bpf_link, XDP prog is removed only when bpf_link's refcount reaches
zero, via link->ops->release() callback that is invoked from
bpf_link_put().

And the refcount is updated with each process that attaches/detaches from
the bpf_link on interface.

> 
> The trouble seems to pop up when using the higher abstraction APIs
> xsk_setup_xdp_prog and friends I guess? I just see above as already
> fairly easy to use and we have good helpers to create the sockets it looks
> like. Maybe I missed some design considerations. IMO higher level
> abstractions should go in new libxdp and above should stay in libbpf.

xsk_setup_xdp_prog doesn't really feel like higher level abstraction, as I
mentioned, to me it has one layer of abstraction peeled off.

> 
> /rant off ;)
> 
> Thanks,
> John
John Fastabend Feb. 16, 2021, 9:17 p.m. UTC | #23
Maciej Fijalkowski wrote:
> On Tue, Feb 16, 2021 at 11:15:41AM -0800, John Fastabend wrote:
> > Toke Høiland-Jørgensen wrote:
> > > Björn Töpel <bjorn.topel@intel.com> writes:
> > > 
> > > > On 2021-02-15 21:49, John Fastabend wrote:
> > > >> Maciej Fijalkowski wrote:
> > > >>> Currently, if there are multiple xdpsock instances running on a single
> > > >>> interface and in case one of the instances is terminated, the rest of
> > > >>> them are left in an inoperable state due to the fact of unloaded XDP
> > > >>> prog from interface.
> > > >>>
> > > >>> To address that, step away from setting bpf prog in favour of bpf_link.
> > > >>> This means that refcounting of BPF resources will be done automatically
> > > >>> by bpf_link itself.
> > > >>>
> > > >>> When setting up BPF resources during xsk socket creation, check whether
> > > >>> bpf_link for a given ifindex already exists via set of calls to
> > > >>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
> > > >>> and comparing the ifindexes from bpf_link and xsk socket.
> > > >>>
> > > >>> If there's no bpf_link yet, create one for a given XDP prog and unload
> > > >>> explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set.
> > > >>>
> > > >>> If bpf_link is already at a given ifindex and underlying program is not
> > > >>> AF-XDP one, bail out or update the bpf_link's prog given the presence of
> > > >>> XDP_FLAGS_UPDATE_IF_NOEXIST.
> > > >>>
> > > >>> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

[...]

> > 
> > I'm not a real user of AF_XDP (yet.), but here is how I would expect it
> > to work based on how the sockmap pieces work, which are somewhat similar
> > given they also deal with sockets.
> 
> Don't want to be picky, but I suppose sockmap won't play with ndo_bpf()
> and that's what was bothering us.
> 
> > 
> > Program 
> > (1) load and pin an XDP BPF program
> >     - obj = bpf_object__open(prog);
> >     - bpf_object__load_xattr(&attr);
> >     - bpf_program__pin()
> > (2) pin the map, find map_xsk using any of the map APIs
> >     - bpf_map__pin(map_xsk, path_to_pin)
> > (3) attach to XDP
> >     - link = bpf_program__attach_xdp()
> >     - bpf_link__pin()
> > 
> > At this point you have a BPF program loaded, a xsk map, and a link all
> > pinned and ready. And we can add socks using the process found in
> > `enter_xsks_into_map` in the sample. This can be the same program that
> > loaded/pinned the XDP program or some other program it doesn't really
> > matter.
> > 
> >  - create xsk fd
> >       . xsk_umem__create()
> >       . xsk_socket__create
> >  - open map @ pinned path
> >  - bpf_map_update_elem(xsks_map, &key, &fd, 0);
> > 
> > Then it looks like we don't have any conflicts? The XDP program is pinned
> > and exists in its normal scope. The xsk elements can be added/deleted
> > as normal.
> 
> The only difference from what you wrote up is the resource pinning, when
> compared to what we currently have + the set I am proposing.
> 
> So, if you're saying it looks like we don't have any conflicts and I am
> saying that the flow is what we have, then...? :)
> 
> You would have to ask Magnus what was behind the decision to avoid API
> from tools/lib/bpf/libbpf.c but rather directly call underlying functions
> from tools/lib/bpf/bpf.c. Nevertheless, it doesn't really make a big
> difference to me.
> 
> > If the XDP program is removed and the map referencing (using
> > normal ref rules) reaches zero its also deleted. Above is more or less
> > the same flow we use for any BPF program so looks good to me.
> 
> We have to be a bit more specific around the "XDP program is removed".
> Is it removed from the network interface? That's the thing that we want to
> avoid when there are other xsk sockets active on a given interface.

What I'm suggesting here is to use the normal rules for when an
XDP program is removed from the network interface. I don't think we
should do anything extra to keep the XDP program attached/loaded
just because it has a xsk map which may or may not have some
entries in it. If the user wants this behavior they should pin
the bpf_link pointer associated with the xdp program. This is the
same as any other program/map I have in my BPF system.

> 
> With bpf_link, XDP prog is removed only when bpf_link's refcount reaches
> zero, via link->ops->release() callback that is invoked from
> bpf_link_put().
> 
> And the refcount is updated with each process that attaches/detaches from
> the bpf_link on interface.
> 
> > 
> > The trouble seems to pop up when using the higher abstraction APIs
> > xsk_setup_xdp_prog and friends I guess? I just see above as already
> > fairly easy to use and we have good helpers to create the sockets it looks
> > like. Maybe I missed some design considerations. IMO higher level
> > abstractions should go in new libxdp and above should stay in libbpf.
> 
> xsk_setup_xdp_prog doesn't really feel like higher level abstraction, as I
> mentioned, to me it has one layer of abstraction peeled off.

Except it seems to have caused some issues. I don't think I gain
much from the API personally vs just doing above steps. But, I
appreciate you are just trying to fix it here so patches are
a good idea with v2 improvements. And I expect whenever
libbpf/libxdp split the above "high level" APIs will land in
libxdp.

Thanks,
John

> 
> > 
> > /rant off ;)
> > 
> > Thanks,
> > John
Dan Siemon Feb. 17, 2021, 2:23 a.m. UTC | #24
On Mon, 2021-02-15 at 22:38 +0100, Toke Høiland-Jørgensen wrote:
> The idea is to keep libbpf focused on bpf, and move the AF_XDP stuff
> to
> libxdp (so the socket stuff in xsk.h). We're adding the existing code
> wholesale, and keeping API compatibility during the move, so all
> that's
> needed is adding -lxdp when compiling. And obviously the existing
> libbpf
> code isn't going anywhere until such a time as there's a general
> backwards compatibility-breaking deprecation in libbpf (which I
> believe
> Andrii is planning to do in an upcoming and as-of-yet unannounced
> v1.0
> release).

I maintain a Rust binding to the AF_XDP parts of libbpf [1][2]. On the
chance that more significant changes can be entertained in the switch
to libxdp... The fact that many required functions like the ring access
functions exist only in xsk.h makes building a binding more difficult
because we need to wrap it with an extra C function [3]. From that
perspective, it would be great if those could move to xsk.c.

[1] - https://github.com/aterlo/afxdp-rs
[2] - https://github.com/alexforster/libbpf-sys
[3] - https://github.com/alexforster/libbpf-sys/blob/master/bindings.c
Magnus Karlsson Feb. 17, 2021, 7:16 a.m. UTC | #25
On Wed, Feb 17, 2021 at 3:26 AM Dan Siemon <dan@coverfire.com> wrote:
>
> On Mon, 2021-02-15 at 22:38 +0100, Toke Høiland-Jørgensen wrote:
> > The idea is to keep libbpf focused on bpf, and move the AF_XDP stuff
> > to
> > libxdp (so the socket stuff in xsk.h). We're adding the existing code
> > wholesale, and keeping API compatibility during the move, so all
> > that's
> > needed is adding -lxdp when compiling. And obviously the existing
> > libbpf
> > code isn't going anywhere until such a time as there's a general
> > backwards compatibility-breaking deprecation in libbpf (which I
> > believe
> > Andrii is planning to do in an upcoming and as-of-yet unannounced
> > v1.0
> > release).
>
> I maintain a Rust binding to the AF_XDP parts of libbpf [1][2]. On the
> chance that more significant changes can be entertained in the switch
> to libxdp... The fact that many required functions like the ring access
> functions exist only in xsk.h makes building a binding more difficult
> because we need to wrap it with an extra C function [3]. From that
> perspective, it would be great if those could move to xsk.c.

The only reason they were put in xsk.h is performance. But with LTO
(link-time optimizations) being present in most C-compilers these
days, it might not be a valid argument anymore. I will perform some
experiments and let you know. As you say, it would be much nicer to
hide away these functions in the library proper and make your life
easier.

> [1] - https://github.com/aterlo/afxdp-rs
> [2] - https://github.com/alexforster/libbpf-sys
> [3] - https://github.com/alexforster/libbpf-sys/blob/master/bindings.c
>
Magnus Karlsson Feb. 17, 2021, 7:36 a.m. UTC | #26
On Wed, Feb 17, 2021 at 8:16 AM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Wed, Feb 17, 2021 at 3:26 AM Dan Siemon <dan@coverfire.com> wrote:
> >
> > On Mon, 2021-02-15 at 22:38 +0100, Toke Høiland-Jørgensen wrote:
> > > The idea is to keep libbpf focused on bpf, and move the AF_XDP stuff
> > > to
> > > libxdp (so the socket stuff in xsk.h). We're adding the existing code
> > > wholesale, and keeping API compatibility during the move, so all
> > > that's
> > > needed is adding -lxdp when compiling. And obviously the existing
> > > libbpf
> > > code isn't going anywhere until such a time as there's a general
> > > backwards compatibility-breaking deprecation in libbpf (which I
> > > believe
> > > Andrii is planning to do in an upcoming and as-of-yet unannounced
> > > v1.0
> > > release).
> >
> > I maintain a Rust binding to the AF_XDP parts of libbpf [1][2]. On the
> > chance that more significant changes can be entertained in the switch
> > to libxdp... The fact that many required functions like the ring access
> > functions exist only in xsk.h makes building a binding more difficult
> > because we need to wrap it with an extra C function [3]. From that
> > perspective, it would be great if those could move to xsk.c.
>
> The only reason they were put in xsk.h is performance. But with LTO
> (link-time optimizations) being present in most C-compilers these
> days, it might not be a valid argument anymore. I will perform some
> experiments and let you know. As you say, it would be much nicer to
> hide away these functions in the library proper and make your life
> easier.

I would be very grateful for any more suggested changes that users out
there would like to see. Now, when we move to libxdp is the perfect
chance to fix those things. We might even decide to partially break
compatibility or change some behavior if the gain is large enough.

> > [1] - https://github.com/aterlo/afxdp-rs
> > [2] - https://github.com/alexforster/libbpf-sys
> > [3] - https://github.com/alexforster/libbpf-sys/blob/master/bindings.c
> >
Andrii Nakryiko Feb. 23, 2021, 1:15 a.m. UTC | #27
On Tue, Feb 16, 2021 at 2:37 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> John Fastabend <john.fastabend@gmail.com> writes:
>
> > Toke Høiland-Jørgensen wrote:
> >> John Fastabend <john.fastabend@gmail.com> writes:
> >>
> >> >> > However, in libxdp we can solve the original problem in a different way,
> >> >> > and in fact I already suggested to Magnus that we should do this (see
> >> >> > [1]); so one way forward could be to address it during the merge in
> >> >> > libxdp? It should be possible to address the original issue (two
> >> >> > instances of xdpsock breaking each other when they exit), but
> >> >> > applications will still need to do an explicit unload operation before
> >> >> > exiting (i.e., the automatic detach on bpf_link fd closure will take
> >> >> > more work, and likely require extending the bpf_link kernel support)...
> >> >> >
> >> >>
> >> >> I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If
> >> >> we're months ahead, then I'd really like to see this in libbpf until the
> >> >> merge. However, I'll leave that for Magnus/you to decide!
> >> >
> >> > Did I miss some thread? What does this mean libbpf 1.0/libxdp merge?
> >>
> >> The idea is to keep libbpf focused on bpf, and move the AF_XDP stuff to
> >> libxdp (so the socket stuff in xsk.h). We're adding the existing code
> >> wholesale, and keeping API compatibility during the move, so all that's
> >> needed is adding -lxdp when compiling. And obviously the existing libbpf
> >> code isn't going anywhere until such a time as there's a general
> >> backwards compatibility-breaking deprecation in libbpf (which I believe
> >> Andrii is planning to do in an upcoming and as-of-yet unannounced v1.0
> >> release).
> >
> > OK, I would like to keep the basic XDP pieces in libbpf though. For example
> > bpf_program__attach_xdp(). This way we don't have one lib to attach
> > everything, but XDP.
>
> The details are still TDB; for now, we're just merging in the XSK code
> to the libxdp repo. I expect Andrii to announce his plans for the rest
> soonish. I wouldn't expect basic things like that to go away, though :)

Yeah, I'll probably post more details this week. Just catching up on
stuff after vacation.

As mentioned already, all the basic APIs (i.e., APIs like
bpf_program__attach_xdp and bpf_set_link_xdp_fd, though I hope we can
give the latter a better name) will stay intact. Stay tuned!

>
> >>
> >> While integrating the XSK code into libxdp we're trying to make it
> >> compatible with the rest of the library (i.e., multi-prog). Hence my
> >> preference to avoid introducing something that makes this harder :)
> >>
> >> -Toke
> >>
> >
> > OK that makes sense to me thanks. But, I'm missing something (maybe its
> > obvious to everyone else?).
> >
> > When you load an XDP program you should get a reference to it. And then
> > XDP program should never be unloaded until that id is removed right? It
> > may or may not have an xsk map. Why does adding/removing programs from
> > an associated map have any impact on the XDP program? That seems like
> > the buggy part to me. No other map behaves this way as far as I can
> > tell. Now if the program with the XDP reference closes without pinning
> > the map or otherwise doing something with it, sure the map gets destroyed
> > and any xsk sockets are lost.
>
> The original bug comes from the XSK code abstracting away the fact that
> an AF_XDP socket needs an XDP program on the interface to work; so if
> none exists, the library will just load a program that redirects into
> the socket. Which breaks since the xdpsock example application is trying
> to be nice and clean up after itself, by removing the XDP program when
> it's done with the socket, thus breaking any other programs using that
> XDP program. So this patch introduces proper synchronisation on both add
> and remove of the XDP program...
>
> -Toke
>
diff mbox series

Patch

diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 20500fb1f17e..5911868efa43 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -28,6 +28,7 @@ 
 #include <sys/mman.h>
 #include <sys/socket.h>
 #include <sys/types.h>
+#include <linux/if_link.h>
 
 #include "bpf.h"
 #include "libbpf.h"
@@ -70,6 +71,7 @@  struct xsk_ctx {
 	int ifindex;
 	struct list_head list;
 	int prog_fd;
+	int link_fd;
 	int xsks_map_fd;
 	char ifname[IFNAMSIZ];
 };
@@ -409,7 +411,7 @@  static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 	static const int log_buf_size = 16 * 1024;
 	struct xsk_ctx *ctx = xsk->ctx;
 	char log_buf[log_buf_size];
-	int err, prog_fd;
+	int prog_fd;
 
 	/* This is the fallback C-program:
 	 * SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
@@ -499,14 +501,47 @@  static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 		return prog_fd;
 	}
 
-	err = bpf_set_link_xdp_fd(xsk->ctx->ifindex, prog_fd,
-				  xsk->config.xdp_flags);
-	if (err) {
-		close(prog_fd);
-		return err;
+	ctx->prog_fd = prog_fd;
+	return 0;
+}
+
+static int xsk_create_bpf_link(struct xsk_socket *xsk)
+{
+	/* bpf_link only accepts XDP_FLAGS_MODES, but xsk->config.xdp_flags
+	 * might have set XDP_FLAGS_UPDATE_IF_NOEXIST
+	 */
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts,
+			    .flags = (xsk->config.xdp_flags & XDP_FLAGS_MODES));
+	struct xsk_ctx *ctx = xsk->ctx;
+	__u32 prog_id;
+	int link_fd;
+	int err;
+
+	/* for !XDP_FLAGS_UPDATE_IF_NOEXIST, unload the program first, if any,
+	 * so that bpf_link can be attached
+	 */
+	if (!(xsk->config.xdp_flags & XDP_FLAGS_UPDATE_IF_NOEXIST)) {
+		err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags);
+		if (err) {
+			pr_warn("getting XDP prog id failed\n");
+			return err;
+		}
+		if (prog_id) {
+			err = bpf_set_link_xdp_fd(ctx->ifindex, -1, 0);
+			if (err < 0) {
+				pr_warn("detaching XDP prog failed\n");
+				return err;
+			}
+		}
 	}
 
-	ctx->prog_fd = prog_fd;
+	link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts);
+	if (link_fd < 0) {
+		pr_warn("bpf_link_create failed: %s\n", strerror(errno));
+		return link_fd;
+	}
+
+	ctx->link_fd = link_fd;
 	return 0;
 }
 
@@ -625,8 +660,32 @@  static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
 	}
 
 	err = 0;
-	if (ctx->xsks_map_fd == -1)
+	if (ctx->xsks_map_fd != -1)
+		goto out_map_ids;
+
+	/* if prog load is forced and we found bpf_link on a given ifindex BUT
+	 * the underlying XDP prog is not an AF-XDP one, close old prog's fd,
+	 * load AF-XDP and update existing bpf_link
+	 */
+	if (!(xsk->config.xdp_flags & XDP_FLAGS_UPDATE_IF_NOEXIST)) {
+		close(ctx->prog_fd);
+		err = xsk_create_bpf_maps(xsk);
+		if (err)
+			goto out_map_ids;
+
+		err = xsk_load_xdp_prog(xsk);
+		if (err) {
+			xsk_delete_bpf_maps(xsk);
+			goto out_map_ids;
+		}
+
+		/* if update failed, caller will close fds */
+		err = bpf_link_update(ctx->link_fd, ctx->prog_fd, 0);
+		if (err)
+			pr_warn("bpf_link_update failed: %s\n", strerror(errno));
+	} else {
 		err = -ENOENT;
+	}
 
 out_map_ids:
 	free(map_ids);
@@ -666,6 +725,49 @@  static int xsk_create_xsk_struct(int ifindex, struct xsk_socket *xsk)
 	return 0;
 }
 
+static int xsk_link_lookup(struct xsk_ctx *ctx, __u32 *prog_id)
+{
+	__u32 link_len = sizeof(struct bpf_link_info);
+	struct bpf_link_info link_info;
+	__u32 id = 0;
+	int err;
+	int fd;
+
+	while (true) {
+		err = bpf_link_get_next_id(id, &id);
+		if (err) {
+			if (errno == ENOENT)
+				break;
+			pr_warn("can't get next link: %s\n", strerror(errno));
+			break;
+		}
+
+		fd = bpf_link_get_fd_by_id(id);
+		if (fd < 0) {
+			if (errno == ENOENT)
+				continue;
+			pr_warn("can't get link by id (%u): %s\n", id, strerror(errno));
+			break;
+		}
+
+		memset(&link_info, 0, link_len);
+		err = bpf_obj_get_info_by_fd(fd, &link_info, &link_len);
+		if (err) {
+			pr_warn("can't get link info: %s\n", strerror(errno));
+			close(fd);
+			break;
+		}
+		if (link_info.xdp.ifindex == ctx->ifindex) {
+			ctx->link_fd = fd;
+			*prog_id = link_info.prog_id;
+			break;
+		}
+		close(fd);
+	}
+
+	return errno == ENOENT ? 0 : err;
+}
+
 static int __xsk_setup_xdp_prog(struct xsk_socket *_xdp,
 				int *xsks_map_fd)
 {
@@ -674,8 +776,7 @@  static int __xsk_setup_xdp_prog(struct xsk_socket *_xdp,
 	__u32 prog_id = 0;
 	int err;
 
-	err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id,
-				  xsk->config.xdp_flags);
+	err = xsk_link_lookup(ctx, &prog_id);
 	if (err)
 		return err;
 
@@ -685,9 +786,12 @@  static int __xsk_setup_xdp_prog(struct xsk_socket *_xdp,
 			return err;
 
 		err = xsk_load_xdp_prog(xsk);
-		if (err) {
+		if (err)
 			goto err_load_xdp_prog;
-		}
+
+		err = xsk_create_bpf_link(xsk);
+		if (err)
+			goto err_create_bpf_link;
 	} else {
 		ctx->prog_fd = bpf_prog_get_fd_by_id(prog_id);
 		if (ctx->prog_fd < 0)
@@ -695,20 +799,15 @@  static int __xsk_setup_xdp_prog(struct xsk_socket *_xdp,
 		err = xsk_lookup_bpf_maps(xsk);
 		if (err) {
 			close(ctx->prog_fd);
+			close(ctx->link_fd);
 			return err;
 		}
 	}
 
 	if (xsk->rx) {
 		err = xsk_set_bpf_maps(xsk);
-		if (err) {
-			if (!prog_id) {
-				goto err_set_bpf_maps;
-			} else {
-				close(ctx->prog_fd);
-				return err;
-			}
-		}
+		if (err)
+			goto err_set_bpf_maps;
 	}
 	if (xsks_map_fd)
 		*xsks_map_fd = ctx->xsks_map_fd;
@@ -716,8 +815,9 @@  static int __xsk_setup_xdp_prog(struct xsk_socket *_xdp,
 	return 0;
 
 err_set_bpf_maps:
+	close(ctx->link_fd);
+err_create_bpf_link:
 	close(ctx->prog_fd);
-	bpf_set_link_xdp_fd(ctx->ifindex, -1, 0);
 err_load_xdp_prog:
 	xsk_delete_bpf_maps(xsk);
 
@@ -1053,6 +1153,7 @@  void xsk_socket__delete(struct xsk_socket *xsk)
 	if (ctx->prog_fd != -1) {
 		xsk_delete_bpf_maps(xsk);
 		close(ctx->prog_fd);
+		close(ctx->link_fd);
 	}
 
 	err = xsk_get_mmap_offsets(xsk->fd, &off);