Message ID | 20210215154638.4627-2-maciej.fijalkowski@intel.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | Introduce bpf_link in libbpf's xsk | expand |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes: > Currently, if there are multiple xdpsock instances running on a single > interface and in case one of the instances is terminated, the rest of > them are left in an inoperable state due to the fact of unloaded XDP > prog from interface. > > To address that, step away from setting bpf prog in favour of bpf_link. > This means that refcounting of BPF resources will be done automatically > by bpf_link itself. > > When setting up BPF resources during xsk socket creation, check whether > bpf_link for a given ifindex already exists via set of calls to > bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd > and comparing the ifindexes from bpf_link and xsk socket. One consideration here is that bpf_link_get_fd_by_id() is a privileged operation (privileged as in CAP_SYS_ADMIN), so this has the side effect of making AF_XDP privileged as well. Is that the intention? Another is that the AF_XDP code is in the process of moving to libxdp (see in-progress PR [0]), and this approach won't carry over as-is to that model, because libxdp has to pin the bpf_link fds. However, in libxdp we can solve the original problem in a different way, and in fact I already suggested to Magnus that we should do this (see [1]); so one way forward could be to address it during the merge in libxdp? It should be possible to address the original issue (two instances of xdpsock breaking each other when they exit), but applications will still need to do an explicit unload operation before exiting (i.e., the automatic detach on bpf_link fd closure will take more work, and likely require extending the bpf_link kernel support)... -Toke [0] https://github.com/xdp-project/xdp-tools/pull/92 [1] https://github.com/xdp-project/xdp-tools/pull/92#discussion_r576204719
On 2021-02-15 18:07, Toke Høiland-Jørgensen wrote: > Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes: > >> Currently, if there are multiple xdpsock instances running on a single >> interface and in case one of the instances is terminated, the rest of >> them are left in an inoperable state due to the fact of unloaded XDP >> prog from interface. >> >> To address that, step away from setting bpf prog in favour of bpf_link. >> This means that refcounting of BPF resources will be done automatically >> by bpf_link itself. >> >> When setting up BPF resources during xsk socket creation, check whether >> bpf_link for a given ifindex already exists via set of calls to >> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd >> and comparing the ifindexes from bpf_link and xsk socket. > > One consideration here is that bpf_link_get_fd_by_id() is a privileged > operation (privileged as in CAP_SYS_ADMIN), so this has the side effect > of making AF_XDP privileged as well. Is that the intention? > We're already using, e.g., bpf_map_get_fd_by_id() which has that as well. So we're assuming that for XDP setup already! > Another is that the AF_XDP code is in the process of moving to libxdp > (see in-progress PR [0]), and this approach won't carry over as-is to > that model, because libxdp has to pin the bpf_link fds. > I was assuming there were two modes of operations for AF_XDP in libxdp. One which is with the multi-program support (which AFAIK is why the pinning is required), and one "like the current libbpf" one. For the latter Maciej's series would be a good fit, no? > However, in libxdp we can solve the original problem in a different way, > and in fact I already suggested to Magnus that we should do this (see > [1]); so one way forward could be to address it during the merge in > libxdp? It should be possible to address the original issue (two > instances of xdpsock breaking each other when they exit), but > applications will still need to do an explicit unload operation before > exiting (i.e., the automatic detach on bpf_link fd closure will take > more work, and likely require extending the bpf_link kernel support)... > I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If we're months ahead, then I'd really like to see this in libbpf until the merge. However, I'll leave that for Magnus/you to decide! Bottom line; I'd *really* like bpf_link behavior (process scoped) for AF_XDP sooner than later! ;-) Thanks for the input! Björn > -Toke > > [0] https://github.com/xdp-project/xdp-tools/pull/92 > [1] https://github.com/xdp-project/xdp-tools/pull/92#discussion_r576204719 >
Björn Töpel <bjorn.topel@intel.com> writes: > On 2021-02-15 18:07, Toke Høiland-Jørgensen wrote: >> Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes: >> >>> Currently, if there are multiple xdpsock instances running on a single >>> interface and in case one of the instances is terminated, the rest of >>> them are left in an inoperable state due to the fact of unloaded XDP >>> prog from interface. >>> >>> To address that, step away from setting bpf prog in favour of bpf_link. >>> This means that refcounting of BPF resources will be done automatically >>> by bpf_link itself. >>> >>> When setting up BPF resources during xsk socket creation, check whether >>> bpf_link for a given ifindex already exists via set of calls to >>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd >>> and comparing the ifindexes from bpf_link and xsk socket. >> >> One consideration here is that bpf_link_get_fd_by_id() is a privileged >> operation (privileged as in CAP_SYS_ADMIN), so this has the side effect >> of making AF_XDP privileged as well. Is that the intention? >> > > We're already using, e.g., bpf_map_get_fd_by_id() which has that > as well. So we're assuming that for XDP setup already! Ah, right, didn't realise that one is CAP_SYS_ADMIN as well; I remembered this as being specific to the bpf_link operation. >> Another is that the AF_XDP code is in the process of moving to libxdp >> (see in-progress PR [0]), and this approach won't carry over as-is to >> that model, because libxdp has to pin the bpf_link fds. >> > > I was assuming there were two modes of operations for AF_XDP in libxdp. > One which is with the multi-program support (which AFAIK is why the > pinning is required), and one "like the current libbpf" one. For the > latter Maciej's series would be a good fit, no? We haven't added an explicit mode switch for now; libxdp will fall back to regular interface attach if the kernel doesn't support the needed features for multi-attach, but if it's possible to just have libxdp transparently do the right thing I'd much prefer that. So we're still exploring that (part of which is that Magnus has promised to run some performance tests to see if there's a difference). However, even if there's an explicit mode switch I'd like to avoid different *semantics* between the two modes if possible, to keep the two as compatible as possible. And since we can't currently do "auto-detach on bpf_link fd close" when using multi-prog, introducing this now would lead to just such a semantic difference. So my preference would be to do it differently... :) >> However, in libxdp we can solve the original problem in a different way, >> and in fact I already suggested to Magnus that we should do this (see >> [1]); so one way forward could be to address it during the merge in >> libxdp? It should be possible to address the original issue (two >> instances of xdpsock breaking each other when they exit), but >> applications will still need to do an explicit unload operation before >> exiting (i.e., the automatic detach on bpf_link fd closure will take >> more work, and likely require extending the bpf_link kernel support)... >> > > I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If > we're months ahead, then I'd really like to see this in libbpf until the > merge. However, I'll leave that for Magnus/you to decide! Well, as far as libxdp support goes, the PR I linked is pretty close to being mergeable. One of the few outstanding issues is whether we should solve just this issue before merging, actually :) Not sure exactly which timeframe Andrii is envisioning for libbpf 1.0, but last I heard he'll announce something next week. > Bottom line; I'd *really* like bpf_link behavior (process scoped) for > AF_XDP sooner than later! ;-) Totally agree that we should solve the multi-process coexistence problem! And as I said, I think we can do so in libxdp by using the same synchronisation mechanism we use for setting up the multi-prog dispatcher. So it doesn't *have* to hold things up :) -Toke
Björn Töpel wrote: > On 2021-02-15 18:07, Toke Høiland-Jørgensen wrote: > > Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes: > > > >> Currently, if there are multiple xdpsock instances running on a single > >> interface and in case one of the instances is terminated, the rest of > >> them are left in an inoperable state due to the fact of unloaded XDP > >> prog from interface. I'm a bit confused by the above. This is only the case if the instance that terminated is the one that loaded the XDP program and it also didn't pin the program correct? If so lets make the commit message a bit more clear about the exact case we are solving. > >> > >> To address that, step away from setting bpf prog in favour of bpf_link. > >> This means that refcounting of BPF resources will be done automatically > >> by bpf_link itself. +1 > >> > >> When setting up BPF resources during xsk socket creation, check whether > >> bpf_link for a given ifindex already exists via set of calls to > >> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd > >> and comparing the ifindexes from bpf_link and xsk socket. > > > > One consideration here is that bpf_link_get_fd_by_id() is a privileged > > operation (privileged as in CAP_SYS_ADMIN), so this has the side effect > > of making AF_XDP privileged as well. Is that the intention? > > > > We're already using, e.g., bpf_map_get_fd_by_id() which has that > as well. So we're assuming that for XDP setup already! > > > Another is that the AF_XDP code is in the process of moving to libxdp > > (see in-progress PR [0]), and this approach won't carry over as-is to > > that model, because libxdp has to pin the bpf_link fds. > > > > I was assuming there were two modes of operations for AF_XDP in libxdp. > One which is with the multi-program support (which AFAIK is why the > pinning is required), and one "like the current libbpf" one. For the > latter Maciej's series would be a good fit, no? > > > However, in libxdp we can solve the original problem in a different way, > > and in fact I already suggested to Magnus that we should do this (see > > [1]); so one way forward could be to address it during the merge in > > libxdp? It should be possible to address the original issue (two > > instances of xdpsock breaking each other when they exit), but > > applications will still need to do an explicit unload operation before > > exiting (i.e., the automatic detach on bpf_link fd closure will take > > more work, and likely require extending the bpf_link kernel support)... > > > > I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If > we're months ahead, then I'd really like to see this in libbpf until the > merge. However, I'll leave that for Magnus/you to decide! Did I miss some thread? What does this mean libbpf 1.0/libxdp merge? From my side libbpf should support the basic operations: load, attach, pin, and link for all my BPF objects. I view libxdp as providing 'extra' goodness on top of that. Everyone agree? > > Bottom line; I'd *really* like bpf_link behavior (process scoped) for > AF_XDP sooner than later! ;-) Because I use libbpf as my base management for BPF objects I want it to support the basic ops for all objects so link ops should land there. > > > Thanks for the input! > Björn > > > > -Toke > > > > [0] https://github.com/xdp-project/xdp-tools/pull/92 > > [1] https://github.com/xdp-project/xdp-tools/pull/92#discussion_r576204719 > >
Maciej Fijalkowski wrote: > Currently, if there are multiple xdpsock instances running on a single > interface and in case one of the instances is terminated, the rest of > them are left in an inoperable state due to the fact of unloaded XDP > prog from interface. > > To address that, step away from setting bpf prog in favour of bpf_link. > This means that refcounting of BPF resources will be done automatically > by bpf_link itself. > > When setting up BPF resources during xsk socket creation, check whether > bpf_link for a given ifindex already exists via set of calls to > bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd > and comparing the ifindexes from bpf_link and xsk socket. > > If there's no bpf_link yet, create one for a given XDP prog and unload > explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set. > > If bpf_link is already at a given ifindex and underlying program is not > AF-XDP one, bail out or update the bpf_link's prog given the presence of > XDP_FLAGS_UPDATE_IF_NOEXIST. > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > --- > tools/lib/bpf/xsk.c | 143 +++++++++++++++++++++++++++++++++++++------- > 1 file changed, 122 insertions(+), 21 deletions(-) [...] > +static int xsk_create_bpf_link(struct xsk_socket *xsk) > +{ > + /* bpf_link only accepts XDP_FLAGS_MODES, but xsk->config.xdp_flags > + * might have set XDP_FLAGS_UPDATE_IF_NOEXIST > + */ > + DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts, > + .flags = (xsk->config.xdp_flags & XDP_FLAGS_MODES)); > + struct xsk_ctx *ctx = xsk->ctx; > + __u32 prog_id; > + int link_fd; > + int err; > + > + /* for !XDP_FLAGS_UPDATE_IF_NOEXIST, unload the program first, if any, > + * so that bpf_link can be attached > + */ > + if (!(xsk->config.xdp_flags & XDP_FLAGS_UPDATE_IF_NOEXIST)) { > + err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags); > + if (err) { > + pr_warn("getting XDP prog id failed\n"); > + return err; > + } > + if (prog_id) { > + err = bpf_set_link_xdp_fd(ctx->ifindex, -1, 0); > + if (err < 0) { > + pr_warn("detaching XDP prog failed\n"); > + return err; > + } > + } > } > > - ctx->prog_fd = prog_fd; > + link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts); > + if (link_fd < 0) { > + pr_warn("bpf_link_create failed: %s\n", strerror(errno)); > + return link_fd; > + } > + This can leave the system in a bad state where it unloaded the XDP program above, but then failed to create the link. So we should somehow fix that if possible or at minimum put a note somewhere so users can't claim they shouldn't know this. Also related, its not good for real systems to let XDP program go missing for some period of time. I didn't check but we should make XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already. > + ctx->link_fd = link_fd; > return 0; > } > [...] > +static int xsk_link_lookup(struct xsk_ctx *ctx, __u32 *prog_id) > +{ > + __u32 link_len = sizeof(struct bpf_link_info); > + struct bpf_link_info link_info; > + __u32 id = 0; > + int err; > + int fd; > + > + while (true) { > + err = bpf_link_get_next_id(id, &id); > + if (err) { > + if (errno == ENOENT) > + break; > + pr_warn("can't get next link: %s\n", strerror(errno)); > + break; > + } > + > + fd = bpf_link_get_fd_by_id(id); > + if (fd < 0) { > + if (errno == ENOENT) > + continue; > + pr_warn("can't get link by id (%u): %s\n", id, strerror(errno)); > + break; > + } > + > + memset(&link_info, 0, link_len); > + err = bpf_obj_get_info_by_fd(fd, &link_info, &link_len); > + if (err) { > + pr_warn("can't get link info: %s\n", strerror(errno)); > + close(fd); > + break; > + } > + if (link_info.xdp.ifindex == ctx->ifindex) { > + ctx->link_fd = fd; > + *prog_id = link_info.prog_id; > + break; > + } > + close(fd); > + } > + > + return errno == ENOENT ? 0 : err; But, err wont be set in fd < 0 case? I guess we don't want to return 0 if bpf_link_get_fd_by_id fails. Although I really don't like the construct here that much. I think just `return err` and ensuring err is set correctly would be more clear. At least the fd error case needs to be handled though. > +} > +
John Fastabend <john.fastabend@gmail.com> writes: >> > However, in libxdp we can solve the original problem in a different way, >> > and in fact I already suggested to Magnus that we should do this (see >> > [1]); so one way forward could be to address it during the merge in >> > libxdp? It should be possible to address the original issue (two >> > instances of xdpsock breaking each other when they exit), but >> > applications will still need to do an explicit unload operation before >> > exiting (i.e., the automatic detach on bpf_link fd closure will take >> > more work, and likely require extending the bpf_link kernel support)... >> > >> >> I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If >> we're months ahead, then I'd really like to see this in libbpf until the >> merge. However, I'll leave that for Magnus/you to decide! > > Did I miss some thread? What does this mean libbpf 1.0/libxdp merge? The idea is to keep libbpf focused on bpf, and move the AF_XDP stuff to libxdp (so the socket stuff in xsk.h). We're adding the existing code wholesale, and keeping API compatibility during the move, so all that's needed is adding -lxdp when compiling. And obviously the existing libbpf code isn't going anywhere until such a time as there's a general backwards compatibility-breaking deprecation in libbpf (which I believe Andrii is planning to do in an upcoming and as-of-yet unannounced v1.0 release). While integrating the XSK code into libxdp we're trying to make it compatible with the rest of the library (i.e., multi-prog). Hence my preference to avoid introducing something that makes this harder :) -Toke
Toke Høiland-Jørgensen wrote: > John Fastabend <john.fastabend@gmail.com> writes: > > >> > However, in libxdp we can solve the original problem in a different way, > >> > and in fact I already suggested to Magnus that we should do this (see > >> > [1]); so one way forward could be to address it during the merge in > >> > libxdp? It should be possible to address the original issue (two > >> > instances of xdpsock breaking each other when they exit), but > >> > applications will still need to do an explicit unload operation before > >> > exiting (i.e., the automatic detach on bpf_link fd closure will take > >> > more work, and likely require extending the bpf_link kernel support)... > >> > > >> > >> I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If > >> we're months ahead, then I'd really like to see this in libbpf until the > >> merge. However, I'll leave that for Magnus/you to decide! > > > > Did I miss some thread? What does this mean libbpf 1.0/libxdp merge? > > The idea is to keep libbpf focused on bpf, and move the AF_XDP stuff to > libxdp (so the socket stuff in xsk.h). We're adding the existing code > wholesale, and keeping API compatibility during the move, so all that's > needed is adding -lxdp when compiling. And obviously the existing libbpf > code isn't going anywhere until such a time as there's a general > backwards compatibility-breaking deprecation in libbpf (which I believe > Andrii is planning to do in an upcoming and as-of-yet unannounced v1.0 > release). OK, I would like to keep the basic XDP pieces in libbpf though. For example bpf_program__attach_xdp(). This way we don't have one lib to attach everything, but XDP. > > While integrating the XSK code into libxdp we're trying to make it > compatible with the rest of the library (i.e., multi-prog). Hence my > preference to avoid introducing something that makes this harder :) > > -Toke > OK that makes sense to me thanks. But, I'm missing something (maybe its obvious to everyone else?). When you load an XDP program you should get a reference to it. And then XDP program should never be unloaded until that id is removed right? It may or may not have an xsk map. Why does adding/removing programs from an associated map have any impact on the XDP program? That seems like the buggy part to me. No other map behaves this way as far as I can tell. Now if the program with the XDP reference closes without pinning the map or otherwise doing something with it, sure the map gets destroyed and any xsk sockets are lost. Thanks! John
On Mon, Feb 15, 2021 at 08:35:29PM +0100, Toke Høiland-Jørgensen wrote: > Björn Töpel <bjorn.topel@intel.com> writes: > > > On 2021-02-15 18:07, Toke Høiland-Jørgensen wrote: > >> Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes: > >> > >>> Currently, if there are multiple xdpsock instances running on a single > >>> interface and in case one of the instances is terminated, the rest of > >>> them are left in an inoperable state due to the fact of unloaded XDP > >>> prog from interface. > >>> > >>> To address that, step away from setting bpf prog in favour of bpf_link. > >>> This means that refcounting of BPF resources will be done automatically > >>> by bpf_link itself. > >>> > >>> When setting up BPF resources during xsk socket creation, check whether > >>> bpf_link for a given ifindex already exists via set of calls to > >>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd > >>> and comparing the ifindexes from bpf_link and xsk socket. > >> > >> One consideration here is that bpf_link_get_fd_by_id() is a privileged > >> operation (privileged as in CAP_SYS_ADMIN), so this has the side effect > >> of making AF_XDP privileged as well. Is that the intention? > >> > > > > We're already using, e.g., bpf_map_get_fd_by_id() which has that > > as well. So we're assuming that for XDP setup already! > > Ah, right, didn't realise that one is CAP_SYS_ADMIN as well; I > remembered this as being specific to the bpf_link operation. > > >> Another is that the AF_XDP code is in the process of moving to libxdp > >> (see in-progress PR [0]), and this approach won't carry over as-is to > >> that model, because libxdp has to pin the bpf_link fds. > >> > > > > I was assuming there were two modes of operations for AF_XDP in libxdp. > > One which is with the multi-program support (which AFAIK is why the > > pinning is required), and one "like the current libbpf" one. For the > > latter Maciej's series would be a good fit, no? > > We haven't added an explicit mode switch for now; libxdp will fall back > to regular interface attach if the kernel doesn't support the needed > features for multi-attach, but if it's possible to just have libxdp > transparently do the right thing I'd much prefer that. So we're still > exploring that (part of which is that Magnus has promised to run some > performance tests to see if there's a difference). > > However, even if there's an explicit mode switch I'd like to avoid > different *semantics* between the two modes if possible, to keep the two > as compatible as possible. And since we can't currently do "auto-detach > on bpf_link fd close" when using multi-prog, introducing this now would > lead to just such a semantic difference. So my preference would be to do > it differently... :) > > >> However, in libxdp we can solve the original problem in a different way, > >> and in fact I already suggested to Magnus that we should do this (see > >> [1]); so one way forward could be to address it during the merge in > >> libxdp? It should be possible to address the original issue (two > >> instances of xdpsock breaking each other when they exit), but > >> applications will still need to do an explicit unload operation before > >> exiting (i.e., the automatic detach on bpf_link fd closure will take > >> more work, and likely require extending the bpf_link kernel support)... > >> > > > > I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If > > we're months ahead, then I'd really like to see this in libbpf until the > > merge. However, I'll leave that for Magnus/you to decide! WDYM by libbpf 1.0/libxdp merge? I glanced through thread and I saw that John was also not aware of that. Not sure where it was discussed? If you're saying 'merge', then is libxdp going to be a part of kernel or as an AF-XDP related guy I would be forced to include yet another repository in the BPF developer toolchain? :< > > Well, as far as libxdp support goes, the PR I linked is pretty close to > being mergeable. One of the few outstanding issues is whether we should > solve just this issue before merging, actually :) > > Not sure exactly which timeframe Andrii is envisioning for libbpf 1.0, > but last I heard he'll announce something next week. > > > Bottom line; I'd *really* like bpf_link behavior (process scoped) for > > AF_XDP sooner than later! ;-) > > Totally agree that we should solve the multi-process coexistence > problem! And as I said, I think we can do so in libxdp by using the same > synchronisation mechanism we use for setting up the multi-prog > dispatcher. So it doesn't *have* to hold things up :) Am I reading this right or you're trying to reject the fix of the long standing issue due to a PR that is not ready yet on a standalone project/library? :P Once libxdp is the standard way of playing with AF-XDP and there are actual users of that, then I'm happy to work/help on that issue. Spawning a few xdpsock instances on the same interface has been a standard/easiest way of measuring the scalability of AF-XDP ZC implementations. This has been a real PITA for quite a long time. So, I second Bjorn's statement - the sooner we have this fixed, the better. Thanks! I hope I haven't sounded harsh, not my intent at all, Maciej > > -Toke >
On Mon, Feb 15, 2021 at 12:22:36PM -0800, John Fastabend wrote: > Björn Töpel wrote: > > On 2021-02-15 18:07, Toke Høiland-Jørgensen wrote: > > > Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes: > > > > > >> Currently, if there are multiple xdpsock instances running on a single > > >> interface and in case one of the instances is terminated, the rest of > > >> them are left in an inoperable state due to the fact of unloaded XDP > > >> prog from interface. > > I'm a bit confused by the above. This is only the case if the instance > that terminated is the one that loaded the XDP program and it also didn't > pin the program correct? If so lets make the commit message a bit more > clear about the exact case we are solving. I can include the following from the cover letter: <quote> $ sudo ./xdpsock -i ens801f0 -t -q 10 // load xdp prog and xskmap and // add entry to xskmap at idx 10 $ sudo ./xdpsock -i ens801f0 -t -q 11 // add entry to xskmap at idx 11 terminate one of the processes and another one is unable to work due to the fact that the XDP prog was unloaded from interface. </quote> We don't do the prog pinning and it doesn't really matter which of the instances you terminate. > > > >> > > >> To address that, step away from setting bpf prog in favour of bpf_link. > > >> This means that refcounting of BPF resources will be done automatically > > >> by bpf_link itself. > > +1 > > > >> > > >> When setting up BPF resources during xsk socket creation, check whether > > >> bpf_link for a given ifindex already exists via set of calls to > > >> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd > > >> and comparing the ifindexes from bpf_link and xsk socket. > > > > > > One consideration here is that bpf_link_get_fd_by_id() is a privileged > > > operation (privileged as in CAP_SYS_ADMIN), so this has the side effect > > > of making AF_XDP privileged as well. Is that the intention? > > > > > > > We're already using, e.g., bpf_map_get_fd_by_id() which has that > > as well. So we're assuming that for XDP setup already! > > > > > Another is that the AF_XDP code is in the process of moving to libxdp > > > (see in-progress PR [0]), and this approach won't carry over as-is to > > > that model, because libxdp has to pin the bpf_link fds. > > > > > > > I was assuming there were two modes of operations for AF_XDP in libxdp. > > One which is with the multi-program support (which AFAIK is why the > > pinning is required), and one "like the current libbpf" one. For the > > latter Maciej's series would be a good fit, no? > > > > > However, in libxdp we can solve the original problem in a different way, > > > and in fact I already suggested to Magnus that we should do this (see > > > [1]); so one way forward could be to address it during the merge in > > > libxdp? It should be possible to address the original issue (two > > > instances of xdpsock breaking each other when they exit), but > > > applications will still need to do an explicit unload operation before > > > exiting (i.e., the automatic detach on bpf_link fd closure will take > > > more work, and likely require extending the bpf_link kernel support)... > > > > > > > I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If > > we're months ahead, then I'd really like to see this in libbpf until the > > merge. However, I'll leave that for Magnus/you to decide! > > Did I miss some thread? What does this mean libbpf 1.0/libxdp merge? From > my side libbpf should support the basic operations: load, attach, pin, > and link for all my BPF objects. I view libxdp as providing 'extra' > goodness on top of that. Everyone agree? > > > > > Bottom line; I'd *really* like bpf_link behavior (process scoped) for > > AF_XDP sooner than later! ;-) > > Because I use libbpf as my base management for BPF objects I want it > to support the basic ops for all objects so link ops should land there. > > > > > > > Thanks for the input! > > Björn > > > > > > > -Toke > > > > > > [0] https://github.com/xdp-project/xdp-tools/pull/92 > > > [1] https://github.com/xdp-project/xdp-tools/pull/92#discussion_r576204719 > > > > >
On Mon, Feb 15, 2021 at 04:18:28PM -0800, John Fastabend wrote: > Toke Høiland-Jørgensen wrote: > > John Fastabend <john.fastabend@gmail.com> writes: > > > > >> > However, in libxdp we can solve the original problem in a different way, > > >> > and in fact I already suggested to Magnus that we should do this (see > > >> > [1]); so one way forward could be to address it during the merge in > > >> > libxdp? It should be possible to address the original issue (two > > >> > instances of xdpsock breaking each other when they exit), but > > >> > applications will still need to do an explicit unload operation before > > >> > exiting (i.e., the automatic detach on bpf_link fd closure will take > > >> > more work, and likely require extending the bpf_link kernel support)... > > >> > > > >> > > >> I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If > > >> we're months ahead, then I'd really like to see this in libbpf until the > > >> merge. However, I'll leave that for Magnus/you to decide! > > > > > > Did I miss some thread? What does this mean libbpf 1.0/libxdp merge? > > > > The idea is to keep libbpf focused on bpf, and move the AF_XDP stuff to > > libxdp (so the socket stuff in xsk.h). We're adding the existing code > > wholesale, and keeping API compatibility during the move, so all that's > > needed is adding -lxdp when compiling. And obviously the existing libbpf > > code isn't going anywhere until such a time as there's a general > > backwards compatibility-breaking deprecation in libbpf (which I believe > > Andrii is planning to do in an upcoming and as-of-yet unannounced v1.0 > > release). Once again, is libxdp going to land in th kernel? Still not clear to me. > > OK, I would like to keep the basic XDP pieces in libbpf though. For example > bpf_program__attach_xdp(). This way we don't have one lib to attach > everything, but XDP. > > > > > While integrating the XSK code into libxdp we're trying to make it > > compatible with the rest of the library (i.e., multi-prog). Hence my > > preference to avoid introducing something that makes this harder :) Do you see the issue with solving it one way in libbpf currently given that we can't really tell when the merge of libs is going to happen and the other way within the libxdp itself? > > > > -Toke > > > > OK that makes sense to me thanks. But, I'm missing something (maybe its > obvious to everyone else?). > > When you load an XDP program you should get a reference to it. And then > XDP program should never be unloaded until that id is removed right? It WDYM by 'that id is removed' ? > may or may not have an xsk map. Why does adding/removing programs from you meant adding/removing 'sockets'? > an associated map have any impact on the XDP program? That seems like > the buggy part to me. No other map behaves this way as far as I can > tell. Now if the program with the XDP reference closes without pinning > the map or otherwise doing something with it, sure the map gets destroyed > and any xsk sockets are lost. It's the XDP program that is getting lost, not XSKMAP. XDP prog presence determines whether your driver works in ZC or not. If XDP prog is gone then xdpsock bails out (or any other AF-XDP app won't be able to work). > > Thanks! > John
On Mon, Feb 15, 2021 at 12:49:27PM -0800, John Fastabend wrote: > Maciej Fijalkowski wrote: > > Currently, if there are multiple xdpsock instances running on a single > > interface and in case one of the instances is terminated, the rest of > > them are left in an inoperable state due to the fact of unloaded XDP > > prog from interface. > > > > To address that, step away from setting bpf prog in favour of bpf_link. > > This means that refcounting of BPF resources will be done automatically > > by bpf_link itself. > > > > When setting up BPF resources during xsk socket creation, check whether > > bpf_link for a given ifindex already exists via set of calls to > > bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd > > and comparing the ifindexes from bpf_link and xsk socket. > > > > If there's no bpf_link yet, create one for a given XDP prog and unload > > explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set. > > > > If bpf_link is already at a given ifindex and underlying program is not > > AF-XDP one, bail out or update the bpf_link's prog given the presence of > > XDP_FLAGS_UPDATE_IF_NOEXIST. > > > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > > --- > > tools/lib/bpf/xsk.c | 143 +++++++++++++++++++++++++++++++++++++------- > > 1 file changed, 122 insertions(+), 21 deletions(-) > > [...] > > > +static int xsk_create_bpf_link(struct xsk_socket *xsk) > > +{ > > + /* bpf_link only accepts XDP_FLAGS_MODES, but xsk->config.xdp_flags > > + * might have set XDP_FLAGS_UPDATE_IF_NOEXIST > > + */ > > + DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts, > > + .flags = (xsk->config.xdp_flags & XDP_FLAGS_MODES)); > > + struct xsk_ctx *ctx = xsk->ctx; > > + __u32 prog_id; > > + int link_fd; > > + int err; > > + > > + /* for !XDP_FLAGS_UPDATE_IF_NOEXIST, unload the program first, if any, > > + * so that bpf_link can be attached > > + */ > > + if (!(xsk->config.xdp_flags & XDP_FLAGS_UPDATE_IF_NOEXIST)) { > > + err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags); > > + if (err) { > > + pr_warn("getting XDP prog id failed\n"); > > + return err; > > + } > > + if (prog_id) { > > + err = bpf_set_link_xdp_fd(ctx->ifindex, -1, 0); > > + if (err < 0) { > > + pr_warn("detaching XDP prog failed\n"); > > + return err; > > + } > > + } > > } > > > > - ctx->prog_fd = prog_fd; > > + link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts); > > + if (link_fd < 0) { > > + pr_warn("bpf_link_create failed: %s\n", strerror(errno)); > > + return link_fd; > > + } > > + > > This can leave the system in a bad state where it unloaded the XDP program > above, but then failed to create the link. So we should somehow fix that > if possible or at minimum put a note somewhere so users can't claim they > shouldn't know this. > > Also related, its not good for real systems to let XDP program go missing > for some period of time. I didn't check but we should make > XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already. Old way of attaching prog is mutual exclusive with bpf_link, right? What I'm saying is in order to use one of the two, you need to wipe out the current one in favour of the second that you would like to load. > > > + ctx->link_fd = link_fd; > > return 0; > > } > > > > [...] > > > +static int xsk_link_lookup(struct xsk_ctx *ctx, __u32 *prog_id) > > +{ > > + __u32 link_len = sizeof(struct bpf_link_info); > > + struct bpf_link_info link_info; > > + __u32 id = 0; > > + int err; > > + int fd; > > + > > + while (true) { > > + err = bpf_link_get_next_id(id, &id); > > + if (err) { > > + if (errno == ENOENT) > > + break; > > + pr_warn("can't get next link: %s\n", strerror(errno)); > > + break; > > + } > > + > > + fd = bpf_link_get_fd_by_id(id); > > + if (fd < 0) { > > + if (errno == ENOENT) > > + continue; > > + pr_warn("can't get link by id (%u): %s\n", id, strerror(errno)); > > + break; > > + } > > + > > + memset(&link_info, 0, link_len); > > + err = bpf_obj_get_info_by_fd(fd, &link_info, &link_len); > > + if (err) { > > + pr_warn("can't get link info: %s\n", strerror(errno)); > > + close(fd); > > + break; > > + } > > + if (link_info.xdp.ifindex == ctx->ifindex) { > > + ctx->link_fd = fd; > > + *prog_id = link_info.prog_id; > > + break; > > + } > > + close(fd); > > + } > > + > > + return errno == ENOENT ? 0 : err; > > But, err wont be set in fd < 0 case? I guess we don't want to return 0 if > bpf_link_get_fd_by_id fails. Good catch! > Although I really don't like the construct > here that much. I think just `return err` and ensuring err is set correctly > would be more clear. At least the fd error case needs to be handled > though. FWIW, this was inspired by tools/bpf/bpftool/link.c:do_show() > > > +} > > +
On 2021-02-16 03:01, Maciej Fijalkowski wrote: > On Mon, Feb 15, 2021 at 08:35:29PM +0100, Toke Høiland-Jørgensen wrote: >> Björn Töpel <bjorn.topel@intel.com> writes: [...] >>> >>> I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If >>> we're months ahead, then I'd really like to see this in libbpf until the >>> merge. However, I'll leave that for Magnus/you to decide! > > WDYM by libbpf 1.0/libxdp merge? I glanced through thread and I saw that > John was also not aware of that. Not sure where it was discussed? > Oh, right. Yeah, we've had some offlist discussions about moving the AF_XDP functionality from libbpf to libxdp in the libbpf 1.0 timeframe. > If you're saying 'merge', then is libxdp going to be a part of kernel or > as an AF-XDP related guy I would be forced to include yet another > repository in the BPF developer toolchain? :< > The AF_XDP functionality of libbpf will be part of libxdp, which is not in the kernel tree. libxdp depend on libbpf, which includes the core BPF functionality. For AF_XDP this is a good thing IMO. libxdp includes more higher lever abstractions than libbpf, which is more aligned to AF_XDP. Yes, that would mean that you would get another dependency for AF_XDP, and one that is not in the kernel tree. For most *users* this is not a problem, in fact it might be easier to consume and to contribute for most users. We can't optimize just for the kernel hackers. ;-) Björn [...]
On 2021-02-15 21:49, John Fastabend wrote: > Maciej Fijalkowski wrote: >> Currently, if there are multiple xdpsock instances running on a single >> interface and in case one of the instances is terminated, the rest of >> them are left in an inoperable state due to the fact of unloaded XDP >> prog from interface. >> >> To address that, step away from setting bpf prog in favour of bpf_link. >> This means that refcounting of BPF resources will be done automatically >> by bpf_link itself. >> >> When setting up BPF resources during xsk socket creation, check whether >> bpf_link for a given ifindex already exists via set of calls to >> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd >> and comparing the ifindexes from bpf_link and xsk socket. >> >> If there's no bpf_link yet, create one for a given XDP prog and unload >> explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set. >> >> If bpf_link is already at a given ifindex and underlying program is not >> AF-XDP one, bail out or update the bpf_link's prog given the presence of >> XDP_FLAGS_UPDATE_IF_NOEXIST. >> >> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> >> --- >> tools/lib/bpf/xsk.c | 143 +++++++++++++++++++++++++++++++++++++------- >> 1 file changed, 122 insertions(+), 21 deletions(-) > > [...] > >> +static int xsk_create_bpf_link(struct xsk_socket *xsk) >> +{ >> + /* bpf_link only accepts XDP_FLAGS_MODES, but xsk->config.xdp_flags >> + * might have set XDP_FLAGS_UPDATE_IF_NOEXIST >> + */ >> + DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts, >> + .flags = (xsk->config.xdp_flags & XDP_FLAGS_MODES)); >> + struct xsk_ctx *ctx = xsk->ctx; >> + __u32 prog_id; >> + int link_fd; >> + int err; >> + >> + /* for !XDP_FLAGS_UPDATE_IF_NOEXIST, unload the program first, if any, >> + * so that bpf_link can be attached >> + */ >> + if (!(xsk->config.xdp_flags & XDP_FLAGS_UPDATE_IF_NOEXIST)) { >> + err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags); >> + if (err) { >> + pr_warn("getting XDP prog id failed\n"); >> + return err; >> + } >> + if (prog_id) { >> + err = bpf_set_link_xdp_fd(ctx->ifindex, -1, 0); >> + if (err < 0) { >> + pr_warn("detaching XDP prog failed\n"); >> + return err; >> + } >> + } >> } >> >> - ctx->prog_fd = prog_fd; >> + link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts); >> + if (link_fd < 0) { >> + pr_warn("bpf_link_create failed: %s\n", strerror(errno)); >> + return link_fd; >> + } >> + > > This can leave the system in a bad state where it unloaded the XDP program > above, but then failed to create the link. So we should somehow fix that > if possible or at minimum put a note somewhere so users can't claim they > shouldn't know this. > > Also related, its not good for real systems to let XDP program go missing > for some period of time. I didn't check but we should make > XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already. > This is the default for XDP sockets library. The "bpf_set_link_xdp_fd(...-1)" way is only when a user sets it explicitly. One could maybe argue that the "force remove" would be out of scope for AF_XDP; Meaning that if an XDP program is running, attached via netlink, the AF_XDP library simply cannot remove it. The user would need to rely on some other mechanism. Björn [...]
On 2021-02-16 03:23, Maciej Fijalkowski wrote: > On Mon, Feb 15, 2021 at 04:18:28PM -0800, John Fastabend wrote: [...] > > Once again, is libxdp going to land in th kernel? Still not clear to me. > No, libxdp does not live in the kernel tree. Björn [...]
Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes: > On Mon, Feb 15, 2021 at 08:35:29PM +0100, Toke Høiland-Jørgensen wrote: >> Björn Töpel <bjorn.topel@intel.com> writes: >> >> > On 2021-02-15 18:07, Toke Høiland-Jørgensen wrote: >> >> Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes: >> >> >> >>> Currently, if there are multiple xdpsock instances running on a single >> >>> interface and in case one of the instances is terminated, the rest of >> >>> them are left in an inoperable state due to the fact of unloaded XDP >> >>> prog from interface. >> >>> >> >>> To address that, step away from setting bpf prog in favour of bpf_link. >> >>> This means that refcounting of BPF resources will be done automatically >> >>> by bpf_link itself. >> >>> >> >>> When setting up BPF resources during xsk socket creation, check whether >> >>> bpf_link for a given ifindex already exists via set of calls to >> >>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd >> >>> and comparing the ifindexes from bpf_link and xsk socket. >> >> >> >> One consideration here is that bpf_link_get_fd_by_id() is a privileged >> >> operation (privileged as in CAP_SYS_ADMIN), so this has the side effect >> >> of making AF_XDP privileged as well. Is that the intention? >> >> >> > >> > We're already using, e.g., bpf_map_get_fd_by_id() which has that >> > as well. So we're assuming that for XDP setup already! >> >> Ah, right, didn't realise that one is CAP_SYS_ADMIN as well; I >> remembered this as being specific to the bpf_link operation. >> >> >> Another is that the AF_XDP code is in the process of moving to libxdp >> >> (see in-progress PR [0]), and this approach won't carry over as-is to >> >> that model, because libxdp has to pin the bpf_link fds. >> >> >> > >> > I was assuming there were two modes of operations for AF_XDP in libxdp. >> > One which is with the multi-program support (which AFAIK is why the >> > pinning is required), and one "like the current libbpf" one. For the >> > latter Maciej's series would be a good fit, no? >> >> We haven't added an explicit mode switch for now; libxdp will fall back >> to regular interface attach if the kernel doesn't support the needed >> features for multi-attach, but if it's possible to just have libxdp >> transparently do the right thing I'd much prefer that. So we're still >> exploring that (part of which is that Magnus has promised to run some >> performance tests to see if there's a difference). >> >> However, even if there's an explicit mode switch I'd like to avoid >> different *semantics* between the two modes if possible, to keep the two >> as compatible as possible. And since we can't currently do "auto-detach >> on bpf_link fd close" when using multi-prog, introducing this now would >> lead to just such a semantic difference. So my preference would be to do >> it differently... :) >> >> >> However, in libxdp we can solve the original problem in a different way, >> >> and in fact I already suggested to Magnus that we should do this (see >> >> [1]); so one way forward could be to address it during the merge in >> >> libxdp? It should be possible to address the original issue (two >> >> instances of xdpsock breaking each other when they exit), but >> >> applications will still need to do an explicit unload operation before >> >> exiting (i.e., the automatic detach on bpf_link fd closure will take >> >> more work, and likely require extending the bpf_link kernel support)... >> >> >> > >> > I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If >> > we're months ahead, then I'd really like to see this in libbpf until the >> > merge. However, I'll leave that for Magnus/you to decide! > > WDYM by libbpf 1.0/libxdp merge? I glanced through thread and I saw that > John was also not aware of that. Not sure where it was discussed? > > If you're saying 'merge', then is libxdp going to be a part of kernel or > as an AF-XDP related guy I would be forced to include yet another > repository in the BPF developer toolchain? :< As I replied to John, we're trying to do this in as compatible and painless a way as possible. In particular, we'll maintain (source) API compatibility with the code currently in libbpf. And yeah, currently libxdp lives outside the kernel tree. Not sure whether we'll end up moving it into the kernel tree, as Björn noted up-thread there are arguments in both directions :) >> Well, as far as libxdp support goes, the PR I linked is pretty close to >> being mergeable. One of the few outstanding issues is whether we should >> solve just this issue before merging, actually :) >> >> Not sure exactly which timeframe Andrii is envisioning for libbpf 1.0, >> but last I heard he'll announce something next week. >> >> > Bottom line; I'd *really* like bpf_link behavior (process scoped) for >> > AF_XDP sooner than later! ;-) >> >> Totally agree that we should solve the multi-process coexistence >> problem! And as I said, I think we can do so in libxdp by using the same >> synchronisation mechanism we use for setting up the multi-prog >> dispatcher. So it doesn't *have* to hold things up :) > > Am I reading this right or you're trying to reject the fix of the long > standing issue due to a PR that is not ready yet on a standalone > project/library? :P Haha, no, that is not what I'm saying. As I said up-thread I agree that this is something we should fix, obviously. I'm just suggesting we fix it in a way that will also be compatible with libxdp and multiprog so we won't have to introduce yet-another-flag that users will have to decide on. However, now that I'm looking at your patch again I think my fears may have been overblown. I got hung up on the bit in the commit message where you said "refcounting of BPF resources will be done automatically by bpf_link itself", and wrongly assumed that you were exposing the bpf_link fd to the application. But I see now that it's kept in the private xsk_ctx structure, and applications still just call xsk_socket__delete(). So in libxdp we can just implement the same API with a different synchronisation mechanism; that's fine. Apologies for jumping to conclusions :/ > Once libxdp is the standard way of playing with AF-XDP and there are > actual users of that, then I'm happy to work/help on that issue. That is good to know, thanks! I opened an issue in the xdp-tools repository to track this for the libxdp side (Magnus and I agreed that we'll merge the existing code first, then fix this on top): https://github.com/xdp-project/xdp-tools/issues/93 As noted above, the mechanism may have to change, but I think it's possible to achieve the same thing in a libxdp context :) -Toke
John Fastabend <john.fastabend@gmail.com> writes: > Toke Høiland-Jørgensen wrote: >> John Fastabend <john.fastabend@gmail.com> writes: >> >> >> > However, in libxdp we can solve the original problem in a different way, >> >> > and in fact I already suggested to Magnus that we should do this (see >> >> > [1]); so one way forward could be to address it during the merge in >> >> > libxdp? It should be possible to address the original issue (two >> >> > instances of xdpsock breaking each other when they exit), but >> >> > applications will still need to do an explicit unload operation before >> >> > exiting (i.e., the automatic detach on bpf_link fd closure will take >> >> > more work, and likely require extending the bpf_link kernel support)... >> >> > >> >> >> >> I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If >> >> we're months ahead, then I'd really like to see this in libbpf until the >> >> merge. However, I'll leave that for Magnus/you to decide! >> > >> > Did I miss some thread? What does this mean libbpf 1.0/libxdp merge? >> >> The idea is to keep libbpf focused on bpf, and move the AF_XDP stuff to >> libxdp (so the socket stuff in xsk.h). We're adding the existing code >> wholesale, and keeping API compatibility during the move, so all that's >> needed is adding -lxdp when compiling. And obviously the existing libbpf >> code isn't going anywhere until such a time as there's a general >> backwards compatibility-breaking deprecation in libbpf (which I believe >> Andrii is planning to do in an upcoming and as-of-yet unannounced v1.0 >> release). > > OK, I would like to keep the basic XDP pieces in libbpf though. For example > bpf_program__attach_xdp(). This way we don't have one lib to attach > everything, but XDP. The details are still TDB; for now, we're just merging in the XSK code to the libxdp repo. I expect Andrii to announce his plans for the rest soonish. I wouldn't expect basic things like that to go away, though :) >> >> While integrating the XSK code into libxdp we're trying to make it >> compatible with the rest of the library (i.e., multi-prog). Hence my >> preference to avoid introducing something that makes this harder :) >> >> -Toke >> > > OK that makes sense to me thanks. But, I'm missing something (maybe its > obvious to everyone else?). > > When you load an XDP program you should get a reference to it. And then > XDP program should never be unloaded until that id is removed right? It > may or may not have an xsk map. Why does adding/removing programs from > an associated map have any impact on the XDP program? That seems like > the buggy part to me. No other map behaves this way as far as I can > tell. Now if the program with the XDP reference closes without pinning > the map or otherwise doing something with it, sure the map gets destroyed > and any xsk sockets are lost. The original bug comes from the XSK code abstracting away the fact that an AF_XDP socket needs an XDP program on the interface to work; so if none exists, the library will just load a program that redirects into the socket. Which breaks since the xdpsock example application is trying to be nice and clean up after itself, by removing the XDP program when it's done with the socket, thus breaking any other programs using that XDP program. So this patch introduces proper synchronisation on both add and remove of the XDP program... -Toke
Björn Töpel <bjorn.topel@intel.com> writes: > On 2021-02-15 21:49, John Fastabend wrote: >> Maciej Fijalkowski wrote: >>> Currently, if there are multiple xdpsock instances running on a single >>> interface and in case one of the instances is terminated, the rest of >>> them are left in an inoperable state due to the fact of unloaded XDP >>> prog from interface. >>> >>> To address that, step away from setting bpf prog in favour of bpf_link. >>> This means that refcounting of BPF resources will be done automatically >>> by bpf_link itself. >>> >>> When setting up BPF resources during xsk socket creation, check whether >>> bpf_link for a given ifindex already exists via set of calls to >>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd >>> and comparing the ifindexes from bpf_link and xsk socket. >>> >>> If there's no bpf_link yet, create one for a given XDP prog and unload >>> explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set. >>> >>> If bpf_link is already at a given ifindex and underlying program is not >>> AF-XDP one, bail out or update the bpf_link's prog given the presence of >>> XDP_FLAGS_UPDATE_IF_NOEXIST. >>> >>> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> >>> --- >>> tools/lib/bpf/xsk.c | 143 +++++++++++++++++++++++++++++++++++++------- >>> 1 file changed, 122 insertions(+), 21 deletions(-) >> >> [...] >> >>> +static int xsk_create_bpf_link(struct xsk_socket *xsk) >>> +{ >>> + /* bpf_link only accepts XDP_FLAGS_MODES, but xsk->config.xdp_flags >>> + * might have set XDP_FLAGS_UPDATE_IF_NOEXIST >>> + */ >>> + DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts, >>> + .flags = (xsk->config.xdp_flags & XDP_FLAGS_MODES)); >>> + struct xsk_ctx *ctx = xsk->ctx; >>> + __u32 prog_id; >>> + int link_fd; >>> + int err; >>> + >>> + /* for !XDP_FLAGS_UPDATE_IF_NOEXIST, unload the program first, if any, >>> + * so that bpf_link can be attached >>> + */ >>> + if (!(xsk->config.xdp_flags & XDP_FLAGS_UPDATE_IF_NOEXIST)) { >>> + err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags); >>> + if (err) { >>> + pr_warn("getting XDP prog id failed\n"); >>> + return err; >>> + } >>> + if (prog_id) { >>> + err = bpf_set_link_xdp_fd(ctx->ifindex, -1, 0); >>> + if (err < 0) { >>> + pr_warn("detaching XDP prog failed\n"); >>> + return err; >>> + } >>> + } >>> } >>> >>> - ctx->prog_fd = prog_fd; >>> + link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts); >>> + if (link_fd < 0) { >>> + pr_warn("bpf_link_create failed: %s\n", strerror(errno)); >>> + return link_fd; >>> + } >>> + >> >> This can leave the system in a bad state where it unloaded the XDP program >> above, but then failed to create the link. So we should somehow fix that >> if possible or at minimum put a note somewhere so users can't claim they >> shouldn't know this. >> >> Also related, its not good for real systems to let XDP program go missing >> for some period of time. I didn't check but we should make >> XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already. >> > > This is the default for XDP sockets library. The > "bpf_set_link_xdp_fd(...-1)" way is only when a user sets it explicitly. > One could maybe argue that the "force remove" would be out of scope for > AF_XDP; Meaning that if an XDP program is running, attached via netlink, > the AF_XDP library simply cannot remove it. The user would need to rely > on some other mechanism. Yeah, I'd tend to agree with that. In general, I think the proliferation of "just force-remove (or override) the running program" in code and instructions has been a mistake; and application should only really be adding and removing its own program... -Toke
Maciej Fijalkowski wrote: > On Mon, Feb 15, 2021 at 12:49:27PM -0800, John Fastabend wrote: > > Maciej Fijalkowski wrote: > > > Currently, if there are multiple xdpsock instances running on a single > > > interface and in case one of the instances is terminated, the rest of > > > them are left in an inoperable state due to the fact of unloaded XDP > > > prog from interface. > > > > > > To address that, step away from setting bpf prog in favour of bpf_link. > > > This means that refcounting of BPF resources will be done automatically > > > by bpf_link itself. > > > > > > When setting up BPF resources during xsk socket creation, check whether > > > bpf_link for a given ifindex already exists via set of calls to > > > bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd > > > and comparing the ifindexes from bpf_link and xsk socket. > > > > > > If there's no bpf_link yet, create one for a given XDP prog and unload > > > explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set. > > > > > > If bpf_link is already at a given ifindex and underlying program is not > > > AF-XDP one, bail out or update the bpf_link's prog given the presence of > > > XDP_FLAGS_UPDATE_IF_NOEXIST. > > > > > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > > > --- > > > tools/lib/bpf/xsk.c | 143 +++++++++++++++++++++++++++++++++++++------- > > > 1 file changed, 122 insertions(+), 21 deletions(-) > > > > [...] > > > > > +static int xsk_create_bpf_link(struct xsk_socket *xsk) > > > +{ > > > + /* bpf_link only accepts XDP_FLAGS_MODES, but xsk->config.xdp_flags > > > + * might have set XDP_FLAGS_UPDATE_IF_NOEXIST > > > + */ > > > + DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts, > > > + .flags = (xsk->config.xdp_flags & XDP_FLAGS_MODES)); > > > + struct xsk_ctx *ctx = xsk->ctx; > > > + __u32 prog_id; > > > + int link_fd; > > > + int err; > > > + > > > + /* for !XDP_FLAGS_UPDATE_IF_NOEXIST, unload the program first, if any, > > > + * so that bpf_link can be attached > > > + */ > > > + if (!(xsk->config.xdp_flags & XDP_FLAGS_UPDATE_IF_NOEXIST)) { > > > + err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags); > > > + if (err) { > > > + pr_warn("getting XDP prog id failed\n"); > > > + return err; > > > + } > > > + if (prog_id) { > > > + err = bpf_set_link_xdp_fd(ctx->ifindex, -1, 0); > > > + if (err < 0) { > > > + pr_warn("detaching XDP prog failed\n"); > > > + return err; > > > + } > > > + } > > > } > > > > > > - ctx->prog_fd = prog_fd; > > > + link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts); > > > + if (link_fd < 0) { > > > + pr_warn("bpf_link_create failed: %s\n", strerror(errno)); > > > + return link_fd; > > > + } > > > + > > > > This can leave the system in a bad state where it unloaded the XDP program > > above, but then failed to create the link. So we should somehow fix that > > if possible or at minimum put a note somewhere so users can't claim they > > shouldn't know this. > > > > Also related, its not good for real systems to let XDP program go missing > > for some period of time. I didn't check but we should make > > XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already. > > Old way of attaching prog is mutual exclusive with bpf_link, right? > What I'm saying is in order to use one of the two, you need to wipe out > the current one in favour of the second that you would like to load. Personally, if I were using above I want the operation to error if a XDP program is already attached. Then user is forced to remove the XDP program directly if thats even safe to do. Reusing UPDATE_IF_NOEXIST flag above seems like an abuse of that flag. The kernel side does an atomic program swap (or at least it should imo) of the programs when it is set. Atomic here is not exactly right though because driver might reset or do other things, but the point is no packets are missed without policy. In above some N packets will pass through the device without policy being applied. This is going to be subtle and buggy if used in real production systems. The API needs to do a replace operation not a delete/create and if it can't do that it needs to error out so the user can figure out what to do about it. Do you really need this automatic behavior for something? It clutters up the API with more flags and I can't see how its useful. If it errors out just delete the prog using the existing interfaces from the API user side. > > > > > > + ctx->link_fd = link_fd; > > > return 0; > > > } > > > > > > > [...] > > > > > +static int xsk_link_lookup(struct xsk_ctx *ctx, __u32 *prog_id) > > > +{ > > > + __u32 link_len = sizeof(struct bpf_link_info); > > > + struct bpf_link_info link_info; > > > + __u32 id = 0; > > > + int err; > > > + int fd; > > > + > > > + while (true) { > > > + err = bpf_link_get_next_id(id, &id); > > > + if (err) { > > > + if (errno == ENOENT) > > > + break; > > > + pr_warn("can't get next link: %s\n", strerror(errno)); > > > + break; > > > + } > > > + > > > + fd = bpf_link_get_fd_by_id(id); > > > + if (fd < 0) { > > > + if (errno == ENOENT) > > > + continue; > > > + pr_warn("can't get link by id (%u): %s\n", id, strerror(errno)); > > > + break; > > > + } > > > + > > > + memset(&link_info, 0, link_len); > > > + err = bpf_obj_get_info_by_fd(fd, &link_info, &link_len); > > > + if (err) { > > > + pr_warn("can't get link info: %s\n", strerror(errno)); > > > + close(fd); > > > + break; > > > + } > > > + if (link_info.xdp.ifindex == ctx->ifindex) { > > > + ctx->link_fd = fd; > > > + *prog_id = link_info.prog_id; > > > + break; > > > + } > > > + close(fd); > > > + } > > > + > > > + return errno == ENOENT ? 0 : err; > > > > But, err wont be set in fd < 0 case? I guess we don't want to return 0 if > > bpf_link_get_fd_by_id fails. > > Good catch! > > > Although I really don't like the construct > > here that much. I think just `return err` and ensuring err is set correctly > > would be more clear. At least the fd error case needs to be handled > > though. > > FWIW, this was inspired by tools/bpf/bpftool/link.c:do_show() Sure its not my preference, but as long as the bug is resolved I wont complain. If I hadn't seen the bug I wouldn't have said anything. > > > > > > +} > > > +
Toke Høiland-Jørgensen wrote: > Björn Töpel <bjorn.topel@intel.com> writes: > > > On 2021-02-15 21:49, John Fastabend wrote: > >> Maciej Fijalkowski wrote: > >>> Currently, if there are multiple xdpsock instances running on a single > >>> interface and in case one of the instances is terminated, the rest of > >>> them are left in an inoperable state due to the fact of unloaded XDP > >>> prog from interface. > >>> > >>> To address that, step away from setting bpf prog in favour of bpf_link. > >>> This means that refcounting of BPF resources will be done automatically > >>> by bpf_link itself. > >>> > >>> When setting up BPF resources during xsk socket creation, check whether > >>> bpf_link for a given ifindex already exists via set of calls to > >>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd > >>> and comparing the ifindexes from bpf_link and xsk socket. > >>> > >>> If there's no bpf_link yet, create one for a given XDP prog and unload > >>> explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set. > >>> > >>> If bpf_link is already at a given ifindex and underlying program is not > >>> AF-XDP one, bail out or update the bpf_link's prog given the presence of > >>> XDP_FLAGS_UPDATE_IF_NOEXIST. > >>> > >>> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > >>> --- [...] > >>> - ctx->prog_fd = prog_fd; > >>> + link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts); > >>> + if (link_fd < 0) { > >>> + pr_warn("bpf_link_create failed: %s\n", strerror(errno)); > >>> + return link_fd; > >>> + } > >>> + > >> > >> This can leave the system in a bad state where it unloaded the XDP program > >> above, but then failed to create the link. So we should somehow fix that > >> if possible or at minimum put a note somewhere so users can't claim they > >> shouldn't know this. > >> > >> Also related, its not good for real systems to let XDP program go missing > >> for some period of time. I didn't check but we should make > >> XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already. > >> > > > > This is the default for XDP sockets library. The > > "bpf_set_link_xdp_fd(...-1)" way is only when a user sets it explicitly. > > One could maybe argue that the "force remove" would be out of scope for > > AF_XDP; Meaning that if an XDP program is running, attached via netlink, > > the AF_XDP library simply cannot remove it. The user would need to rely > > on some other mechanism. > > Yeah, I'd tend to agree with that. In general, I think the proliferation > of "just force-remove (or override) the running program" in code and > instructions has been a mistake; and application should only really be > adding and removing its own program... > > -Toke > I'll try to consolidate some of my opinions from a couple threads here. It looks to me many of these issues are self-inflicted by the API. We built the API with out the right abstractions or at least different abstractions from the rest of the BPF APIs. Not too surprising seeing the kernel side and user side were all being built up at once. For example this specific issue where the xsk API also deletes the XDP program seems to be due to merging the xsk with the creation of the XDP programs. I'm not a real user of AF_XDP (yet.), but here is how I would expect it to work based on how the sockmap pieces work, which are somewhat similar given they also deal with sockets. Program (1) load and pin an XDP BPF program - obj = bpf_object__open(prog); - bpf_object__load_xattr(&attr); - bpf_program__pin() (2) pin the map, find map_xsk using any of the map APIs - bpf_map__pin(map_xsk, path_to_pin) (3) attach to XDP - link = bpf_program__attach_xdp() - bpf_link__pin() At this point you have a BPF program loaded, a xsk map, and a link all pinned and ready. And we can add socks using the process found in `enter_xsks_into_map` in the sample. This can be the same program that loaded/pinned the XDP program or some other program it doesn't really matter. - create xsk fd . xsk_umem__create() . xsk_socket__create - open map @ pinned path - bpf_map_update_elem(xsks_map, &key, &fd, 0); Then it looks like we don't have any conflicts? The XDP program is pinned and exists in its normal scope. The xsk elements can be added/deleted as normal. If the XDP program is removed and the map referencing (using normal ref rules) reaches zero its also deleted. Above is more or less the same flow we use for any BPF program so looks good to me. The trouble seems to pop up when using the higher abstraction APIs xsk_setup_xdp_prog and friends I guess? I just see above as already fairly easy to use and we have good helpers to create the sockets it looks like. Maybe I missed some design considerations. IMO higher level abstractions should go in new libxdp and above should stay in libbpf. /rant off ;) Thanks, John
On Tue, Feb 16, 2021 at 10:19:17AM -0800, John Fastabend wrote: > Maciej Fijalkowski wrote: > > On Mon, Feb 15, 2021 at 12:49:27PM -0800, John Fastabend wrote: > > > Maciej Fijalkowski wrote: > > > > Currently, if there are multiple xdpsock instances running on a single > > > > interface and in case one of the instances is terminated, the rest of > > > > them are left in an inoperable state due to the fact of unloaded XDP > > > > prog from interface. > > > > > > > > To address that, step away from setting bpf prog in favour of bpf_link. > > > > This means that refcounting of BPF resources will be done automatically > > > > by bpf_link itself. > > > > > > > > When setting up BPF resources during xsk socket creation, check whether > > > > bpf_link for a given ifindex already exists via set of calls to > > > > bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd > > > > and comparing the ifindexes from bpf_link and xsk socket. > > > > > > > > If there's no bpf_link yet, create one for a given XDP prog and unload > > > > explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set. > > > > > > > > If bpf_link is already at a given ifindex and underlying program is not > > > > AF-XDP one, bail out or update the bpf_link's prog given the presence of > > > > XDP_FLAGS_UPDATE_IF_NOEXIST. > > > > > > > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > > > > --- > > > > tools/lib/bpf/xsk.c | 143 +++++++++++++++++++++++++++++++++++++------- > > > > 1 file changed, 122 insertions(+), 21 deletions(-) > > > > > > [...] > > > > > > > +static int xsk_create_bpf_link(struct xsk_socket *xsk) > > > > +{ > > > > + /* bpf_link only accepts XDP_FLAGS_MODES, but xsk->config.xdp_flags > > > > + * might have set XDP_FLAGS_UPDATE_IF_NOEXIST > > > > + */ > > > > + DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts, > > > > + .flags = (xsk->config.xdp_flags & XDP_FLAGS_MODES)); > > > > + struct xsk_ctx *ctx = xsk->ctx; > > > > + __u32 prog_id; > > > > + int link_fd; > > > > + int err; > > > > + > > > > + /* for !XDP_FLAGS_UPDATE_IF_NOEXIST, unload the program first, if any, > > > > + * so that bpf_link can be attached > > > > + */ > > > > + if (!(xsk->config.xdp_flags & XDP_FLAGS_UPDATE_IF_NOEXIST)) { > > > > + err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags); > > > > + if (err) { > > > > + pr_warn("getting XDP prog id failed\n"); > > > > + return err; > > > > + } > > > > + if (prog_id) { > > > > + err = bpf_set_link_xdp_fd(ctx->ifindex, -1, 0); > > > > + if (err < 0) { > > > > + pr_warn("detaching XDP prog failed\n"); > > > > + return err; > > > > + } > > > > + } > > > > } > > > > > > > > - ctx->prog_fd = prog_fd; > > > > + link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts); > > > > + if (link_fd < 0) { > > > > + pr_warn("bpf_link_create failed: %s\n", strerror(errno)); > > > > + return link_fd; > > > > + } > > > > + > > > > > > This can leave the system in a bad state where it unloaded the XDP program > > > above, but then failed to create the link. So we should somehow fix that > > > if possible or at minimum put a note somewhere so users can't claim they > > > shouldn't know this. > > > > > > Also related, its not good for real systems to let XDP program go missing > > > for some period of time. I didn't check but we should make > > > XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already. > > > > Old way of attaching prog is mutual exclusive with bpf_link, right? > > What I'm saying is in order to use one of the two, you need to wipe out > > the current one in favour of the second that you would like to load. > > Personally, if I were using above I want the operation to error > if a XDP program is already attached. Then user is forced to remove the > XDP program directly if thats even safe to do. > > Reusing UPDATE_IF_NOEXIST flag above seems like an abuse of that flag. > The kernel side does an atomic program swap (or at least it should imo) > of the programs when it is set. Atomic here is not exactly right though > because driver might reset or do other things, but the point is no > packets are missed without policy. In above some N packets will pass > through the device without policy being applied. This is going to be > subtle and buggy if used in real production systems. > > The API needs to do a replace operation not a delete/create and if it > can't do that it needs to error out so the user can figure out what > to do about it. > > Do you really need this automatic behavior for something? It clutters > up the API with more flags and I can't see how its useful. If it > errors out just delete the prog using the existing interfaces from the > API user side. Fair argument, I went too far with --force flag. Given what you said, I'll drop that wipe out of netlink-based XDP prog, but I think we can keep the logic around updating the bpf_link if it already exists (in case --force flag was set). This provides the atomic xchg and we won't expose systems to a time frame without XDP policy as you point out. > > > > > > > > > > + ctx->link_fd = link_fd; > > > > return 0; > > > > } > > > > > > > > > > [...] > > > > > > > +static int xsk_link_lookup(struct xsk_ctx *ctx, __u32 *prog_id) > > > > +{ > > > > + __u32 link_len = sizeof(struct bpf_link_info); > > > > + struct bpf_link_info link_info; > > > > + __u32 id = 0; > > > > + int err; > > > > + int fd; > > > > + > > > > + while (true) { > > > > + err = bpf_link_get_next_id(id, &id); > > > > + if (err) { > > > > + if (errno == ENOENT) > > > > + break; > > > > + pr_warn("can't get next link: %s\n", strerror(errno)); > > > > + break; > > > > + } > > > > + > > > > + fd = bpf_link_get_fd_by_id(id); > > > > + if (fd < 0) { > > > > + if (errno == ENOENT) > > > > + continue; > > > > + pr_warn("can't get link by id (%u): %s\n", id, strerror(errno)); > > > > + break; > > > > + } > > > > + > > > > + memset(&link_info, 0, link_len); > > > > + err = bpf_obj_get_info_by_fd(fd, &link_info, &link_len); > > > > + if (err) { > > > > + pr_warn("can't get link info: %s\n", strerror(errno)); > > > > + close(fd); > > > > + break; > > > > + } > > > > + if (link_info.xdp.ifindex == ctx->ifindex) { > > > > + ctx->link_fd = fd; > > > > + *prog_id = link_info.prog_id; > > > > + break; > > > > + } > > > > + close(fd); > > > > + } > > > > + > > > > + return errno == ENOENT ? 0 : err; > > > > > > But, err wont be set in fd < 0 case? I guess we don't want to return 0 if > > > bpf_link_get_fd_by_id fails. > > > > Good catch! > > > > > Although I really don't like the construct > > > here that much. I think just `return err` and ensuring err is set correctly > > > would be more clear. At least the fd error case needs to be handled > > > though. > > > > FWIW, this was inspired by tools/bpf/bpftool/link.c:do_show() > > Sure its not my preference, but as long as the bug is resolved I > wont complain. If I hadn't seen the bug I wouldn't have said > anything. > > > > > > > > > > +} > > > > + > >
On Tue, Feb 16, 2021 at 11:27:55AM +0100, Toke Høiland-Jørgensen wrote: > Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes: > > > > Am I reading this right or you're trying to reject the fix of the long > > standing issue due to a PR that is not ready yet on a standalone > > project/library? :P > > Haha, no, that is not what I'm saying. As I said up-thread I agree that > this is something we should fix, obviously. I'm just suggesting we fix > it in a way that will also be compatible with libxdp and multiprog so we > won't have to introduce yet-another-flag that users will have to decide > on. > > However, now that I'm looking at your patch again I think my fears may > have been overblown. I got hung up on the bit in the commit message > where you said "refcounting of BPF resources will be done automatically > by bpf_link itself", and wrongly assumed that you were exposing the > bpf_link fd to the application. But I see now that it's kept in the > private xsk_ctx structure, and applications still just call > xsk_socket__delete(). So in libxdp we can just implement the same API > with a different synchronisation mechanism; that's fine. Apologies for > jumping to conclusions :/ No worries, this shows how important a thorough commit message is, seems that I failed on that part. > > > Once libxdp is the standard way of playing with AF-XDP and there are > > actual users of that, then I'm happy to work/help on that issue. > > That is good to know, thanks! I opened an issue in the xdp-tools > repository to track this for the libxdp side (Magnus and I agreed that > we'll merge the existing code first, then fix this on top): > https://github.com/xdp-project/xdp-tools/issues/93 Thanks! And good to hear that there's some sort of agreement. > > As noted above, the mechanism may have to change, but I think it's > possible to achieve the same thing in a libxdp context :) > > -Toke >
On Tue, Feb 16, 2021 at 11:15:41AM -0800, John Fastabend wrote: > Toke Høiland-Jørgensen wrote: > > Björn Töpel <bjorn.topel@intel.com> writes: > > > > > On 2021-02-15 21:49, John Fastabend wrote: > > >> Maciej Fijalkowski wrote: > > >>> Currently, if there are multiple xdpsock instances running on a single > > >>> interface and in case one of the instances is terminated, the rest of > > >>> them are left in an inoperable state due to the fact of unloaded XDP > > >>> prog from interface. > > >>> > > >>> To address that, step away from setting bpf prog in favour of bpf_link. > > >>> This means that refcounting of BPF resources will be done automatically > > >>> by bpf_link itself. > > >>> > > >>> When setting up BPF resources during xsk socket creation, check whether > > >>> bpf_link for a given ifindex already exists via set of calls to > > >>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd > > >>> and comparing the ifindexes from bpf_link and xsk socket. > > >>> > > >>> If there's no bpf_link yet, create one for a given XDP prog and unload > > >>> explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set. > > >>> > > >>> If bpf_link is already at a given ifindex and underlying program is not > > >>> AF-XDP one, bail out or update the bpf_link's prog given the presence of > > >>> XDP_FLAGS_UPDATE_IF_NOEXIST. > > >>> > > >>> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > > >>> --- > > [...] > > > >>> - ctx->prog_fd = prog_fd; > > >>> + link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts); > > >>> + if (link_fd < 0) { > > >>> + pr_warn("bpf_link_create failed: %s\n", strerror(errno)); > > >>> + return link_fd; > > >>> + } > > >>> + > > >> > > >> This can leave the system in a bad state where it unloaded the XDP program > > >> above, but then failed to create the link. So we should somehow fix that > > >> if possible or at minimum put a note somewhere so users can't claim they > > >> shouldn't know this. > > >> > > >> Also related, its not good for real systems to let XDP program go missing > > >> for some period of time. I didn't check but we should make > > >> XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already. > > >> > > > > > > This is the default for XDP sockets library. The > > > "bpf_set_link_xdp_fd(...-1)" way is only when a user sets it explicitly. > > > One could maybe argue that the "force remove" would be out of scope for > > > AF_XDP; Meaning that if an XDP program is running, attached via netlink, > > > the AF_XDP library simply cannot remove it. The user would need to rely > > > on some other mechanism. > > > > Yeah, I'd tend to agree with that. In general, I think the proliferation > > of "just force-remove (or override) the running program" in code and > > instructions has been a mistake; and application should only really be > > adding and removing its own program... > > > > -Toke > > > > I'll try to consolidate some of my opinions from a couple threads here. It > looks to me many of these issues are self-inflicted by the API. We built > the API with out the right abstractions or at least different abstractions > from the rest of the BPF APIs. Not too surprising seeing the kernel side > and user side were all being built up at once. > > For example this specific issue where the xsk API also deletes the XDP > program seems to be due to merging the xsk with the creation of the XDP > programs. IMHO the issue I fixed and that is the topic of those discussions is the fact that handling of XDP program for xsk case was not balanced, IOW libbpf was creating the prog if it wasn't present whereas app side was blindly removing the XDP prog, even if it wasn't aware whether it has loaded the prog among with XSKMAP or only attached the xsk socket to XSKMAP that already existed. > > I'm not a real user of AF_XDP (yet.), but here is how I would expect it > to work based on how the sockmap pieces work, which are somewhat similar > given they also deal with sockets. Don't want to be picky, but I suppose sockmap won't play with ndo_bpf() and that's what was bothering us. > > Program > (1) load and pin an XDP BPF program > - obj = bpf_object__open(prog); > - bpf_object__load_xattr(&attr); > - bpf_program__pin() > (2) pin the map, find map_xsk using any of the map APIs > - bpf_map__pin(map_xsk, path_to_pin) > (3) attach to XDP > - link = bpf_program__attach_xdp() > - bpf_link__pin() > > At this point you have a BPF program loaded, a xsk map, and a link all > pinned and ready. And we can add socks using the process found in > `enter_xsks_into_map` in the sample. This can be the same program that > loaded/pinned the XDP program or some other program it doesn't really > matter. > > - create xsk fd > . xsk_umem__create() > . xsk_socket__create > - open map @ pinned path > - bpf_map_update_elem(xsks_map, &key, &fd, 0); > > Then it looks like we don't have any conflicts? The XDP program is pinned > and exists in its normal scope. The xsk elements can be added/deleted > as normal. The only difference from what you wrote up is the resource pinning, when compared to what we currently have + the set I am proposing. So, if you're saying it looks like we don't have any conflicts and I am saying that the flow is what we have, then...? :) You would have to ask Magnus what was behind the decision to avoid API from tools/lib/bpf/libbpf.c but rather directly call underlying functions from tools/lib/bpf/bpf.c. Nevertheless, it doesn't really make a big difference to me. > If the XDP program is removed and the map referencing (using > normal ref rules) reaches zero its also deleted. Above is more or less > the same flow we use for any BPF program so looks good to me. We have to be a bit more specific around the "XDP program is removed". Is it removed from the network interface? That's the thing that we want to avoid when there are other xsk sockets active on a given interface. With bpf_link, XDP prog is removed only when bpf_link's refcount reaches zero, via link->ops->release() callback that is invoked from bpf_link_put(). And the refcount is updated with each process that attaches/detaches from the bpf_link on interface. > > The trouble seems to pop up when using the higher abstraction APIs > xsk_setup_xdp_prog and friends I guess? I just see above as already > fairly easy to use and we have good helpers to create the sockets it looks > like. Maybe I missed some design considerations. IMO higher level > abstractions should go in new libxdp and above should stay in libbpf. xsk_setup_xdp_prog doesn't really feel like higher level abstraction, as I mentioned, to me it has one layer of abstraction peeled off. > > /rant off ;) > > Thanks, > John
Maciej Fijalkowski wrote: > On Tue, Feb 16, 2021 at 11:15:41AM -0800, John Fastabend wrote: > > Toke Høiland-Jørgensen wrote: > > > Björn Töpel <bjorn.topel@intel.com> writes: > > > > > > > On 2021-02-15 21:49, John Fastabend wrote: > > > >> Maciej Fijalkowski wrote: > > > >>> Currently, if there are multiple xdpsock instances running on a single > > > >>> interface and in case one of the instances is terminated, the rest of > > > >>> them are left in an inoperable state due to the fact of unloaded XDP > > > >>> prog from interface. > > > >>> > > > >>> To address that, step away from setting bpf prog in favour of bpf_link. > > > >>> This means that refcounting of BPF resources will be done automatically > > > >>> by bpf_link itself. > > > >>> > > > >>> When setting up BPF resources during xsk socket creation, check whether > > > >>> bpf_link for a given ifindex already exists via set of calls to > > > >>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd > > > >>> and comparing the ifindexes from bpf_link and xsk socket. > > > >>> > > > >>> If there's no bpf_link yet, create one for a given XDP prog and unload > > > >>> explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set. > > > >>> > > > >>> If bpf_link is already at a given ifindex and underlying program is not > > > >>> AF-XDP one, bail out or update the bpf_link's prog given the presence of > > > >>> XDP_FLAGS_UPDATE_IF_NOEXIST. > > > >>> > > > >>> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> [...] > > > > I'm not a real user of AF_XDP (yet.), but here is how I would expect it > > to work based on how the sockmap pieces work, which are somewhat similar > > given they also deal with sockets. > > Don't want to be picky, but I suppose sockmap won't play with ndo_bpf() > and that's what was bothering us. > > > > > Program > > (1) load and pin an XDP BPF program > > - obj = bpf_object__open(prog); > > - bpf_object__load_xattr(&attr); > > - bpf_program__pin() > > (2) pin the map, find map_xsk using any of the map APIs > > - bpf_map__pin(map_xsk, path_to_pin) > > (3) attach to XDP > > - link = bpf_program__attach_xdp() > > - bpf_link__pin() > > > > At this point you have a BPF program loaded, a xsk map, and a link all > > pinned and ready. And we can add socks using the process found in > > `enter_xsks_into_map` in the sample. This can be the same program that > > loaded/pinned the XDP program or some other program it doesn't really > > matter. > > > > - create xsk fd > > . xsk_umem__create() > > . xsk_socket__create > > - open map @ pinned path > > - bpf_map_update_elem(xsks_map, &key, &fd, 0); > > > > Then it looks like we don't have any conflicts? The XDP program is pinned > > and exists in its normal scope. The xsk elements can be added/deleted > > as normal. > > The only difference from what you wrote up is the resource pinning, when > compared to what we currently have + the set I am proposing. > > So, if you're saying it looks like we don't have any conflicts and I am > saying that the flow is what we have, then...? :) > > You would have to ask Magnus what was behind the decision to avoid API > from tools/lib/bpf/libbpf.c but rather directly call underlying functions > from tools/lib/bpf/bpf.c. Nevertheless, it doesn't really make a big > difference to me. > > > If the XDP program is removed and the map referencing (using > > normal ref rules) reaches zero its also deleted. Above is more or less > > the same flow we use for any BPF program so looks good to me. > > We have to be a bit more specific around the "XDP program is removed". > Is it removed from the network interface? That's the thing that we want to > avoid when there are other xsk sockets active on a given interface. What I'm suggesting here is to use the normal rules for when an XDP program is removed from the network interface. I don't think we should do anything extra to keep the XDP program attached/loaded just because it has a xsk map which may or may not have some entries in it. If the user wants this behavior they should pin the bpf_link pointer associated with the xdp program. This is the same as any other program/map I have in my BPF system. > > With bpf_link, XDP prog is removed only when bpf_link's refcount reaches > zero, via link->ops->release() callback that is invoked from > bpf_link_put(). > > And the refcount is updated with each process that attaches/detaches from > the bpf_link on interface. > > > > > The trouble seems to pop up when using the higher abstraction APIs > > xsk_setup_xdp_prog and friends I guess? I just see above as already > > fairly easy to use and we have good helpers to create the sockets it looks > > like. Maybe I missed some design considerations. IMO higher level > > abstractions should go in new libxdp and above should stay in libbpf. > > xsk_setup_xdp_prog doesn't really feel like higher level abstraction, as I > mentioned, to me it has one layer of abstraction peeled off. Except it seems to have caused some issues. I don't think I gain much from the API personally vs just doing above steps. But, I appreciate you are just trying to fix it here so patches are a good idea with v2 improvements. And I expect whenever libbpf/libxdp split the above "high level" APIs will land in libxdp. Thanks, John > > > > > /rant off ;) > > > > Thanks, > > John
On Mon, 2021-02-15 at 22:38 +0100, Toke Høiland-Jørgensen wrote: > The idea is to keep libbpf focused on bpf, and move the AF_XDP stuff > to > libxdp (so the socket stuff in xsk.h). We're adding the existing code > wholesale, and keeping API compatibility during the move, so all > that's > needed is adding -lxdp when compiling. And obviously the existing > libbpf > code isn't going anywhere until such a time as there's a general > backwards compatibility-breaking deprecation in libbpf (which I > believe > Andrii is planning to do in an upcoming and as-of-yet unannounced > v1.0 > release). I maintain a Rust binding to the AF_XDP parts of libbpf [1][2]. On the chance that more significant changes can be entertained in the switch to libxdp... The fact that many required functions like the ring access functions exist only in xsk.h makes building a binding more difficult because we need to wrap it with an extra C function [3]. From that perspective, it would be great if those could move to xsk.c. [1] - https://github.com/aterlo/afxdp-rs [2] - https://github.com/alexforster/libbpf-sys [3] - https://github.com/alexforster/libbpf-sys/blob/master/bindings.c
On Wed, Feb 17, 2021 at 3:26 AM Dan Siemon <dan@coverfire.com> wrote: > > On Mon, 2021-02-15 at 22:38 +0100, Toke Høiland-Jørgensen wrote: > > The idea is to keep libbpf focused on bpf, and move the AF_XDP stuff > > to > > libxdp (so the socket stuff in xsk.h). We're adding the existing code > > wholesale, and keeping API compatibility during the move, so all > > that's > > needed is adding -lxdp when compiling. And obviously the existing > > libbpf > > code isn't going anywhere until such a time as there's a general > > backwards compatibility-breaking deprecation in libbpf (which I > > believe > > Andrii is planning to do in an upcoming and as-of-yet unannounced > > v1.0 > > release). > > I maintain a Rust binding to the AF_XDP parts of libbpf [1][2]. On the > chance that more significant changes can be entertained in the switch > to libxdp... The fact that many required functions like the ring access > functions exist only in xsk.h makes building a binding more difficult > because we need to wrap it with an extra C function [3]. From that > perspective, it would be great if those could move to xsk.c. The only reason they were put in xsk.h is performance. But with LTO (link-time optimizations) being present in most C-compilers these days, it might not be a valid argument anymore. I will perform some experiments and let you know. As you say, it would be much nicer to hide away these functions in the library proper and make your life easier. > [1] - https://github.com/aterlo/afxdp-rs > [2] - https://github.com/alexforster/libbpf-sys > [3] - https://github.com/alexforster/libbpf-sys/blob/master/bindings.c >
On Wed, Feb 17, 2021 at 8:16 AM Magnus Karlsson <magnus.karlsson@gmail.com> wrote: > > On Wed, Feb 17, 2021 at 3:26 AM Dan Siemon <dan@coverfire.com> wrote: > > > > On Mon, 2021-02-15 at 22:38 +0100, Toke Høiland-Jørgensen wrote: > > > The idea is to keep libbpf focused on bpf, and move the AF_XDP stuff > > > to > > > libxdp (so the socket stuff in xsk.h). We're adding the existing code > > > wholesale, and keeping API compatibility during the move, so all > > > that's > > > needed is adding -lxdp when compiling. And obviously the existing > > > libbpf > > > code isn't going anywhere until such a time as there's a general > > > backwards compatibility-breaking deprecation in libbpf (which I > > > believe > > > Andrii is planning to do in an upcoming and as-of-yet unannounced > > > v1.0 > > > release). > > > > I maintain a Rust binding to the AF_XDP parts of libbpf [1][2]. On the > > chance that more significant changes can be entertained in the switch > > to libxdp... The fact that many required functions like the ring access > > functions exist only in xsk.h makes building a binding more difficult > > because we need to wrap it with an extra C function [3]. From that > > perspective, it would be great if those could move to xsk.c. > > The only reason they were put in xsk.h is performance. But with LTO > (link-time optimizations) being present in most C-compilers these > days, it might not be a valid argument anymore. I will perform some > experiments and let you know. As you say, it would be much nicer to > hide away these functions in the library proper and make your life > easier. I would be very grateful for any more suggested changes that users out there would like to see. Now, when we move to libxdp is the perfect chance to fix those things. We might even decide to partially break compatibility or change some behavior if the gain is large enough. > > [1] - https://github.com/aterlo/afxdp-rs > > [2] - https://github.com/alexforster/libbpf-sys > > [3] - https://github.com/alexforster/libbpf-sys/blob/master/bindings.c > >
On Tue, Feb 16, 2021 at 2:37 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > John Fastabend <john.fastabend@gmail.com> writes: > > > Toke Høiland-Jørgensen wrote: > >> John Fastabend <john.fastabend@gmail.com> writes: > >> > >> >> > However, in libxdp we can solve the original problem in a different way, > >> >> > and in fact I already suggested to Magnus that we should do this (see > >> >> > [1]); so one way forward could be to address it during the merge in > >> >> > libxdp? It should be possible to address the original issue (two > >> >> > instances of xdpsock breaking each other when they exit), but > >> >> > applications will still need to do an explicit unload operation before > >> >> > exiting (i.e., the automatic detach on bpf_link fd closure will take > >> >> > more work, and likely require extending the bpf_link kernel support)... > >> >> > > >> >> > >> >> I'd say it's depending on the libbpf 1.0/libxdp merge timeframe. If > >> >> we're months ahead, then I'd really like to see this in libbpf until the > >> >> merge. However, I'll leave that for Magnus/you to decide! > >> > > >> > Did I miss some thread? What does this mean libbpf 1.0/libxdp merge? > >> > >> The idea is to keep libbpf focused on bpf, and move the AF_XDP stuff to > >> libxdp (so the socket stuff in xsk.h). We're adding the existing code > >> wholesale, and keeping API compatibility during the move, so all that's > >> needed is adding -lxdp when compiling. And obviously the existing libbpf > >> code isn't going anywhere until such a time as there's a general > >> backwards compatibility-breaking deprecation in libbpf (which I believe > >> Andrii is planning to do in an upcoming and as-of-yet unannounced v1.0 > >> release). > > > > OK, I would like to keep the basic XDP pieces in libbpf though. For example > > bpf_program__attach_xdp(). This way we don't have one lib to attach > > everything, but XDP. > > The details are still TDB; for now, we're just merging in the XSK code > to the libxdp repo. I expect Andrii to announce his plans for the rest > soonish. I wouldn't expect basic things like that to go away, though :) Yeah, I'll probably post more details this week. Just catching up on stuff after vacation. As mentioned already, all the basic APIs (i.e., APIs like bpf_program__attach_xdp and bpf_set_link_xdp_fd, though I hope we can give the latter a better name) will stay intact. Stay tuned! > > >> > >> While integrating the XSK code into libxdp we're trying to make it > >> compatible with the rest of the library (i.e., multi-prog). Hence my > >> preference to avoid introducing something that makes this harder :) > >> > >> -Toke > >> > > > > OK that makes sense to me thanks. But, I'm missing something (maybe its > > obvious to everyone else?). > > > > When you load an XDP program you should get a reference to it. And then > > XDP program should never be unloaded until that id is removed right? It > > may or may not have an xsk map. Why does adding/removing programs from > > an associated map have any impact on the XDP program? That seems like > > the buggy part to me. No other map behaves this way as far as I can > > tell. Now if the program with the XDP reference closes without pinning > > the map or otherwise doing something with it, sure the map gets destroyed > > and any xsk sockets are lost. > > The original bug comes from the XSK code abstracting away the fact that > an AF_XDP socket needs an XDP program on the interface to work; so if > none exists, the library will just load a program that redirects into > the socket. Which breaks since the xdpsock example application is trying > to be nice and clean up after itself, by removing the XDP program when > it's done with the socket, thus breaking any other programs using that > XDP program. So this patch introduces proper synchronisation on both add > and remove of the XDP program... > > -Toke >
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c index 20500fb1f17e..5911868efa43 100644 --- a/tools/lib/bpf/xsk.c +++ b/tools/lib/bpf/xsk.c @@ -28,6 +28,7 @@ #include <sys/mman.h> #include <sys/socket.h> #include <sys/types.h> +#include <linux/if_link.h> #include "bpf.h" #include "libbpf.h" @@ -70,6 +71,7 @@ struct xsk_ctx { int ifindex; struct list_head list; int prog_fd; + int link_fd; int xsks_map_fd; char ifname[IFNAMSIZ]; }; @@ -409,7 +411,7 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk) static const int log_buf_size = 16 * 1024; struct xsk_ctx *ctx = xsk->ctx; char log_buf[log_buf_size]; - int err, prog_fd; + int prog_fd; /* This is the fallback C-program: * SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx) @@ -499,14 +501,47 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk) return prog_fd; } - err = bpf_set_link_xdp_fd(xsk->ctx->ifindex, prog_fd, - xsk->config.xdp_flags); - if (err) { - close(prog_fd); - return err; + ctx->prog_fd = prog_fd; + return 0; +} + +static int xsk_create_bpf_link(struct xsk_socket *xsk) +{ + /* bpf_link only accepts XDP_FLAGS_MODES, but xsk->config.xdp_flags + * might have set XDP_FLAGS_UPDATE_IF_NOEXIST + */ + DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts, + .flags = (xsk->config.xdp_flags & XDP_FLAGS_MODES)); + struct xsk_ctx *ctx = xsk->ctx; + __u32 prog_id; + int link_fd; + int err; + + /* for !XDP_FLAGS_UPDATE_IF_NOEXIST, unload the program first, if any, + * so that bpf_link can be attached + */ + if (!(xsk->config.xdp_flags & XDP_FLAGS_UPDATE_IF_NOEXIST)) { + err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags); + if (err) { + pr_warn("getting XDP prog id failed\n"); + return err; + } + if (prog_id) { + err = bpf_set_link_xdp_fd(ctx->ifindex, -1, 0); + if (err < 0) { + pr_warn("detaching XDP prog failed\n"); + return err; + } + } } - ctx->prog_fd = prog_fd; + link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts); + if (link_fd < 0) { + pr_warn("bpf_link_create failed: %s\n", strerror(errno)); + return link_fd; + } + + ctx->link_fd = link_fd; return 0; } @@ -625,8 +660,32 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk) } err = 0; - if (ctx->xsks_map_fd == -1) + if (ctx->xsks_map_fd != -1) + goto out_map_ids; + + /* if prog load is forced and we found bpf_link on a given ifindex BUT + * the underlying XDP prog is not an AF-XDP one, close old prog's fd, + * load AF-XDP and update existing bpf_link + */ + if (!(xsk->config.xdp_flags & XDP_FLAGS_UPDATE_IF_NOEXIST)) { + close(ctx->prog_fd); + err = xsk_create_bpf_maps(xsk); + if (err) + goto out_map_ids; + + err = xsk_load_xdp_prog(xsk); + if (err) { + xsk_delete_bpf_maps(xsk); + goto out_map_ids; + } + + /* if update failed, caller will close fds */ + err = bpf_link_update(ctx->link_fd, ctx->prog_fd, 0); + if (err) + pr_warn("bpf_link_update failed: %s\n", strerror(errno)); + } else { err = -ENOENT; + } out_map_ids: free(map_ids); @@ -666,6 +725,49 @@ static int xsk_create_xsk_struct(int ifindex, struct xsk_socket *xsk) return 0; } +static int xsk_link_lookup(struct xsk_ctx *ctx, __u32 *prog_id) +{ + __u32 link_len = sizeof(struct bpf_link_info); + struct bpf_link_info link_info; + __u32 id = 0; + int err; + int fd; + + while (true) { + err = bpf_link_get_next_id(id, &id); + if (err) { + if (errno == ENOENT) + break; + pr_warn("can't get next link: %s\n", strerror(errno)); + break; + } + + fd = bpf_link_get_fd_by_id(id); + if (fd < 0) { + if (errno == ENOENT) + continue; + pr_warn("can't get link by id (%u): %s\n", id, strerror(errno)); + break; + } + + memset(&link_info, 0, link_len); + err = bpf_obj_get_info_by_fd(fd, &link_info, &link_len); + if (err) { + pr_warn("can't get link info: %s\n", strerror(errno)); + close(fd); + break; + } + if (link_info.xdp.ifindex == ctx->ifindex) { + ctx->link_fd = fd; + *prog_id = link_info.prog_id; + break; + } + close(fd); + } + + return errno == ENOENT ? 0 : err; +} + static int __xsk_setup_xdp_prog(struct xsk_socket *_xdp, int *xsks_map_fd) { @@ -674,8 +776,7 @@ static int __xsk_setup_xdp_prog(struct xsk_socket *_xdp, __u32 prog_id = 0; int err; - err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, - xsk->config.xdp_flags); + err = xsk_link_lookup(ctx, &prog_id); if (err) return err; @@ -685,9 +786,12 @@ static int __xsk_setup_xdp_prog(struct xsk_socket *_xdp, return err; err = xsk_load_xdp_prog(xsk); - if (err) { + if (err) goto err_load_xdp_prog; - } + + err = xsk_create_bpf_link(xsk); + if (err) + goto err_create_bpf_link; } else { ctx->prog_fd = bpf_prog_get_fd_by_id(prog_id); if (ctx->prog_fd < 0) @@ -695,20 +799,15 @@ static int __xsk_setup_xdp_prog(struct xsk_socket *_xdp, err = xsk_lookup_bpf_maps(xsk); if (err) { close(ctx->prog_fd); + close(ctx->link_fd); return err; } } if (xsk->rx) { err = xsk_set_bpf_maps(xsk); - if (err) { - if (!prog_id) { - goto err_set_bpf_maps; - } else { - close(ctx->prog_fd); - return err; - } - } + if (err) + goto err_set_bpf_maps; } if (xsks_map_fd) *xsks_map_fd = ctx->xsks_map_fd; @@ -716,8 +815,9 @@ static int __xsk_setup_xdp_prog(struct xsk_socket *_xdp, return 0; err_set_bpf_maps: + close(ctx->link_fd); +err_create_bpf_link: close(ctx->prog_fd); - bpf_set_link_xdp_fd(ctx->ifindex, -1, 0); err_load_xdp_prog: xsk_delete_bpf_maps(xsk); @@ -1053,6 +1153,7 @@ void xsk_socket__delete(struct xsk_socket *xsk) if (ctx->prog_fd != -1) { xsk_delete_bpf_maps(xsk); close(ctx->prog_fd); + close(ctx->link_fd); } err = xsk_get_mmap_offsets(xsk->fd, &off);
Currently, if there are multiple xdpsock instances running on a single interface and in case one of the instances is terminated, the rest of them are left in an inoperable state due to the fact of unloaded XDP prog from interface. To address that, step away from setting bpf prog in favour of bpf_link. This means that refcounting of BPF resources will be done automatically by bpf_link itself. When setting up BPF resources during xsk socket creation, check whether bpf_link for a given ifindex already exists via set of calls to bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd and comparing the ifindexes from bpf_link and xsk socket. If there's no bpf_link yet, create one for a given XDP prog and unload explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set. If bpf_link is already at a given ifindex and underlying program is not AF-XDP one, bail out or update the bpf_link's prog given the presence of XDP_FLAGS_UPDATE_IF_NOEXIST. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> --- tools/lib/bpf/xsk.c | 143 +++++++++++++++++++++++++++++++++++++------- 1 file changed, 122 insertions(+), 21 deletions(-)