Message ID: 20230612172307.3923165-1-sdf@google.com
Series: bpf: netdev TX metadata
Some immediate thoughts after glancing through this:

> --- Use cases ---
>
> The goal of this series is to add two new standard-ish places
> in the transmit path:
>
> 1. Right before the packet is transmitted (with access to TX
> descriptors)
> 2. Right after the packet is actually transmitted and we've received the
> completion (again, with access to TX completion descriptors)
>
> Accessing TX descriptors unlocks the following use-cases:
>
> - Setting device hints at TX: XDP/AF_XDP might use these new hooks to
> use device offloads. The existing case implements TX timestamp.
> - Observability: global per-netdev hooks can be used for tracing
> the packets and exploring completion descriptors for all sorts of
> device errors.
>
> Accessing TX descriptors also means that the hooks have to be called
> from the drivers.
>
> The hooks are a light-weight alternative to XDP at egress and currently
> don't provide any packet modification abilities. However, eventually,
> we can expose new kfuncs to operate on the packet (or, rather, the actual
> descriptors; for performance's sake).

dynptr?

> --- UAPI ---
>
> The hooks are implemented in a HID-BPF style. Meaning they don't
> expose any UAPI and are implemented as tracing programs that call
> a bunch of kfuncs. The attach/detach operations happen via BPF syscall
> programs. The series expands device-bound infrastructure to tracing
> programs.

Not a fan of the "attach from BPF syscall program" thing. These are part of the XDP data path API, and I think we should expose them as proper bpf_link attachments from userspace with introspection etc. But I guess the bpf_mprog thing will give us that?

> --- skb vs xdp ---
>
> The hooks operate on a new light-weight devtx_frame which contains:
> - data
> - len
> - sinfo
>
> This should allow us to have a unified (from BPF POV) place at TX
> and not be super-taxing (we need to copy 2 pointers + len to the stack
> for each invocation).

Not sure what I think about this one. At the very least I think we should expose xdp->data_meta as well. I'm not sure what the use case for accessing skbs is? If that *is* indeed useful, probably there will also end up being a use case for accessing the full skb?

> --- Multiprog attachment ---
>
> Currently, attach/detach don't expose links and don't support multiple
> programs. I'm planning to use Daniel's bpf_mprog once it lands.
>
> --- TODO ---
>
> Things that I'm planning to do for the non-RFC series:
> - have some real device support to verify xdp_hw_metadata works

Would be good to see some performance numbers as well :)

> - freplace
> - Documentation/networking/xdp-rx-metadata.rst - like documentation
>
> --- CC ---
>
> CC'ing people only on the cover letter. Hopefully can find the rest via
> lore.

Well, I found it there, even though I was apparently left off the Cc list :(

-Toke
On 06/12, Toke Høiland-Jørgensen wrote:
> Some immediate thoughts after glancing through this:
>
> > --- Use cases ---
> >
> > The goal of this series is to add two new standard-ish places
> > in the transmit path:
> >
> > 1. Right before the packet is transmitted (with access to TX
> > descriptors)
> > 2. Right after the packet is actually transmitted and we've received the
> > completion (again, with access to TX completion descriptors)
> >
> > Accessing TX descriptors unlocks the following use-cases:
> >
> > - Setting device hints at TX: XDP/AF_XDP might use these new hooks to
> > use device offloads. The existing case implements TX timestamp.
> > - Observability: global per-netdev hooks can be used for tracing
> > the packets and exploring completion descriptors for all sorts of
> > device errors.
> >
> > Accessing TX descriptors also means that the hooks have to be called
> > from the drivers.
> >
> > The hooks are a light-weight alternative to XDP at egress and currently
> > don't provide any packet modification abilities. However, eventually,
> > we can expose new kfuncs to operate on the packet (or, rather, the actual
> > descriptors; for performance's sake).
>
> dynptr?

Haven't considered, let me explore, but not sure what it buys us here?

> > --- UAPI ---
> >
> > The hooks are implemented in a HID-BPF style. Meaning they don't
> > expose any UAPI and are implemented as tracing programs that call
> > a bunch of kfuncs. The attach/detach operations happen via BPF syscall
> > programs. The series expands device-bound infrastructure to tracing
> > programs.
>
> Not a fan of the "attach from BPF syscall program" thing. These are part
> of the XDP data path API, and I think we should expose them as proper
> bpf_link attachments from userspace with introspection etc. But I guess
> the bpf_mprog thing will give us that?

bpf_mprog will just make those attach kfuncs return the link fd. The syscall program will still stay :-(

> > --- skb vs xdp ---
> >
> > The hooks operate on a new light-weight devtx_frame which contains:
> > - data
> > - len
> > - sinfo
> >
> > This should allow us to have a unified (from BPF POV) place at TX
> > and not be super-taxing (we need to copy 2 pointers + len to the stack
> > for each invocation).
>
> Not sure what I think about this one. At the very least I think we
> should expose xdp->data_meta as well. I'm not sure what the use case for
> accessing skbs is? If that *is* indeed useful, probably there will also
> end up being a use case for accessing the full skb?

skb_shared_info has meta_len, but afaik xdp doesn't use it. Maybe it's a good opportunity to unify? Or probably won't work because if xdp_frame doesn't have frags, it won't have sinfo?

> > --- Multiprog attachment ---
> >
> > Currently, attach/detach don't expose links and don't support multiple
> > programs. I'm planning to use Daniel's bpf_mprog once it lands.
> >
> > --- TODO ---
> >
> > Things that I'm planning to do for the non-RFC series:
> > - have some real device support to verify xdp_hw_metadata works
>
> Would be good to see some performance numbers as well :)

+1 :-)

> > - freplace
> > - Documentation/networking/xdp-rx-metadata.rst - like documentation
> >
> > --- CC ---
> >
> > CC'ing people only on the cover letter. Hopefully can find the rest via
> > lore.
>
> Well, I found it there, even though I was apparently left off the Cc
> list :(
>
> -Toke

Sure, I'll CC you explicitly next time! But I know you diligently follow the bpf list, so I decided to explicitly cc mostly netdev folks that might miss it otherwise.
Stanislav Fomichev <sdf@google.com> writes:

> On 06/12, Toke Høiland-Jørgensen wrote:
>> Some immediate thoughts after glancing through this:
>>
>> > --- Use cases ---
>> >
>> > The goal of this series is to add two new standard-ish places
>> > in the transmit path:
>> >
>> > 1. Right before the packet is transmitted (with access to TX
>> > descriptors)
>> > 2. Right after the packet is actually transmitted and we've received the
>> > completion (again, with access to TX completion descriptors)
>> >
>> > Accessing TX descriptors unlocks the following use-cases:
>> >
>> > - Setting device hints at TX: XDP/AF_XDP might use these new hooks to
>> > use device offloads. The existing case implements TX timestamp.
>> > - Observability: global per-netdev hooks can be used for tracing
>> > the packets and exploring completion descriptors for all sorts of
>> > device errors.
>> >
>> > Accessing TX descriptors also means that the hooks have to be called
>> > from the drivers.
>> >
>> > The hooks are a light-weight alternative to XDP at egress and currently
>> > don't provide any packet modification abilities. However, eventually,
>> > we can expose new kfuncs to operate on the packet (or, rather, the actual
>> > descriptors; for performance's sake).
>>
>> dynptr?
>
> Haven't considered, let me explore, but not sure what it buys us
> here?

API consistency, certainly. Possibly also performance, if using the slice thing that gets you a direct pointer to the pkt data? Not sure about that, though, haven't done extensive benchmarking of dynptr yet...

>> > --- UAPI ---
>> >
>> > The hooks are implemented in a HID-BPF style. Meaning they don't
>> > expose any UAPI and are implemented as tracing programs that call
>> > a bunch of kfuncs. The attach/detach operations happen via BPF syscall
>> > programs. The series expands device-bound infrastructure to tracing
>> > programs.
>>
>> Not a fan of the "attach from BPF syscall program" thing. These are part
>> of the XDP data path API, and I think we should expose them as proper
>> bpf_link attachments from userspace with introspection etc. But I guess
>> the bpf_mprog thing will give us that?
>
> bpf_mprog will just make those attach kfuncs return the link fd. The
> syscall program will still stay :-(

Why does the attachment have to be done this way, exactly? Couldn't we just use the regular bpf_link attachment from userspace? AFAICT it's not really piggy-backing on the function override thing anyway when the attachment is per-dev? Or am I misunderstanding how all this works?

>> > --- skb vs xdp ---
>> >
>> > The hooks operate on a new light-weight devtx_frame which contains:
>> > - data
>> > - len
>> > - sinfo
>> >
>> > This should allow us to have a unified (from BPF POV) place at TX
>> > and not be super-taxing (we need to copy 2 pointers + len to the stack
>> > for each invocation).
>>
>> Not sure what I think about this one. At the very least I think we
>> should expose xdp->data_meta as well. I'm not sure what the use case for
>> accessing skbs is? If that *is* indeed useful, probably there will also
>> end up being a use case for accessing the full skb?
>
> skb_shared_info has meta_len, but afaik xdp doesn't use it. Maybe it's
> a good opportunity to unify? Or probably won't work because if
> xdp_frame doesn't have frags, it won't have sinfo?

No, it won't. But why do we need this unification between the skb and xdp paths in the first place? Doesn't the skb path already have support for these things? Seems like we could just stick to making this xdp-only and keeping xdp_frame as the ctx argument?

>> > --- Multiprog attachment ---
>> >
>> > Currently, attach/detach don't expose links and don't support multiple
>> > programs. I'm planning to use Daniel's bpf_mprog once it lands.
>> >
>> > --- TODO ---
>> >
>> > Things that I'm planning to do for the non-RFC series:
>> > - have some real device support to verify xdp_hw_metadata works
>>
>> Would be good to see some performance numbers as well :)
>
> +1 :-)
>
>> > - freplace
>> > - Documentation/networking/xdp-rx-metadata.rst - like documentation
>> >
>> > --- CC ---
>> >
>> > CC'ing people only on the cover letter. Hopefully can find the rest via
>> > lore.
>>
>> Well, I found it there, even though I was apparently left off the Cc
>> list :(
>>
>> -Toke
>
> Sure, I'll CC you explicitly next time! But I know you diligently follow the bpf
> list, so I decided to explicitly cc mostly netdev folks that might miss
> it otherwise.

Haha, fair point! And no big deal, I did obviously see it. I was just feeling a bit left out, that's all ;)

-Toke
On Tue, Jun 13, 2023 at 10:18 AM Toke Høiland-Jørgensen <toke@kernel.org> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> > On 06/12, Toke Høiland-Jørgensen wrote:
> >> Some immediate thoughts after glancing through this:
> >>
> >> > --- Use cases ---
> >> >
> >> > The goal of this series is to add two new standard-ish places
> >> > in the transmit path:
> >> >
> >> > 1. Right before the packet is transmitted (with access to TX
> >> > descriptors)
> >> > 2. Right after the packet is actually transmitted and we've received the
> >> > completion (again, with access to TX completion descriptors)
> >> >
> >> > Accessing TX descriptors unlocks the following use-cases:
> >> >
> >> > - Setting device hints at TX: XDP/AF_XDP might use these new hooks to
> >> > use device offloads. The existing case implements TX timestamp.
> >> > - Observability: global per-netdev hooks can be used for tracing
> >> > the packets and exploring completion descriptors for all sorts of
> >> > device errors.
> >> >
> >> > Accessing TX descriptors also means that the hooks have to be called
> >> > from the drivers.
> >> >
> >> > The hooks are a light-weight alternative to XDP at egress and currently
> >> > don't provide any packet modification abilities. However, eventually,
> >> > we can expose new kfuncs to operate on the packet (or, rather, the actual
> >> > descriptors; for performance's sake).
> >>
> >> dynptr?
> >
> > Haven't considered, let me explore, but not sure what it buys us
> > here?
>
> API consistency, certainly. Possibly also performance, if using the
> slice thing that gets you a direct pointer to the pkt data? Not sure
> about that, though, haven't done extensive benchmarking of dynptr yet...

Same. Let's keep it on the table, I'll try to explore. I was just thinking that having less abstraction here might be better performance-wise.

> >> > --- UAPI ---
> >> >
> >> > The hooks are implemented in a HID-BPF style. Meaning they don't
> >> > expose any UAPI and are implemented as tracing programs that call
> >> > a bunch of kfuncs. The attach/detach operations happen via BPF syscall
> >> > programs. The series expands device-bound infrastructure to tracing
> >> > programs.
> >>
> >> Not a fan of the "attach from BPF syscall program" thing. These are part
> >> of the XDP data path API, and I think we should expose them as proper
> >> bpf_link attachments from userspace with introspection etc. But I guess
> >> the bpf_mprog thing will give us that?
> >
> > bpf_mprog will just make those attach kfuncs return the link fd. The
> > syscall program will still stay :-(
>
> Why does the attachment have to be done this way, exactly? Couldn't we
> just use the regular bpf_link attachment from userspace? AFAICT it's not
> really piggy-backing on the function override thing anyway when the
> attachment is per-dev? Or am I misunderstanding how all this works?

It's UAPI vs non-UAPI. I'm assuming kfunc makes it non-UAPI and gives us an opportunity to fix things. We can do it via a regular syscall path if there is a consensus.

> >> > --- skb vs xdp ---
> >> >
> >> > The hooks operate on a new light-weight devtx_frame which contains:
> >> > - data
> >> > - len
> >> > - sinfo
> >> >
> >> > This should allow us to have a unified (from BPF POV) place at TX
> >> > and not be super-taxing (we need to copy 2 pointers + len to the stack
> >> > for each invocation).
> >>
> >> Not sure what I think about this one. At the very least I think we
> >> should expose xdp->data_meta as well. I'm not sure what the use case for
> >> accessing skbs is? If that *is* indeed useful, probably there will also
> >> end up being a use case for accessing the full skb?
> >
> > skb_shared_info has meta_len, but afaik xdp doesn't use it. Maybe it's
> > a good opportunity to unify? Or probably won't work because if
> > xdp_frame doesn't have frags, it won't have sinfo?
>
> No, it won't. But why do we need this unification between the skb and
> xdp paths in the first place? Doesn't the skb path already have support
> for these things? Seems like we could just stick to making this xdp-only
> and keeping xdp_frame as the ctx argument?

For skb path, I'm assuming we can read sinfo->meta_len; it feels nice to make it work for both cases? We can always export metadata len via some kfunc, sure.

> >> > --- Multiprog attachment ---
> >> >
> >> > Currently, attach/detach don't expose links and don't support multiple
> >> > programs. I'm planning to use Daniel's bpf_mprog once it lands.
> >> >
> >> > --- TODO ---
> >> >
> >> > Things that I'm planning to do for the non-RFC series:
> >> > - have some real device support to verify xdp_hw_metadata works
> >>
> >> Would be good to see some performance numbers as well :)
> >
> > +1 :-)
> >
> >> > - freplace
> >> > - Documentation/networking/xdp-rx-metadata.rst - like documentation
> >> >
> >> > --- CC ---
> >> >
> >> > CC'ing people only on the cover letter. Hopefully can find the rest via
> >> > lore.
> >>
> >> Well, I found it there, even though I was apparently left off the Cc
> >> list :(
> >>
> >> -Toke
> >
> > Sure, I'll CC you explicitly next time! But I know you diligently follow the bpf
> > list, so I decided to explicitly cc mostly netdev folks that might miss
> > it otherwise.
>
> Haha, fair point! And no big deal, I did obviously see it. I was just
> feeling a bit left out, that's all ;)
>
> -Toke
Stanislav Fomichev <sdf@google.com> writes:

> On Tue, Jun 13, 2023 at 10:18 AM Toke Høiland-Jørgensen <toke@kernel.org> wrote:
>>
>> Stanislav Fomichev <sdf@google.com> writes:
>>
>> > On 06/12, Toke Høiland-Jørgensen wrote:
>> >> Some immediate thoughts after glancing through this:
>> >>
>> >> > --- Use cases ---
>> >> >
>> >> > The goal of this series is to add two new standard-ish places
>> >> > in the transmit path:
>> >> >
>> >> > 1. Right before the packet is transmitted (with access to TX
>> >> > descriptors)
>> >> > 2. Right after the packet is actually transmitted and we've received the
>> >> > completion (again, with access to TX completion descriptors)
>> >> >
>> >> > Accessing TX descriptors unlocks the following use-cases:
>> >> >
>> >> > - Setting device hints at TX: XDP/AF_XDP might use these new hooks to
>> >> > use device offloads. The existing case implements TX timestamp.
>> >> > - Observability: global per-netdev hooks can be used for tracing
>> >> > the packets and exploring completion descriptors for all sorts of
>> >> > device errors.
>> >> >
>> >> > Accessing TX descriptors also means that the hooks have to be called
>> >> > from the drivers.
>> >> >
>> >> > The hooks are a light-weight alternative to XDP at egress and currently
>> >> > don't provide any packet modification abilities. However, eventually,
>> >> > we can expose new kfuncs to operate on the packet (or, rather, the actual
>> >> > descriptors; for performance's sake).
>> >>
>> >> dynptr?
>> >
>> > Haven't considered, let me explore, but not sure what it buys us
>> > here?
>>
>> API consistency, certainly. Possibly also performance, if using the
>> slice thing that gets you a direct pointer to the pkt data? Not sure
>> about that, though, haven't done extensive benchmarking of dynptr yet...
>
> Same. Let's keep it on the table, I'll try to explore. I was just
> thinking that having less abstraction here might be better
> performance-wise.

Sure, let's evaluate this once we have performance numbers.

>> >> > --- UAPI ---
>> >> >
>> >> > The hooks are implemented in a HID-BPF style. Meaning they don't
>> >> > expose any UAPI and are implemented as tracing programs that call
>> >> > a bunch of kfuncs. The attach/detach operations happen via BPF syscall
>> >> > programs. The series expands device-bound infrastructure to tracing
>> >> > programs.
>> >>
>> >> Not a fan of the "attach from BPF syscall program" thing. These are part
>> >> of the XDP data path API, and I think we should expose them as proper
>> >> bpf_link attachments from userspace with introspection etc. But I guess
>> >> the bpf_mprog thing will give us that?
>> >
>> > bpf_mprog will just make those attach kfuncs return the link fd. The
>> > syscall program will still stay :-(
>>
>> Why does the attachment have to be done this way, exactly? Couldn't we
>> just use the regular bpf_link attachment from userspace? AFAICT it's not
>> really piggy-backing on the function override thing anyway when the
>> attachment is per-dev? Or am I misunderstanding how all this works?
>
> It's UAPI vs non-UAPI. I'm assuming kfunc makes it non-UAPI and gives
> us an opportunity to fix things.
> We can do it via a regular syscall path if there is a consensus.

Yeah, the API exposed to the BPF program is kfunc-based in any case. If we were to at some point conclude that this whole thing was not useful at all and deprecate it, whether that means "you can no longer create a link attachment of this type via BPF_LINK_CREATE" or "you can no longer create a link attachment of this type via BPF_PROG_RUN of a syscall type program" doesn't really seem like a significant detail to me...

>> >> > --- skb vs xdp ---
>> >> >
>> >> > The hooks operate on a new light-weight devtx_frame which contains:
>> >> > - data
>> >> > - len
>> >> > - sinfo
>> >> >
>> >> > This should allow us to have a unified (from BPF POV) place at TX
>> >> > and not be super-taxing (we need to copy 2 pointers + len to the stack
>> >> > for each invocation).
>> >>
>> >> Not sure what I think about this one. At the very least I think we
>> >> should expose xdp->data_meta as well. I'm not sure what the use case for
>> >> accessing skbs is? If that *is* indeed useful, probably there will also
>> >> end up being a use case for accessing the full skb?
>> >
>> > skb_shared_info has meta_len, but afaik xdp doesn't use it. Maybe it's
>> > a good opportunity to unify? Or probably won't work because if
>> > xdp_frame doesn't have frags, it won't have sinfo?
>>
>> No, it won't. But why do we need this unification between the skb and
>> xdp paths in the first place? Doesn't the skb path already have support
>> for these things? Seems like we could just stick to making this xdp-only
>> and keeping xdp_frame as the ctx argument?
>
> For skb path, I'm assuming we can read sinfo->meta_len; it feels nice
> to make it work for both cases?
> We can always export metadata len via some kfunc, sure.

I wasn't referring to the metadata field specifically when talking about the skb path. I'm wondering why we need these hooks to work for the skb path at all? :)

-Toke
On Tue, Jun 13, 2023 at 12:10 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote:
>
> Stanislav Fomichev <sdf@google.com> writes:
>
> > On Tue, Jun 13, 2023 at 10:18 AM Toke Høiland-Jørgensen <toke@kernel.org> wrote:
> >>
> >> Stanislav Fomichev <sdf@google.com> writes:
> >>
> >> > On 06/12, Toke Høiland-Jørgensen wrote:
> >> >> Some immediate thoughts after glancing through this:
> >> >>
> >> >> > --- Use cases ---
> >> >> >
> >> >> > The goal of this series is to add two new standard-ish places
> >> >> > in the transmit path:
> >> >> >
> >> >> > 1. Right before the packet is transmitted (with access to TX
> >> >> > descriptors)
> >> >> > 2. Right after the packet is actually transmitted and we've received the
> >> >> > completion (again, with access to TX completion descriptors)
> >> >> >
> >> >> > Accessing TX descriptors unlocks the following use-cases:
> >> >> >
> >> >> > - Setting device hints at TX: XDP/AF_XDP might use these new hooks to
> >> >> > use device offloads. The existing case implements TX timestamp.
> >> >> > - Observability: global per-netdev hooks can be used for tracing
> >> >> > the packets and exploring completion descriptors for all sorts of
> >> >> > device errors.
> >> >> >
> >> >> > Accessing TX descriptors also means that the hooks have to be called
> >> >> > from the drivers.
> >> >> >
> >> >> > The hooks are a light-weight alternative to XDP at egress and currently
> >> >> > don't provide any packet modification abilities. However, eventually,
> >> >> > we can expose new kfuncs to operate on the packet (or, rather, the actual
> >> >> > descriptors; for performance's sake).
> >> >>
> >> >> dynptr?
> >> >
> >> > Haven't considered, let me explore, but not sure what it buys us
> >> > here?
> >>
> >> API consistency, certainly. Possibly also performance, if using the
> >> slice thing that gets you a direct pointer to the pkt data? Not sure
> >> about that, though, haven't done extensive benchmarking of dynptr yet...
> >
> > Same. Let's keep it on the table, I'll try to explore. I was just
> > thinking that having less abstraction here might be better
> > performance-wise.
>
> Sure, let's evaluate this once we have performance numbers.
>
> >> >> > --- UAPI ---
> >> >> >
> >> >> > The hooks are implemented in a HID-BPF style. Meaning they don't
> >> >> > expose any UAPI and are implemented as tracing programs that call
> >> >> > a bunch of kfuncs. The attach/detach operations happen via BPF syscall
> >> >> > programs. The series expands device-bound infrastructure to tracing
> >> >> > programs.
> >> >>
> >> >> Not a fan of the "attach from BPF syscall program" thing. These are part
> >> >> of the XDP data path API, and I think we should expose them as proper
> >> >> bpf_link attachments from userspace with introspection etc. But I guess
> >> >> the bpf_mprog thing will give us that?
> >> >
> >> > bpf_mprog will just make those attach kfuncs return the link fd. The
> >> > syscall program will still stay :-(
> >>
> >> Why does the attachment have to be done this way, exactly? Couldn't we
> >> just use the regular bpf_link attachment from userspace? AFAICT it's not
> >> really piggy-backing on the function override thing anyway when the
> >> attachment is per-dev? Or am I misunderstanding how all this works?
> >
> > It's UAPI vs non-UAPI. I'm assuming kfunc makes it non-UAPI and gives
> > us an opportunity to fix things.
> > We can do it via a regular syscall path if there is a consensus.
>
> Yeah, the API exposed to the BPF program is kfunc-based in any case. If
> we were to at some point conclude that this whole thing was not useful
> at all and deprecate it, whether that means "you can no longer create a
> link attachment of this type via BPF_LINK_CREATE" or "you can no longer
> create a link attachment of this type via BPF_PROG_RUN of a syscall type
> program" doesn't really seem like a significant detail to me...

In this case, why do you prefer it to go via a regular syscall? Seems like we can avoid a bunch of boilerplate syscall work with a kfunc that does the attachment? We might as well abstract it at, say, the libbpf layer, which would generate/load this small bpf program to call a kfunc.

> >> >> > --- skb vs xdp ---
> >> >> >
> >> >> > The hooks operate on a new light-weight devtx_frame which contains:
> >> >> > - data
> >> >> > - len
> >> >> > - sinfo
> >> >> >
> >> >> > This should allow us to have a unified (from BPF POV) place at TX
> >> >> > and not be super-taxing (we need to copy 2 pointers + len to the stack
> >> >> > for each invocation).
> >> >>
> >> >> Not sure what I think about this one. At the very least I think we
> >> >> should expose xdp->data_meta as well. I'm not sure what the use case for
> >> >> accessing skbs is? If that *is* indeed useful, probably there will also
> >> >> end up being a use case for accessing the full skb?
> >> >
> >> > skb_shared_info has meta_len, but afaik xdp doesn't use it. Maybe it's
> >> > a good opportunity to unify? Or probably won't work because if
> >> > xdp_frame doesn't have frags, it won't have sinfo?
> >>
> >> No, it won't. But why do we need this unification between the skb and
> >> xdp paths in the first place? Doesn't the skb path already have support
> >> for these things? Seems like we could just stick to making this xdp-only
> >> and keeping xdp_frame as the ctx argument?
> >
> > For skb path, I'm assuming we can read sinfo->meta_len; it feels nice
> > to make it work for both cases?
> > We can always export metadata len via some kfunc, sure.
>
> I wasn't referring to the metadata field specifically when talking about
> the skb path. I'm wondering why we need these hooks to work for the skb
> path at all? :)
>
> -Toke

Aaah. I think John wanted them to trigger for the skb path, so I'm trying to explore whether having both makes sense. But also, if we go purely xdp_frame, what about af_xdp in copy mode? That's still skb-driven, right? Not sure this skb vs xdp is clear-cut :-/
On Tue, Jun 13, 2023 at 2:17 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> > >> >> > --- UAPI ---
> > >> >> >
> > >> >> > The hooks are implemented in a HID-BPF style. Meaning they don't
> > >> >> > expose any UAPI and are implemented as tracing programs that call
> > >> >> > a bunch of kfuncs. The attach/detach operations happen via BPF syscall
> > >> >> > programs. The series expands device-bound infrastructure to tracing
> > >> >> > programs.
> > >> >>
> > >> >> Not a fan of the "attach from BPF syscall program" thing. These are part
> > >> >> of the XDP data path API, and I think we should expose them as proper
> > >> >> bpf_link attachments from userspace with introspection etc. But I guess
> > >> >> the bpf_mprog thing will give us that?
> > >> >
> > >> > bpf_mprog will just make those attach kfuncs return the link fd. The
> > >> > syscall program will still stay :-(
> > >>
> > >> Why does the attachment have to be done this way, exactly? Couldn't we
> > >> just use the regular bpf_link attachment from userspace? AFAICT it's not
> > >> really piggy-backing on the function override thing anyway when the
> > >> attachment is per-dev? Or am I misunderstanding how all this works?
> > >
> > > It's UAPI vs non-UAPI. I'm assuming kfunc makes it non-UAPI and gives
> > > us an opportunity to fix things.
> > > We can do it via a regular syscall path if there is a consensus.
> >
> > Yeah, the API exposed to the BPF program is kfunc-based in any case. If
> > we were to at some point conclude that this whole thing was not useful
> > at all and deprecate it, whether that means "you can no longer create a
> > link attachment of this type via BPF_LINK_CREATE" or "you can no longer
> > create a link attachment of this type via BPF_PROG_RUN of a syscall type
> > program" doesn't really seem like a significant detail to me...
>
> In this case, why do you prefer it to go via a regular syscall? Seems
> like we can avoid a bunch of boilerplate syscall work with a kfunc that
> does the attachment?
> We might as well abstract it at, say, the libbpf layer, which would
> generate/load this small bpf program to call a kfunc.

I'm not sure we're on the same page here. imo using a syscall bpf prog that calls a kfunc to do a per-device attach is overkill here. It's an experimental feature, but you're already worried about multiple netdevs?

Can you add an empty nop function and attach to it tracing-style with fentry? It won't be per-netdev, but do you have to do per-device demux in the kernel? Can your tracing bpf prog do that instead? It's just an ifindex compare. This way the non-uapi bits will be even smaller and there's no need to change struct net_device.
On Tue, Jun 13, 2023 at 3:32 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Tue, Jun 13, 2023 at 2:17 PM Stanislav Fomichev <sdf@google.com> wrote: > > > > > >> >> > --- UAPI --- > > > >> >> > > > > >> >> > The hooks are implemented in a HID-BPF style. Meaning they don't > > > >> >> > expose any UAPI and are implemented as tracing programs that call > > > >> >> > a bunch of kfuncs. The attach/detach operation happen via BPF syscall > > > >> >> > programs. The series expands device-bound infrastructure to tracing > > > >> >> > programs. > > > >> >> > > > >> >> Not a fan of the "attach from BPF syscall program" thing. These are part > > > >> >> of the XDP data path API, and I think we should expose them as proper > > > >> >> bpf_link attachments from userspace with introspection etc. But I guess > > > >> >> the bpf_mprog thing will give us that? > > > >> > > > > >> > bpf_mprog will just make those attach kfuncs return the link fd. The > > > >> > syscall program will still stay :-( > > > >> > > > >> Why does the attachment have to be done this way, exactly? Couldn't we > > > >> just use the regular bpf_link attachment from userspace? AFAICT it's not > > > >> really piggy-backing on the function override thing anyway when the > > > >> attachment is per-dev? Or am I misunderstanding how all this works? > > > > > > > > It's UAPI vs non-UAPI. I'm assuming kfunc makes it non-UAPI and gives > > > > us an opportunity to fix things. > > > > We can do it via a regular syscall path if there is a consensus. > > > > > > Yeah, the API exposed to the BPF program is kfunc-based in any case. 
If > > > we were to at some point conclude that this whole thing was not useful > > > at all and deprecate it, it doesn't seem to me that it makes much > > > difference whether that means "you can no longer create a link > > > attachment of this type via BPF_LINK_CREATE" or "you can no longer > > > create a link attachment of this type via BPF_PROG_RUN of a syscall type > > > program" doesn't really seem like a significant detail to me... > > > > In this case, why do you prefer it to go via regular syscall? Seems > > like we can avoid a bunch of boilerplate syscall work with a kfunc that > > does the attachment? > > We might as well abstract it at, say, libbpf layer which would > > generate/load this small bpf program to call a kfunc. > > I'm not sure we're on the same page here. > imo using syscall bpf prog that calls kfunc to do a per-device attach > is overkill here. > It's an experimental feature, but you're already worried about > multiple netdevs? > > Can you add an empty nop function and attach to it tracing style > with fentry? > It won't be per-netdev, but do you have to do per-device demux > by the kernel? Can your tracing bpf prog do that instead? > It's just an ifindex compare. > This way the non-uapi bits will be even smaller and no need > to change struct net_device. It's probably going to work if each driver has a separate set of tx fentry points, something like: {veth,mlx5,etc}_devtx_submit() {veth,mlx5,etc}_devtx_complete() Because I still need to have those netdev-bound tracing programs to get access to driver kfuncs. I can try to sketch something together for a v2.
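For concreteness, the per-driver hook points above, combined with the devtx_frame layout from the cover letter (data, len, sinfo), could be sketched as below. The noinline nop pattern and all function names are assumptions for illustration:

```c
/* The light-weight TX context from the cover letter: two pointers + len. */
struct devtx_frame {
	void *data;
	u16 len;
	struct skb_shared_info *sinfo; /* frags for multi-buf frames */
};

/* Empty, noinline hook points, one pair per driver, so that netdev-bound
 * tracing programs attached here can use that driver's kfuncs. */
__weak noinline void veth_devtx_submit(struct devtx_frame *frame) {}
__weak noinline void veth_devtx_complete(struct devtx_frame *frame) {}
```

The driver would presumably call the submit hook right before ringing the doorbell and the complete hook when reaping the completion.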
On Mon, 12 Jun 2023 10:23:00 -0700 Stanislav Fomichev wrote: > The goal of this series is to add two new standard-ish places > in the transmit path: > > 1. Right before the packet is transmitted (with access to TX > descriptors) I'm not sure that the Tx descriptors can be populated piecemeal. If we were ever to support more standard offload features, which require packet geometry (hdr offsets etc.) to be described "call per feature" will end up duplicating arguments, and there will be a lot of args.. And if there is an SKB path in the future combining the normal SKB offloads with the half-rendered descriptors may be a pain. > 2. Right after the packet is actually transmitted and we've received the > completion (again, with access to TX completion descriptors) For completions trace-style attach makes perfect sense. > Accessing TX descriptors unlocks the following use-cases: > > - Setting device hints at TX: XDP/AF_XDP might use these new hooks to > use device offloads. The existing case implements TX timestamp. > - Observability: global per-netdev hooks can be used for tracing > the packets and exploring completion descriptors for all sorts of > device errors. > > Accessing TX descriptors also means that the hooks have to be called > from the drivers. > > The hooks are a light-weight alternative to XDP at egress and currently Hm, a lot of lightweight features lately. Unbearable lightness of features.
On 6/13/23 9:31 PM, Jakub Kicinski wrote: > On Mon, 12 Jun 2023 10:23:00 -0700 Stanislav Fomichev wrote: >> The goal of this series is to add two new standard-ish places >> in the transmit path: >> >> 1. Right before the packet is transmitted (with access to TX >> descriptors) If a device requires multiple Tx descriptors per skb or multibuf frame, how would that be handled within the XDP API? > > I'm not sure that the Tx descriptors can be populated piecemeal. If it is host memory before the pidx move, why would that matter? Do you have a specific example in mind? > If we were ever to support more standard offload features, which > require packet geometry (hdr offsets etc.) to be described "call > per feature" will end up duplicating arguments, and there will be > a lot of args.. > > And if there is an SKB path in the future combining the normal SKB > offloads with the half-rendered descriptors may be a pain. Once the descriptor(s) is (are) populated, the skb is irrelevant is it not? Only complication that comes to mind is wanting to add or remove headers (e.g., tunnels) which will be much more complicated at this point, but might still be possible on a per NIC (and maybe version) basis.
On Tue, Jun 13, 2023 at 4:16 PM Stanislav Fomichev <sdf@google.com> wrote: > > On Tue, Jun 13, 2023 at 3:32 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Tue, Jun 13, 2023 at 2:17 PM Stanislav Fomichev <sdf@google.com> wrote: > > > > > > > >> >> > --- UAPI --- > > > > >> >> > > > > > >> >> > The hooks are implemented in a HID-BPF style. Meaning they don't > > > > >> >> > expose any UAPI and are implemented as tracing programs that call > > > > >> >> > a bunch of kfuncs. The attach/detach operation happen via BPF syscall > > > > >> >> > programs. The series expands device-bound infrastructure to tracing > > > > >> >> > programs. > > > > >> >> > > > > >> >> Not a fan of the "attach from BPF syscall program" thing. These are part > > > > >> >> of the XDP data path API, and I think we should expose them as proper > > > > >> >> bpf_link attachments from userspace with introspection etc. But I guess > > > > >> >> the bpf_mprog thing will give us that? > > > > >> > > > > > >> > bpf_mprog will just make those attach kfuncs return the link fd. The > > > > >> > syscall program will still stay :-( > > > > >> > > > > >> Why does the attachment have to be done this way, exactly? Couldn't we > > > > >> just use the regular bpf_link attachment from userspace? AFAICT it's not > > > > >> really piggy-backing on the function override thing anyway when the > > > > >> attachment is per-dev? Or am I misunderstanding how all this works? > > > > > > > > > > It's UAPI vs non-UAPI. I'm assuming kfunc makes it non-UAPI and gives > > > > > us an opportunity to fix things. > > > > > We can do it via a regular syscall path if there is a consensus. > > > > > > > > Yeah, the API exposed to the BPF program is kfunc-based in any case. 
If > > > > we were to at some point conclude that this whole thing was not useful > > > > at all and deprecate it, it doesn't seem to me that it makes much > > > > difference whether that means "you can no longer create a link > > > > attachment of this type via BPF_LINK_CREATE" or "you can no longer > > > > create a link attachment of this type via BPF_PROG_RUN of a syscall type > > > > program" doesn't really seem like a significant detail to me... > > > > > > In this case, why do you prefer it to go via regular syscall? Seems > > > like we can avoid a bunch of boilerplate syscall work with a kfunc that > > > does the attachment? > > > We might as well abstract it at, say, libbpf layer which would > > > generate/load this small bpf program to call a kfunc. > > > > I'm not sure we're on the same page here. > > imo using syscall bpf prog that calls kfunc to do a per-device attach > > is overkill here. > > It's an experimental feature, but you're already worried about > > multiple netdevs? > > > > Can you add an empty nop function and attach to it tracing style > > with fentry? > > It won't be per-netdev, but do you have to do per-device demux > > by the kernel? Can your tracing bpf prog do that instead? > > It's just an ifindex compare. > > This way the non-uapi bits will be even smaller and no need > > to change struct net_device. > > It's probably going to work if each driver has a separate set of tx > fentry points, something like: > {veth,mlx5,etc}_devtx_submit() > {veth,mlx5,etc}_devtx_complete() Right. And per-driver descriptors. The 'generic' xdptx metadata is unlikely to be practical. Marshaling in and out of it is going to be too perf sensitive. I'd just add an attach point in the driver with enough args for bpf progs to make sense of the context and extend the verifier to make a few safe fields writeable. kfuncs to read/request timestamp are probably too slow.
On Tue, 13 Jun 2023 20:54:26 -0700 David Ahern wrote: > On 6/13/23 9:31 PM, Jakub Kicinski wrote: > > On Mon, 12 Jun 2023 10:23:00 -0700 Stanislav Fomichev wrote: > >> The goal of this series is to add two new standard-ish places > >> in the transmit path: > >> > >> 1. Right before the packet is transmitted (with access to TX > >> descriptors) > > If a device requires multiple Tx descriptors per skb or multibuf frame, > how would that be handled within the XDP API? > > > I'm not sure that the Tx descriptors can be populated piecemeal. > > If it is host memory before the pidx move, why would that matter? Do you > have a specific example in mind? I don't mean it's impossible to implement, but it may get cumbersome. TSO/CSO/crypto may all need to know where L4 header starts, f.e. Some ECN marking in the NIC may also want to know where L3 is. So the offsets will get duplicated in each API. > > If we were ever to support more standard offload features, which > > require packet geometry (hdr offsets etc.) to be described "call > > per feature" will end up duplicating arguments, and there will be > > a lot of args.. > > > > And if there is an SKB path in the future combining the normal SKB > > offloads with the half-rendered descriptors may be a pain. > > Once the descriptor(s) is (are) populated, the skb is irrelevant is it > not? Only complication that comes to mind is wanting to add or remove > headers (e.g., tunnels) which will be much more complicated at this > point, but might still be possible on a per NIC (and maybe version) basis. I guess one can write the skb descriptors first, then modify them from the BPF. Either way I feel like the helper approach for Tx will result in drivers saving the info into some local struct and then rendering the descriptors after. We'll see.
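The "save the info into a local struct, render the descriptors after" pattern could be sketched like this. struct devtx_req and its fields are invented for illustration, but they show how the shared packet geometry (L3/L4 offsets) could be stated once instead of being duplicated across per-feature calls:

```c
/* Sketch: one per-frame request struct that all offload kfuncs write into,
 * so the packet geometry is declared once rather than per kfunc call. */
struct devtx_req {
	u16 l3_off;       /* where L3 starts: ECN marking, checksum offload */
	u16 l4_off;       /* where L4 starts: TSO/CSO/crypto */
	bool want_tstamp; /* request a TX completion timestamp */
};

/* Hook kfuncs only fill the request; after the hook returns, the driver
 * renders the real descriptor(s) from it in one place, including any
 * extra chained descriptors the hardware may need. */
```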
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > On Tue, Jun 13, 2023 at 4:16 PM Stanislav Fomichev <sdf@google.com> wrote: >> >> On Tue, Jun 13, 2023 at 3:32 PM Alexei Starovoitov >> <alexei.starovoitov@gmail.com> wrote: >> > >> > On Tue, Jun 13, 2023 at 2:17 PM Stanislav Fomichev <sdf@google.com> wrote: >> > > >> > > > >> >> > --- UAPI --- >> > > > >> >> > >> > > > >> >> > The hooks are implemented in a HID-BPF style. Meaning they don't >> > > > >> >> > expose any UAPI and are implemented as tracing programs that call >> > > > >> >> > a bunch of kfuncs. The attach/detach operation happen via BPF syscall >> > > > >> >> > programs. The series expands device-bound infrastructure to tracing >> > > > >> >> > programs. >> > > > >> >> >> > > > >> >> Not a fan of the "attach from BPF syscall program" thing. These are part >> > > > >> >> of the XDP data path API, and I think we should expose them as proper >> > > > >> >> bpf_link attachments from userspace with introspection etc. But I guess >> > > > >> >> the bpf_mprog thing will give us that? >> > > > >> > >> > > > >> > bpf_mprog will just make those attach kfuncs return the link fd. The >> > > > >> > syscall program will still stay :-( >> > > > >> >> > > > >> Why does the attachment have to be done this way, exactly? Couldn't we >> > > > >> just use the regular bpf_link attachment from userspace? AFAICT it's not >> > > > >> really piggy-backing on the function override thing anyway when the >> > > > >> attachment is per-dev? Or am I misunderstanding how all this works? >> > > > > >> > > > > It's UAPI vs non-UAPI. I'm assuming kfunc makes it non-UAPI and gives >> > > > > us an opportunity to fix things. >> > > > > We can do it via a regular syscall path if there is a consensus. >> > > > >> > > > Yeah, the API exposed to the BPF program is kfunc-based in any case. 
If >> > > > we were to at some point conclude that this whole thing was not useful >> > > > at all and deprecate it, it doesn't seem to me that it makes much >> > > > difference whether that means "you can no longer create a link >> > > > attachment of this type via BPF_LINK_CREATE" or "you can no longer >> > > > create a link attachment of this type via BPF_PROG_RUN of a syscall type >> > > > program" doesn't really seem like a significant detail to me... >> > > >> > > In this case, why do you prefer it to go via regular syscall? Seems >> > > like we can avoid a bunch of boilerplate syscall work with a kfunc that >> > > does the attachment? >> > > We might as well abstract it at, say, libbpf layer which would >> > > generate/load this small bpf program to call a kfunc. >> > >> > I'm not sure we're on the same page here. >> > imo using syscall bpf prog that calls kfunc to do a per-device attach >> > is overkill here. >> > It's an experimental feature, but you're already worried about >> > multiple netdevs? >> > >> > Can you add an empty nop function and attach to it tracing style >> > with fentry? >> > It won't be per-netdev, but do you have to do per-device demux >> > by the kernel? Can your tracing bpf prog do that instead? >> > It's just an ifindex compare. >> > This way the non-uapi bits will be even smaller and no need >> > to change struct net_device. >> >> It's probably going to work if each driver has a separate set of tx >> fentry points, something like: >> {veth,mlx5,etc}_devtx_submit() >> {veth,mlx5,etc}_devtx_complete() I really don't get the opposition to exposing proper APIs; as a dataplane developer I want to attach a program to an interface. The kernel's role is to provide a consistent interface for this, not to require users to become driver developers just to get at the required details. > Right. And per-driver descriptors. > The 'generic' xdptx metadata is unlikely to be practical.
> Marshaling in and out of it is going to be too perf sensitive. > I'd just add an attach point in the driver with enough > args for bpf progs to make sense of the context and extend > the verifier to make a few safe fields writeable. This is a rehashing of the argument we had on the RX side: just exposing descriptors is a bad API because it forces BPF programs to deal with hardware errata - which is exactly the kind of thing that belongs in a driver. Exposing kfuncs allows drivers to deal with hardware quirks while exposing a consistent API to (BPF) users. > kfuncs to read/request timestamp are probably too slow. Which is why we should be inlining them :) -Toke
On Wed, Jun 14, 2023 at 5:00 AM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > >> > >> It's probably going to work if each driver has a separate set of tx > >> fentry points, something like: > >> {veth,mlx5,etc}_devtx_submit() > >> {veth,mlx5,etc}_devtx_complete() > > I really don't get the opposition to exposing proper APIs; as a > dataplane developer I want to attach a program to an interface. The > kernel's role is to provide a consistent interface for this, not to > require users to become driver developers just to get at the required > details. Consistent interface can appear only when there is a consistency across nic manufacturers. I'm suggesting to experiment in the most unstable way and if/when the consistency is discovered then generalize.
On 06/13, Jakub Kicinski wrote: > On Tue, 13 Jun 2023 20:54:26 -0700 David Ahern wrote: > > On 6/13/23 9:31 PM, Jakub Kicinski wrote: > > > On Mon, 12 Jun 2023 10:23:00 -0700 Stanislav Fomichev wrote: > > >> The goal of this series is to add two new standard-ish places > > >> in the transmit path: > > >> > > >> 1. Right before the packet is transmitted (with access to TX > > >> descriptors) > > > > If a device requires multiple Tx descriptors per skb or multibuf frame, > > how would that be handled within the XDP API? > > > > > I'm not sure that the Tx descriptors can be populated piecemeal. > > > > If it is host memory before the pidx move, why would that matter? Do you > > have a specific example in mind? > > I don't mean it's impossible to implement, but it may get cumbersome. > TSO/CSO/crypto may all need to know where L4 header starts, f.e. > Some ECN marking in the NIC may also want to know where L3 is. > So the offsets will get duplicated in each API. > > > > If we were ever to support more standard offload features, which > > > require packet geometry (hdr offsets etc.) to be described "call > > > per feature" will end up duplicating arguments, and there will be > > > a lot of args.. > > > > > > And if there is an SKB path in the future combining the normal SKB > > > offloads with the half-rendered descriptors may be a pain. > > > > Once the descriptor(s) is (are) populated, the skb is irrelevant is it > > not? Only complication that comes to mind is wanting to add or remove > > headers (e.g., tunnels) which will be much more complicated at this > > point, but might still be possible on a per NIC (and maybe version) basis. > > I guess one can write the skb descriptors first, then modify them from > the BPF. Either way I feel like the helper approach for Tx will result > in drivers saving the info into some local struct and then rendering > the descriptors after. We'll see. 
I agree that it's probably the "easiest" option to implement for the majority of the devices that were designed without much programmability this late in the stack. But maybe some devices can, or at least we can try to influence future designs :-)
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > On Wed, Jun 14, 2023 at 5:00 AM Toke Høiland-Jørgensen <toke@kernel.org> wrote: >> >> >> >> >> It's probably going to work if each driver has a separate set of tx >> >> fentry points, something like: >> >> {veth,mlx5,etc}_devtx_submit() >> >> {veth,mlx5,etc}_devtx_complete() >> >> I really don't get the opposition to exposing proper APIs; as a >> dataplane developer I want to attach a program to an interface. The >> kernel's role is to provide a consistent interface for this, not to >> require users to become driver developers just to get at the required >> details. > > Consistent interface can appear only when there is a consistency > across nic manufacturers. > I'm suggesting to experiment in the most unstable way and > if/when the consistency is discovered then generalize. That would be fine for new experimental HW features, but we're talking about timestamps here: a feature that is already supported by multiple drivers and for which the stack has a working abstraction. There's no reason why we can't have that for the XDP path as well. -Toke
On Thu, Jun 15, 2023 at 5:36 AM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > > > On Wed, Jun 14, 2023 at 5:00 AM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > >> > >> >> > >> >> It's probably going to work if each driver has a separate set of tx > >> >> fentry points, something like: > >> >> {veth,mlx5,etc}_devtx_submit() > >> >> {veth,mlx5,etc}_devtx_complete() > >> > >> I really don't get the opposition to exposing proper APIs; as a > >> dataplane developer I want to attach a program to an interface. The > >> kernel's role is to provide a consistent interface for this, not to > >> require users to become driver developers just to get at the required > >> details. > > > > Consistent interface can appear only when there is a consistency > > across nic manufacturers. > > I'm suggesting to experiment in the most unstable way and > > if/when the consistency is discovered then generalize. > > That would be fine for new experimental HW features, but we're talking > about timestamps here: a feature that is already supported by multiple > drivers and for which the stack has a working abstraction. There's no > reason why we can't have that for the XDP path as well. ... has an abstraction to receive, but has no mechanism to set it selectively per packet and read it on completion.
On Thu, Jun 15, 2023 at 9:10 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Thu, Jun 15, 2023 at 5:36 AM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > > > > > On Wed, Jun 14, 2023 at 5:00 AM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > >> > > >> >> > > >> >> It's probably going to work if each driver has a separate set of tx > > >> >> fentry points, something like: > > >> >> {veth,mlx5,etc}_devtx_submit() > > >> >> {veth,mlx5,etc}_devtx_complete() > > >> > > >> I really don't get the opposition to exposing proper APIs; as a > > >> dataplane developer I want to attach a program to an interface. The > > >> kernel's role is to provide a consistent interface for this, not to > > >> require users to become driver developers just to get at the required > > >> details. > > > > > > Consistent interface can appear only when there is a consistency > > > across nic manufacturers. > > > I'm suggesting to experiment in the most unstable way and > > > if/when the consistency is discovered then generalize. > > > > That would be fine for new experimental HW features, but we're talking > > about timestamps here: a feature that is already supported by multiple > > drivers and for which the stack has a working abstraction. There's no > > reason why we can't have that for the XDP path as well. > > ... has an abstraction to receive, but has no mechanism to set it > selectively per packet and read it on completion. Timestamp might be a bit of an outlier here where it's just setting some bit in some existing descriptor. For some features, the drivers might have to reserve extra descriptors and I'm not sure how safe it would be to let the programs arbitrarily mess with the descriptor queues like that. I'll probably keep this kfunc approach for v2 rfc for now (will try to get rid of the complicated attachment at least), but let's keep discussing.
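A minimal sketch of what the kfunc approach kept for the v2 RFC might look like. The hook name and devtx_request_timestamp() are placeholders, not the series' actual kfunc set:

```c
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* Placeholder kfunc: the driver decides which descriptor bits to set (and
 * whether extra descriptors must be reserved), hiding hardware quirks from
 * the program. */
extern int devtx_request_timestamp(struct devtx_frame *frame) __ksym;

SEC("fentry/veth_devtx_submit")
int BPF_PROG(tx_submit, struct devtx_frame *frame)
{
	devtx_request_timestamp(frame);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```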
On Mon, Jun 12, 2023 at 2:01 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > Some immediate thoughts after glancing through this: > > > --- Use cases --- > > > > The goal of this series is to add two new standard-ish places > > in the transmit path: > > > > 1. Right before the packet is transmitted (with access to TX > > descriptors) > > 2. Right after the packet is actually transmitted and we've received the > > completion (again, with access to TX completion descriptors) > > > > Accessing TX descriptors unlocks the following use-cases: > > > > - Setting device hints at TX: XDP/AF_XDP might use these new hooks to > > use device offloads. The existing case implements TX timestamp. > > - Observability: global per-netdev hooks can be used for tracing > > the packets and exploring completion descriptors for all sorts of > > device errors. > > > > Accessing TX descriptors also means that the hooks have to be called > > from the drivers. > > > > The hooks are a light-weight alternative to XDP at egress and currently > > don't provide any packet modification abilities. However, eventually, > > can expose new kfuncs to operate on the packet (or, rather, the actual > > descriptors; for performance sake). > > dynptr? > > > --- UAPI --- > > > > The hooks are implemented in a HID-BPF style. Meaning they don't > > expose any UAPI and are implemented as tracing programs that call > > a bunch of kfuncs. The attach/detach operation happen via BPF syscall > > programs. The series expands device-bound infrastructure to tracing > > programs. > > Not a fan of the "attach from BPF syscall program" thing. These are part > of the XDP data path API, and I think we should expose them as proper > bpf_link attachments from userspace with introspection etc. But I guess > the bpf_mprog thing will give us that? 
> > > --- skb vs xdp --- > > > > The hooks operate on a new light-weight devtx_frame which contains: > > - data > > - len > > - sinfo > > > > This should allow us to have a unified (from BPF POV) place at TX > > and not be super-taxing (we need to copy 2 pointers + len to the stack > > for each invocation). > > Not sure what I think about this one. At the very least I think we > should expose xdp->data_meta as well. I'm not sure what the use case for > accessing skbs is? If that *is* indeed useful, probably there will also > end up being a use case for accessing the full skb? I spent some time looking at data_meta story on AF_XDP TX and it doesn't look like it's supported (at least in a general way). You obviously get some data_meta when you do XDP_TX, but if you want to pass something to the bpf prog when doing TX via the AF_XDP ring, it gets complicated. In zerocopy mode, we can probably use XDP_UMEM_UNALIGNED_CHUNK_FLAG and pass something in the headroom. If copy-mode, there is no support to do skb_metadata_set. Probably makes sense to have something like tx_metalen on the xsk? And skb_metadata_set it in copy mode and skip it in zerocopy mode? Or maybe I'm missing something?
On Thu, 15 Jun 2023 09:31:19 -0700 Stanislav Fomichev wrote: > Timestamp might be a bit of an outlier here where it's just setting > some bit in some existing descriptor. > For some features, the drivers might have to reserve extra descriptors > and I'm not sure how safe it would be to let the programs arbitrarily > mess with the descriptor queues like that. I was gonna say, most NICs will have some form of descriptor chaining, with strict ordering, to perform reasonably with small packets. > I'll probably keep this kfunc approach for v2 rfc for now (will try to > get rid of the complicated attachment at least), but let's keep > discussing. SGTM, FWIW.
On Fri, 16 Jun 2023 at 02:09, Stanislav Fomichev <sdf@google.com> wrote: > > On Mon, Jun 12, 2023 at 2:01 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > > Some immediate thoughts after glancing through this: > > > > > --- Use cases --- > > > > > > The goal of this series is to add two new standard-ish places > > > in the transmit path: > > > > > > 1. Right before the packet is transmitted (with access to TX > > > descriptors) > > > 2. Right after the packet is actually transmitted and we've received the > > > completion (again, with access to TX completion descriptors) > > > > > > Accessing TX descriptors unlocks the following use-cases: > > > > > > - Setting device hints at TX: XDP/AF_XDP might use these new hooks to > > > use device offloads. The existing case implements TX timestamp. > > > - Observability: global per-netdev hooks can be used for tracing > > > the packets and exploring completion descriptors for all sorts of > > > device errors. > > > > > > Accessing TX descriptors also means that the hooks have to be called > > > from the drivers. > > > > > > The hooks are a light-weight alternative to XDP at egress and currently > > > don't provide any packet modification abilities. However, eventually, > > > can expose new kfuncs to operate on the packet (or, rather, the actual > > > descriptors; for performance sake). > > > > dynptr? > > > > > --- UAPI --- > > > > > > The hooks are implemented in a HID-BPF style. Meaning they don't > > > expose any UAPI and are implemented as tracing programs that call > > > a bunch of kfuncs. The attach/detach operation happen via BPF syscall > > > programs. The series expands device-bound infrastructure to tracing > > > programs. > > > > Not a fan of the "attach from BPF syscall program" thing. These are part > > of the XDP data path API, and I think we should expose them as proper > > bpf_link attachments from userspace with introspection etc. But I guess > > the bpf_mprog thing will give us that? 
> > > > > --- skb vs xdp --- > > > > > > The hooks operate on a new light-weight devtx_frame which contains: > > > - data > > > - len > > > - sinfo > > > > > > This should allow us to have a unified (from BPF POV) place at TX > > > and not be super-taxing (we need to copy 2 pointers + len to the stack > > > for each invocation). > > > > Not sure what I think about this one. At the very least I think we > > should expose xdp->data_meta as well. I'm not sure what the use case for > > accessing skbs is? If that *is* indeed useful, probably there will also > > end up being a use case for accessing the full skb? > > I spent some time looking at data_meta story on AF_XDP TX and it > doesn't look like it's supported (at least in a general way). > You obviously get some data_meta when you do XDP_TX, but if you want > to pass something to the bpf prog when doing TX via the AF_XDP ring, > it gets complicated. When we designed this some 5 - 6 years ago, we thought that there would be an XDP for egress action in the "nearish" future that could be used to interpret the metadata field in front of the packet. Basically, the user would load an XDP egress program that would define the metadata layout by the operations it would perform on the metadata area. But since XDP on egress has not happened, you are right, there is definitely something missing to be able to use metadata on Tx. Or could your proposed hook points be used for something like this? > In zerocopy mode, we can probably use XDP_UMEM_UNALIGNED_CHUNK_FLAG > and pass something in the headroom. This feature is mainly used to allow for multiple packets on the same chunk (to save space) and also to be able to have packets spanning two chunks. Even in aligned mode, you can start a packet at an arbitrary address in the chunk as long as the whole packet fits into the chunk. So no problem having headroom in any of the modes. > If copy-mode, there is no support to do skb_metadata_set. 
> > Probably makes sense to have something like tx_metalen on the xsk? And > skb_metadata_set it in copy mode and skip it in zerocopy mode? > Or maybe I'm missing something? >
On Fri, Jun 16, 2023 at 1:13 AM Magnus Karlsson <magnus.karlsson@gmail.com> wrote: > > On Fri, 16 Jun 2023 at 02:09, Stanislav Fomichev <sdf@google.com> wrote: > > > > On Mon, Jun 12, 2023 at 2:01 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > > > > Some immediate thoughts after glancing through this: > > > > > > > --- Use cases --- > > > > > > > > The goal of this series is to add two new standard-ish places > > > > in the transmit path: > > > > > > > > 1. Right before the packet is transmitted (with access to TX > > > > descriptors) > > > > 2. Right after the packet is actually transmitted and we've received the > > > > completion (again, with access to TX completion descriptors) > > > > > > > > Accessing TX descriptors unlocks the following use-cases: > > > > > > > > - Setting device hints at TX: XDP/AF_XDP might use these new hooks to > > > > use device offloads. The existing case implements TX timestamp. > > > > - Observability: global per-netdev hooks can be used for tracing > > > > the packets and exploring completion descriptors for all sorts of > > > > device errors. > > > > > > > > Accessing TX descriptors also means that the hooks have to be called > > > > from the drivers. > > > > > > > > The hooks are a light-weight alternative to XDP at egress and currently > > > > don't provide any packet modification abilities. However, eventually, > > > > can expose new kfuncs to operate on the packet (or, rather, the actual > > > > descriptors; for performance sake). > > > > > > dynptr? > > > > > > > --- UAPI --- > > > > > > > > The hooks are implemented in a HID-BPF style. Meaning they don't > > > > expose any UAPI and are implemented as tracing programs that call > > > > a bunch of kfuncs. The attach/detach operation happen via BPF syscall > > > > programs. The series expands device-bound infrastructure to tracing > > > > programs. > > > > > > Not a fan of the "attach from BPF syscall program" thing. 
These are part > > > of the XDP data path API, and I think we should expose them as proper > > > bpf_link attachments from userspace with introspection etc. But I guess > > > the bpf_mprog thing will give us that? > > > > > > > --- skb vs xdp --- > > > > > > > > The hooks operate on a new light-weight devtx_frame which contains: > > > > - data > > > > - len > > > > - sinfo > > > > > > > > This should allow us to have a unified (from BPF POV) place at TX > > > > and not be super-taxing (we need to copy 2 pointers + len to the stack > > > > for each invocation). > > > > > > Not sure what I think about this one. At the very least I think we > > > should expose xdp->data_meta as well. I'm not sure what the use case for > > > accessing skbs is? If that *is* indeed useful, probably there will also > > > end up being a use case for accessing the full skb? > > > > I spent some time looking at data_meta story on AF_XDP TX and it > > doesn't look like it's supported (at least in a general way). > > You obviously get some data_meta when you do XDP_TX, but if you want > > to pass something to the bpf prog when doing TX via the AF_XDP ring, > > it gets complicated. > > When we designed this some 5 - 6 years ago, we thought that there > would be an XDP for egress action in the "nearish" future that could > be used to interpret the metadata field in front of the packet. > Basically, the user would load an XDP egress program that would define > the metadata layout by the operations it would perform on the metadata > area. But since XDP on egress has not happened, you are right, there > is definitely something missing to be able to use metadata on Tx. Or > could your proposed hook points be used for something like this? Thanks for the context! Yes, the proposal is to use these new tx hooks to read out af_xdp metadata and apply it to the packet via a bunch of tbd kfuncs. AF_XDP and BPF programs would have to have a contract about the metadata layout (same as we have on rx). 
> > In zerocopy mode, we can probably use XDP_UMEM_UNALIGNED_CHUNK_FLAG
> > and pass something in the headroom.
>
> This feature is mainly used to allow for multiple packets on the same
> chunk (to save space) and also to be able to have packets spanning two
> chunks. Even in aligned mode, you can start a packet at an arbitrary
> address in the chunk as long as the whole packet fits into the chunk.
> So no problem having headroom in any of the modes.

But if I put it into the headroom, it will only be passed down to the
driver in zero-copy mode, right?
If I do tx_desc->addr = packet_start, no metadata (that goes prior to
packet_start) gets copied into the skb in copy mode (it seems).
Or do you suggest that the interface should be tx_desc->addr =
metadata_start and the bpf program should call the equivalent of
bpf_xdp_adjust_head to consume this metadata?
On 06/16, Stanislav Fomichev wrote:
> On Fri, Jun 16, 2023 at 1:13 AM Magnus Karlsson
> <magnus.karlsson@gmail.com> wrote:
> >
> > [... earlier thread trimmed; quoted in full above ...]
> >
> > ... there is definitely something missing to be able to use metadata
> > on Tx. Or could your proposed hook points be used for something like
> > this?
>
> Thanks for the context!
> [... earlier discussion trimmed; quoted in full above ...]
>
> Or do you suggest that the interface should be tx_desc->addr =
> metadata_start and the bpf program should call the equivalent of
> bpf_xdp_adjust_head to consume this metadata?

For copy-mode, here is what I've prototyped. That seems to work.
For zero-copy, I don't think we need anything extra (besides exposing
xsk->tx_meta_len at the hook point, tbd). Does the patch below make
sense?
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index e96a1151ec75..30018b3b862d 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -51,6 +51,7 @@ struct xdp_sock {
 	struct list_head flush_node;
 	struct xsk_buff_pool *pool;
 	u16 queue_id;
+	u8 tx_metadata_len;
 	bool zc;
 	enum {
 		XSK_READY = 0,
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index a78a8096f4ce..2374eafff7db 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -63,6 +63,7 @@ struct xdp_mmap_offsets {
 #define XDP_UMEM_COMPLETION_RING 6
 #define XDP_STATISTICS 7
 #define XDP_OPTIONS 8
+#define XDP_TX_METADATA_LEN 9
 
 struct xdp_umem_reg {
 	__u64 addr; /* Start of packet data area */
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index cc1e7f15fa73..a95872712547 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -493,14 +493,21 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 			return ERR_PTR(err);
 
 		skb_reserve(skb, hr);
-		skb_put(skb, len);
+		skb_put(skb, len + xs->tx_metadata_len);
 
 		buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
+		buffer -= xs->tx_metadata_len;
+
 		err = skb_store_bits(skb, 0, buffer, len);
 		if (unlikely(err)) {
 			kfree_skb(skb);
 			return ERR_PTR(err);
 		}
+
+		if (xs->tx_metadata_len) {
+			skb_metadata_set(skb, xs->tx_metadata_len);
+			__skb_pull(skb, xs->tx_metadata_len);
+		}
 	}
 
 	skb->dev = dev;
@@ -1137,6 +1144,27 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
 		mutex_unlock(&xs->mutex);
 		return err;
 	}
+	case XDP_TX_METADATA_LEN:
+	{
+		int val;
+
+		if (optlen < sizeof(val))
+			return -EINVAL;
+		if (copy_from_sockptr(&val, optval, sizeof(val)))
+			return -EFAULT;
+
+		if (val >= 256)
+			return -EINVAL;
+
+		mutex_lock(&xs->mutex);
+		if (xs->state != XSK_READY) {
+			mutex_unlock(&xs->mutex);
+			return -EBUSY;
+		}
+		xs->tx_metadata_len = val;
+		mutex_unlock(&xs->mutex);
+		return err;
+	}
 	default:
 		break;
 	}
On Sat, 17 Jun 2023 at 01:10, Stanislav Fomichev <sdf@google.com> wrote:
>
> [... earlier thread trimmed; quoted in full above ...]
>
> For copy-mode, here is what I've prototyped. That seems to work.
> For zero-copy, I don't think we need anything extra (besides exposing
> xsk->tx_meta_len at the hook point, tbd). Does the patch below make
> sense?

Was just going to suggest adding a setsockopt, so this makes perfect
sense to me. Thanks!
> [... prototype patch quoted in full above, trimmed ...]