Message ID | 20240410140141.495384-1-jhs@mojatatu.com (mailing list archive)
---|---
Series | Introducing P4TC (series 1)
On Wed, 2024-04-10 at 10:01 -0400, Jamal Hadi Salim wrote:
> The only change that v16 makes is to add a nack to patch 14 on kfuncs
> from Daniel and John. We strongly disagree with the nack; unfortunately I
> have to rehash what's already in the cover letter and has been discussed
> over and over and over again:

I feel bad asking, but I have to, since all the options I have here are IMHO quite sub-optimal.

How bad would it be to drop patch 14 and rework the rest with an alternative s/w datapath? (I guess restoring it from the oldest revision of this series.)

Paolo
On Thu, Apr 11, 2024 at 10:07 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Wed, 2024-04-10 at 10:01 -0400, Jamal Hadi Salim wrote:
> > The only change that v16 makes is to add a nack to patch 14 on kfuncs
> > from Daniel and John. We strongly disagree with the nack; unfortunately I
> > have to rehash what's already in the cover letter and has been discussed
> > over and over and over again:
>
> I feel bad asking, but I have to, since all the options I have here are
> IMHO quite sub-optimal.
>
> How bad would it be to drop patch 14 and rework the rest with an
> alternative s/w datapath? (I guess restoring it from the oldest revision
> of this series.)

We want to keep using eBPF for the s/w datapath, if that is not clear by now. I do not understand the obstructionism, tbh. Are users allowed to use kfuncs as part of infra or not? My understanding is yes.

This community is getting too political, and my worry is that we have corporatism creeping in, as it has in standards bodies. We started by not using eBPF. The same people who are objecting now were up in arms and insisted we use eBPF. As a member of this community, my motivation was to meet them in the middle by compromising. We invested another year to move to that middle ground. Now they are insisting we do not use eBPF because they don't like our design, or how we are using eBPF, or maybe it's not a use case they have any need for, or some other politics. I lost track of the moving goalposts. Open source is about scratching your own itch. This code is entirely in TC; zero code changed in the eBPF core. The new goalpost is based on emotional outrage over the use of functions. The whole thing is getting extremely toxic.

cheers,
jamal
On Thu, Apr 11, 2024 at 12:24 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Thu, Apr 11, 2024 at 10:07 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > [...]
> >
> > How bad would it be to drop patch 14 and rework the rest with an
> > alternative s/w datapath? (I guess restoring it from the oldest revision
> > of this series.)
>
> We want to keep using eBPF for the s/w datapath, if that is not clear by
> now. [...]

Paolo,

Following up since there has been no movement for a week now ;->

I am going to give the benefit of the doubt that there was miscommunication or misunderstanding in all the back and forth that has happened so far with the nackers. I will summarize the main points raised below and then respond to each:

1) "Use maps"

It doesn't make sense for our requirement. The reason we are using TC is that a) P4 is an excellent fit for the TC match-action paradigm and b) we are targeting both s/w and h/w, and the TC model caters well for this. The objects belong to TC, shared between s/w, h/w and the control plane (and netlink is the API). Maybe this diagram will help:
https://github.com/p4tc-dev/docs/blob/main/images/why-p4tc/p4tc-runtime-pipeline.png

While the s/w part stands on its own (as elaborated many times), for TC, which has offloads, the s/w twin is introduced before the h/w equivalent. This is what this series is doing.

2) "but ... it is not performant"

This has been brought up in regard to netlink and kfuncs. Performance is a lower priority than P4 correctness and expressibility. Netlink provides us the abstractions we need, it works with TC for both s/w and h/w offload, and there is a large knowledge base for expressing control plane APIs with it. We don't believe reinventing all that makes sense. Kfuncs are a means to an end: they provide the glue we need to connect an eBPF s/w datapath to the TC objects (an illustrative sketch of that glue follows this message). Getting an extra 10-100 Kpps is not a driving factor.

3) "but you did it wrong, here's how you do it..."

I gave up on responding to this - but do note this sentiment is a big theme in the exchanges and consumed most of the electrons. We are _never_ going to get any consensus with statements like "tc actions are a mistake" or "use tcx".

4) "... drop the kfunc patch"

Kfuncs essentially boil down to function calls. They don't require any special handling by the eBPF verifier, nor do they introduce new semantics to eBPF. They are similar in nature to the already existing kfuncs interacting with other kernel objects, such as nf_conntrack. The precedent (repeated in conferences and email threads multiple times) is that kfuncs don't have to be sent to the eBPF list or reviewed by folks in the eBPF world, and we believe that rule applies to us as well. Either kfuncs (and frankly eBPF) are infrastructure glue or they are not.

Now for a little rant:

Open source is not a zero-sum game. eBPF already coexists happily with netfilter, TC and various other subsystems. I hope our requirement is clear and I don't have to keep justifying P4, or relitigate over and over again why we need TC. Open source is about scratching your own itch, and our itch is entirely contained within TC. I can't help but feel that this community is getting way too pervaded by politics and obscure agendas. I understand agendas; I just don't understand the zero-sum thinking.

My view is this series should still be applied with the nacks, since it sits entirely in its own silo within networking/TC (and has nothing to do with eBPF).

cheers,
jamal
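For context on the kfunc glue referenced in points 2 and 4: from the eBPF program's point of view, a kfunc is just an extern kernel function resolved through BTF at load time. The sketch below is illustrative only; the function name bpf_p4tc_tbl_lookup, its signature and struct p4_act_params are hypothetical stand-ins, not the actual kfuncs added by this series.

```c
// Illustrative sketch: a tc/eBPF program calling a hypothetical P4TC
// table-lookup kfunc. The kfunc name, its signature and struct p4_act_params
// are invented for illustration; they are not the series' real API.
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

struct p4_act_params {
	__u32 act_id;	/* which P4 action to execute */
	__u32 param0;	/* first action parameter */
};

/* Hypothetical kfunc: look up a TC-owned P4 table entry by key and copy its
 * action parameters into the caller-supplied buffer. Returns 0 on a hit. */
extern int bpf_p4tc_tbl_lookup(struct __sk_buff *skb, void *key, __u32 key__sz,
			       struct p4_act_params *act) __ksym;

SEC("tc")
int p4_datapath(struct __sk_buff *skb)
{
	__u32 key = skb->ifindex;	/* toy key, for illustration only */
	struct p4_act_params act = {};

	/* The table itself lives on the TC side, shared with the control
	 * plane (netlink) and, potentially, with h/w offload. */
	if (bpf_p4tc_tbl_lookup(skb, &key, sizeof(key), &act))
		return TC_ACT_SHOT;	/* miss: drop, in this sketch */

	return TC_ACT_OK;		/* hit: accept */
}

char _license[] SEC("license") = "GPL";
```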
On Fri, Apr 19, 2024 at 5:08 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> My view is this series should still be applied with the nacks, since it
> sits entirely in its own silo within networking/TC (and has nothing to
> do with eBPF).

My Nack applies to the whole set. The kernel doesn't need this anti-feature for many reasons already explained.
On Fri, Apr 19, 2024 at 10:23 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Apr 19, 2024 at 5:08 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > My view is this series should still be applied with the nacks, since it
> > sits entirely in its own silo within networking/TC (and has nothing to
> > do with eBPF).
>
> My Nack applies to the whole set. The kernel doesn't need this anti-feature
> for many reasons already explained.

Can you be more explicit? What else would you add to the list I posted above?

cheers,
jamal
On Fri, Apr 19, 2024 at 7:34 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Fri, Apr 19, 2024 at 10:23 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > [...]
> >
> > My Nack applies to the whole set. The kernel doesn't need this anti-feature
> > for many reasons already explained.
>
> Can you be more explicit? What else would you add to the list I posted above?

Since you're refusing to work with us, your only option is to mention my Nack in the cover letter and send it as a PR to Linus during the merge window.
On Fri, Apr 19, 2024 at 10:37 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Apr 19, 2024 at 7:34 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > [...]
> >
> > Can you be more explicit? What else would you add to the list I posted above?
>
> Since you're refusing to work with us, your only option

Who is "us"? eBPF? I hope you are not speaking on behalf of the net subsystem. You are entitled to your opinion (and aggression) - and there is a lot of that with you - but this should be based on technical merit, not your emotions. I summarized the reasons brought up by you and Cilium. Do you have more to add to that list? If you do, please add to it.

> is to mention my Nack in the cover letter and send it
> as a PR to Linus during the merge window.

You don't get to decide that - I was talking to the networking people.

cheers,
jamal
On Fri, Apr 19, 2024 at 7:45 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> You don't get to decide that - I was talking to the networking people.

You think they want the net-next PR to get derailed because of this?
On Fri, Apr 19, 2024 at 10:49 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Apr 19, 2024 at 7:45 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > You don't get to decide that - I was talking to the networking people.
>
> You think they want the net-next PR to get derailed because of this?

Why would it be derailed?

cheers,
jamal
On Fri, 2024-04-19 at 08:08 -0400, Jamal Hadi Salim wrote:
> [...]
>
> My view is this series should still be applied with the nacks, since it
> sits entirely in its own silo within networking/TC (and has nothing to
> do with eBPF).

It's really hard for me - meaning I will not do that - to apply a series that has been so fiercely nacked, especially given that the other maintainers are not supporting it.

I really understand this is very bad for you.

Let me make an extreme attempt to find some middle ground between this series and the bpf folks.

My understanding is that the most disliked item is the lifecycle of the objects allocated via the kfunc(s).

If I understand correctly, the hard requirement on the bpf side is that any kernel object allocated by a kfunc must be released at program unload time; p4tc instead postpones that release so it can recycle the structures.

While there are other arguments, my reading of the past few iterations is that solving the above point should lift the nack, am I correct?

Could p4tc pre-allocate all the p4tc_table_entry_act_bpf_kern entries and let p4a_runt_create_bpf() fail if the pool is empty? Would that satisfy the bpf requirement?

Otherwise, could p4tc force-free the p4tc_table_entry_act_bpf_kern entries at unload time?

Thanks,

Paolo
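A minimal sketch of the pre-allocation idea above, with made-up names (act_bpf_kern, act_bpf_pool_*) standing in for the real p4tc_table_entry_act_bpf_kern plumbing: the pool is filled once when the pipeline/table is set up, and the datapath-side create path only recycles entries, failing cleanly once the pool is exhausted.

```c
/* Sketch of the pre-allocation approach; type and function names are
 * placeholders, not the actual P4TC symbols. Error unwinding is omitted
 * for brevity. */
#include <linux/errno.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct act_bpf_kern {			/* stand-in for the per-entry object */
	struct list_head node;
	u32 act_id;
};

struct act_bpf_pool {
	spinlock_t lock;
	struct list_head free;		/* pre-allocated, currently unused */
};

/* Fill the pool up front, e.g. when the pipeline/table is instantiated. */
static int act_bpf_pool_init(struct act_bpf_pool *pool, unsigned int max_entries)
{
	unsigned int i;

	spin_lock_init(&pool->lock);
	INIT_LIST_HEAD(&pool->free);

	for (i = 0; i < max_entries; i++) {
		struct act_bpf_kern *e = kzalloc(sizeof(*e), GFP_KERNEL);

		if (!e)
			return -ENOMEM;
		list_add(&e->node, &pool->free);
	}
	return 0;
}

/* Datapath create path: never allocates, only recycles. A NULL return means
 * "pool empty, fail the entry create". */
static struct act_bpf_kern *act_bpf_pool_get(struct act_bpf_pool *pool)
{
	struct act_bpf_kern *e;

	spin_lock_bh(&pool->lock);
	e = list_first_entry_or_null(&pool->free, struct act_bpf_kern, node);
	if (e)
		list_del(&e->node);
	spin_unlock_bh(&pool->lock);

	return e;
}

/* Entries go back to the pool instead of being freed, both on individual
 * delete and when the owning eBPF program goes away. */
static void act_bpf_pool_put(struct act_bpf_pool *pool, struct act_bpf_kern *e)
{
	spin_lock_bh(&pool->lock);
	list_add(&e->node, &pool->free);
	spin_unlock_bh(&pool->lock);
}
```

The trade-off is that the table's maximum size has to be fixed up front, which is part of what gets weighed in the follow-ups below.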
On Fri, Apr 19, 2024 at 1:20 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> [...]
>
> Could p4tc pre-allocate all the p4tc_table_entry_act_bpf_kern entries
> and let p4a_runt_create_bpf() fail if the pool is empty? Would that
> satisfy the bpf requirement?

Let me think about it and weigh the consequences.

> Otherwise, could p4tc force-free the p4tc_table_entry_act_bpf_kern entries
> at unload time?

This one won't work for us, unfortunately. If we have entries added by the control plane with skip_sw, the fact that the eBPF program is gone doesn't mean they disappear. Also, there are use cases where no entry is loaded by the s/w datapath.

cheers,
jamal

> Thanks,
>
> Paolo
On Fri, Apr 19, 2024 at 2:01 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Fri, Apr 19, 2024 at 1:20 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > [...]
> >
> > Could p4tc pre-allocate all the p4tc_table_entry_act_bpf_kern entries
> > and let p4a_runt_create_bpf() fail if the pool is empty? Would that
> > satisfy the bpf requirement?
>
> Let me think about it and weigh the consequences.

Sorry, was busy evaluating. Yes, we can enforce the memory allocation constraints such that, when the eBPF program is removed, any entries added by said eBPF program can be removed from the datapath.

> > Otherwise, could p4tc force-free the p4tc_table_entry_act_bpf_kern entries
> > at unload time?
>
> This one won't work for us, unfortunately. If we have entries added by
> the control plane with skip_sw, the fact that the eBPF program is gone
> doesn't mean they disappear.

Just to clarify (the figure https://github.com/p4tc-dev/docs/blob/main/images/why-p4tc/p4tc-runtime-pipeline.png should help): for P4 table objects, there are three types of entries:

1) created by the control path for the s/w datapath, with skip_hw;
2) created by the control path for the h/w datapath, with skip_sw; and
3) dynamically created by the s/w datapath (eBPF), not far off from conntrack.

The only ones we can remove when the eBPF program goes away are those from #3 (a sketch of this split follows below).

cheers,
jamal
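To make the three-way split concrete, here is a small sketch with invented types (the real P4TC code has its own entry bookkeeping): entries are tagged by origin, and the unload path reclaims only the datapath-created ones.

```c
/* Sketch with invented types: tag each table entry with its origin so that,
 * when the eBPF program is unloaded, only datapath-created entries (#3) are
 * reclaimed; control-plane entries (#1 and #2) must survive. */
#include <linux/list.h>

enum p4_entry_origin {
	P4_ENTRY_CTRL_SKIP_HW,	/* #1: control path, s/w datapath only */
	P4_ENTRY_CTRL_SKIP_SW,	/* #2: control path, h/w datapath only */
	P4_ENTRY_DATAPATH,	/* #3: created dynamically by the eBPF datapath */
};

struct p4_entry {
	struct list_head node;
	enum p4_entry_origin origin;
};

static void p4_table_flush_datapath_entries(struct list_head *entries)
{
	struct p4_entry *e, *tmp;

	list_for_each_entry_safe(e, tmp, entries, node) {
		if (e->origin != P4_ENTRY_DATAPATH)
			continue;	/* keep control-plane entries (#1, #2) */
		list_del(&e->node);
		/* ... return the entry to the pre-allocated pool here ... */
	}
}
```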
On Fri, 2024-04-26 at 13:12 -0400, Jamal Hadi Salim wrote:
> On Fri, Apr 19, 2024 at 2:01 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > On Fri, Apr 19, 2024 at 1:20 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > [...]
> > >
> > > Could p4tc pre-allocate all the p4tc_table_entry_act_bpf_kern entries
> > > and let p4a_runt_create_bpf() fail if the pool is empty? Would that
> > > satisfy the bpf requirement?
> >
> > Let me think about it and weigh the consequences.
>
> Sorry, was busy evaluating. Yes, we can enforce the memory allocation
> constraints such that, when the eBPF program is removed, any entries
> added by said eBPF program can be removed from the datapath.

I suggested these changes based on my interpretation of this long and complex discussion; I may have missed some or many relevant points.

@Alexei: could you please double-check the above and, hopefully, confirm that such a change would lift your nacked-by?

Thanks!

Paolo
On Fri, Apr 26, 2024 at 10:21 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> [...]
>
> I suggested these changes based on my interpretation of this long and
> complex discussion; I may have missed some or many relevant points.
>
> @Alexei: could you please double-check the above and, hopefully, confirm
> that such a change would lift your nacked-by?

No. The whole design is broken. Remembering what was allocated by the kfunc and freeing it later is not fixing the design at all. Sorry.
On Fri, Apr 26, 2024 at 1:43 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> [...]
>
> No. The whole design is broken. Remembering what was allocated by the kfunc
> and freeing it later is not fixing the design at all.

Can you be a little less vague? We are dealing with multiple domains here, _including h/w offloads_, and, as mentioned a few times already, for that reason these objects belong to the P4TC domain. If it wasn't clear, this diagram explains the design:
https://github.com/p4tc-dev/docs/blob/main/images/why-p4tc/p4tc-runtime-pipeline.png

IOW, P4 objects (to be specific, table entries in this discussion) may be shared between s/w and/or h/w. Note: there is no allocation done by the kfunc - it just picks from a fixed pool of pre-allocated entries. Where is the "design broken", considering all this?

cheers,
jamal
On Fri, Apr 26, 2024 at 2:03 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> [...]
>
> Note: there is no allocation done by the kfunc - it just picks from a
> fixed pool of pre-allocated entries. Where is the "design broken",
> considering all this?

OK, not that I was expecting an answer, and I think I have waited long enough. Frankly, my agreement to make the change and the time spent validating it were just an attempt at a compromise (as we have done many, many times) - but really that approach works against our requirement to control the aging/deletion/replacement policy. I don't believe there's any good faith from the nackers. For that reason, that offer is off the table.

It should be noted that the changes Alexei is objecting to are more tame than, for example:
https://elixir.bootlin.com/linux/latest/source/net/netfilter/nf_conntrack_bpf.c#L318

We didn't see Alexei's nack on that code. I am not asking to be given special treatment, but it is clear we have a hole in the process currently. All I am asking for is fair treatment.
At this point, given that Paolo says the patches cant be applied because of 3 cross-subsystem nacks, my suggestion on how we resolve this is to appoint a third-party arbitrator. This person cannot be part of the TC or eBPF collective and has to be agreed to by both parties. Hopefully this will introduce a new set of rules that will help the maintainers resolve such issues should they surface in the future. I will collect all the other issues raised and my responses and create a web page so things dont get lost in the noise. I will post it then and maybe send it to a wider audience. cheers, jamal > cheers, > jamal
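Since the nf_conntrack_bpf.c comparison comes up repeatedly in this thread, the registration pattern in question looks roughly like the sketch below: a subsystem declares acquire/release-style kfuncs, collects them into a BTF id set, and registers that set for tc (BPF_PROG_TYPE_SCHED_CLS) programs. The function and struct names below are invented for illustration and are not the actual p4tc kfuncs; only the mechanics (register_btf_kfunc_id_set() and the KF_ACQUIRE/KF_RET_NULL/KF_RELEASE flags, as used by the conntrack kfuncs) are real kernel interfaces, and older kernels spell the set macros BTF_SET8_START/BTF_SET8_END rather than BTF_KFUNCS_START/BTF_KFUNCS_END.

#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/btf_ids.h>
#include <linux/init.h>
#include <linux/module.h>

struct foo_entry;        /* hypothetical object handed out to BPF programs */
struct foo_entry_params; /* hypothetical create parameters */

/* The real definitions elsewhere would carry the __bpf_kfunc annotation. */
struct foo_entry *foo_entry_create(struct foo_entry_params *params);
void foo_entry_release(struct foo_entry *entry);

/* Describe the kfuncs to the verifier: create acquires a reference and may
 * return NULL, release drops the reference. */
BTF_KFUNCS_START(foo_kfunc_ids)
BTF_ID_FLAGS(func, foo_entry_create, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, foo_entry_release, KF_RELEASE)
BTF_KFUNCS_END(foo_kfunc_ids)

static const struct btf_kfunc_id_set foo_kfunc_set = {
	.owner = THIS_MODULE,
	.set   = &foo_kfunc_ids,
};

static int __init foo_register_kfuncs(void)
{
	/* Make the kfuncs callable from tc (sched_cls) BPF programs. */
	return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &foo_kfunc_set);
}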
As stated a few times, we strongly disagree with the nature of the Nacks from Alexei, Daniel and John. We dont think there is good ground for the Nacks. A brief history on the P4TC patches: We posted V1 in January 2023. The main objection then was that we needed to use eBPF. After some discussion and investigation on our part we found that using kfuncs would satisfy our goals as well as the objections raised. We posted 28 RFC patches looking for feedback from eBPF and other folks with V2 in May 2023 - these patches were not ready but we were nevertheless soliciting feedback. By Version 7 in October 2023 we removed the RFC tag (meaning we were asking for inclusion). In Version 8 we sent the first 15 patches as series 1 (following netdev rules that allow only 15 patches); 5 of these patches are trivial tc core patches. Starting with V8 and up to V14 the releases were mostly suggested changes (many thanks to the folks who made suggestions for technical changes) and at one point a bug fix for an issue caught by our syzkaller instance. When it seemed like Paolo was heading towards applying series 1 given the feedback, Alexei nacked patch 14 when we released V14; see: https://lore.kernel.org/bpf/20240404122338.372945-5-jhs@mojatatu.com/ V15's only change was adding Alexei's nack. V15 was followed by Daniel and then John also nacking the same patch 14. V16's only change was to add these extra Nacks. At that point(v16) i asked for the series to be applied despite the Nacks because, frankly, the Nacks have no merit. Paolo was not comfortable applying patches with Nacks and tried to mediate. In his mediation effort he asked if we could remove eBPF - and our answer was no because after all that time we have become dependent on it and frankly there was no technical reason not to use eBPF. Paolo then asked if we could satisfy one of the points Alexei raised in terms of clearing table entries when an eBPF program was unloaded. We spent a week investigating and came to the conclusion that we could do it as a compromise (even though it is not something that fits our requirements, and there is existing code, which we copied from, that does exactly what Alexei is objecting to). Alexei rejected this offer. This puts Paolo in a difficult position because it is clear there is no compromise to be had. I feel we are in uncharted territory. Since we are in a quagmire, I am asking for a third-party mediator to review the objections and validate if they have merit. I have created a web page to capture all the objections raised by the 3 gents over a period of time at: https://github.com/p4tc-dev/pushback-patches If any of the 3 people feel i have misrepresented their objections or missed an important detail please let me know and i will fix the page. cheers, jamal
Hi Jamal! On Tue, 21 May 2024 08:35:07 -0400 Jamal Hadi Salim wrote: > At that point(v16) i asked for the series to be applied despite the > Nacks because, frankly, the Nacks have no merit. Paolo was not > comfortable applying patches with Nacks and tried to mediate. In his > mediation effort he asked if we could remove eBPF - and our answer was > no because after all that time we have become dependent on it and > frankly there was no technical reason not to use eBPF. I'm not fully clear on who you're appealing to, and I may be missing some points. But maybe it will be more useful than hurtful if I clarify my point of view. AFAIU BPF folks disagree with the use of their subsystem, and they point out that P4 pipelines can be implemented using BPF in the first place. To which you reply that you like (a highly dated type of) a netlink interface, and (handwavey) ability to configure the data path SW or HW via the same interface. AFAICT there's some but not very strong support for P4TC, and it doesn't benefit or solve any problems of the broader networking stack (e.g. expressing or configuring parser graphs in general) So from my perspective, the submission is neither technically strong enough, nor broadly useful enough to consider making questionable precedents for, i.e. to override maintainers on how their subsystems are extended.
On Wed, May 22, 2024 at 6:19 PM Jakub Kicinski <kuba@kernel.org> wrote: > > Hi Jamal! > > On Tue, 21 May 2024 08:35:07 -0400 Jamal Hadi Salim wrote: > > At that point(v16) i asked for the series to be applied despite the > > Nacks because, frankly, the Nacks have no merit. Paolo was not > > comfortable applying patches with Nacks and tried to mediate. In his > > mediation effort he asked if we could remove eBPF - and our answer was > > no because after all that time we have become dependent on it and > > frankly there was no technical reason not to use eBPF. > > I'm not fully clear on who you're appealing to, and I may be missing > some points. But maybe it will be more useful than hurtful if I clarify > my point of view. > > AFAIU BPF folks disagree with the use of their subsystem, and they > point out that P4 pipelines can be implemented using BPF in the first > place. > To which you reply that you like (a highly dated type of) a netlink > interface, and (handwavey) ability to configure the data path SW or > HW via the same interface. It's not what I "like"; rather, it is a requirement to support both s/w and h/w offload. The TC model is the traditional approach to deploy these models. I addressed the same comment you are making above in #1a and #1b (https://github.com/p4tc-dev/pushback-patches). OTOH, "BPF folks disagree with the use of their subsystem" is a problematic statement. Is BPF infra for the kernel community or is it something the ebpf folks can decide, at their whim, to allow who they like to use or not? We are not changing any BPF code. And there's already a case where the interfaces are used exactly as we used them in the conntrack code i pointed to in the page (we literally copied that code). Why is it ok for conntrack code to use exactly the same approach but not us? > AFAICT there's some but not very strong support for P4TC, I dont agree. Paolo asked this question and afaik Intel, AMD (both build P4-native NICs) and the folks interested in the MS DASH project responded saying they are in support. Look at who is being Cced. A lot of these folks attend biweekly discussion calls on P4TC. Sample: https://lore.kernel.org/netdev/IA0PR17MB7070B51A955FB8595FFBA5FB965E2@IA0PR17MB7070.namprd17.prod.outlook.com/ > and it > doesn't benefit or solve any problems of the broader networking stack > (e.g. expressing or configuring parser graphs in general) > I am not sure where the parser thing comes from - the parser is generated as eBPF. > So from my perspective, the submission is neither technically strong > enough, nor broadly useful enough to consider making questionable precedents > for, i.e. to override maintainers on how their subsystems are extended. I believe as a community nobody should just have the power to nack things just because - as i stated in the page, not even Linus. That code doesnt touch anything to do with eBPF maintainers (meaning things they have to fix when an issue shows up), nor does it "extend", as you state, any ebpf code, and it is all part of the networking subsystem. Sure, anybody has the right to nack but I contend that nacks should be based on technical reasons. I have listed all the objections in that page and how i have responded to them over time. Someone needs to look at those objectively and say if they are valid. The argument made so far (by Paolo and now by you) is "we cant override maintainers on how their subsystems are used"; if that is the case then we are in uncharted territory, and thats why i am asking for arbitration. cheers, jamal
On Wed, May 22, 2024 at 6:19 PM Jakub Kicinski <kuba@kernel.org> wrote: >> AFAICT there's some but not very strong support for P4TC, On Wed, May 22, 2024 at 4:04 PM Jamal Hadi Salim <jhs@mojatatu.com > wrote: >I dont agree. Paolo asked this question and afaik Intel, AMD (both build P4-native NICs) and the folks interested in the MS DASH project >responded saying they are in support. Look at who is being Cced. A lot of these folks who attend biweekly discussion calls on P4TC. >Sample: >https://lore.kernel.org/netdev/IA0PR17MB7070B51A955FB8595FFBA5FB965E2@IA0PR17MB7070.namprd17.prod.outlook.com/ FWIW, Intel is in full support of P4TC as we have stated several times in the past.
Apologies for resending as plain text, the first try was HTML and got rejected by bots. > On Wed, May 22, 2024 at 6:19 PM Jakub Kicinski <kuba@kernel.org> wrote: > > > > Hi Jamal! > > > > On Tue, 21 May 2024 08:35:07 -0400 Jamal Hadi Salim wrote: > > > At that point(v16) i asked for the series to be applied despite the > > > Nacks because, frankly, the Nacks have no merit. Paolo was not > > > comfortable applying patches with Nacks and tried to mediate. In his > > > mediation effort he asked if we could remove eBPF - and our answer was > > > no because after all that time we have become dependent on it and > > > frankly there was no technical reason not to use eBPF. > > > > I'm not fully clear on who you're appealing to, and I may be missing > > some points. But maybe it will be more useful than hurtful if I clarify > > my point of view. > > > > AFAIU BPF folks disagree with the use of their subsystem, and they > > point out that P4 pipelines can be implemented using BPF in the first > > place. > > To which you reply that you like (a highly dated type of) a netlink > > interface, and (handwavey) ability to configure the data path SW or > > HW via the same interface. > > It's not what I "like" , rather it is a requirement to support both > s/w and h/w offload. The TC model is the traditional approach to > deploy these models. I addressed the same comment you are making above > in #1a and #1b (https://urldefense.com/v3/__https://github.com/p4tc-dev/pushback-patches__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs0-w2zKP2A$). > > OTOH, "BPF folks disagree with the use of their subsystem" is a > problematic statement. Is BPF infra for the kernel community or is it > something the ebpf folks can decide, at their whim, to allow who they > like to use or not. We are not changing any BPF code. And there's > already a case where the interfaces are used exactly as we used them > in the conntrack code i pointed to in the page (we literally copied > that code). Why is it ok for conntrack code to use exactly the same > approach but not us? > > > AFAICT there's some but not very strong support for P4TC, > > I dont agree. Paolo asked this question and afaik Intel, AMD (both > build P4-native NICs) and the folks interested in the MS DASH project > responded saying they are in support. Look at who is being Cced. A lot > of these folks who attend biweekly discussion calls on P4TC. Sample: > https://urldefense.com/v3/__https://lore.kernel.org/netdev/IA0PR17MB7070B51A955FB8595FFBA5FB965E2@IA0PR17MB7070.namprd17.prod.outlook.com/__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs09TFzoQBw$ > +1 > > and it > > doesn't benefit or solve any problems of the broader networking stack > > (e.g. expressing or configuring parser graphs in general) > > > Huh? As a DSL, P4 has already been proven to be an extremely effective and popular way to express parse graphs, stack manipulation, and stateful programming. Yesterday, I used the P4TC dev branch to implement something in one sitting, which includes parsing RoCEv2 network stacks. I just cut and pasted P4 code originally written for a P4 ASIC into a working P4TC example to add functionality. It took mere seconds to compile and launch it, and a few minutes to test it. I know of no other workflow which provides such quick turnaround and is so accessible. I'd like it to be as ubiquitous as eBPF itself. > I am not sure where the parser thing comes from - the parser is > generated as eBPF. 
> > > So from my perspective, the submission is neither technically strong > > enough, nor broadly useful enough to consider making questionable precedents > > for, i.e. to override maintainers on how their subsystems are extended. I disagree vehemently on the "broadly useful enough" comment. > > I believe as a community nobody should just have the power to nack > things just because - as i stated in the page, not even Linus. That > code doesnt touch anything to do with eBPF maintainers (meaning things > they have to fix when an issue shows up) neither does it "extend" as > you state any ebpf code and it is all part of the networking > subsystem. Sure, anybody has the right to nack but I contend that > nacks should be based on technical reasons. I have listed all the > objections in that page and how i have responded to them over time. > Someone needs to look at those objectively and say if they are valid. > The arguement made so far(By Paolo and now by you) is "we cant > override maintainers on how their subsystems are used" then we are in > uncharted territory, thats why i am asking for arbitration. > > cheers, > jamal Maintainers: I am perplexed and dismayed that this is getting so much pushback. None of the objections, regardless of their merits (or not) seem to outweigh the potential benefits to end-users. I am extremely interested in using P4TC, it adds a lot of value and reuses so much existing Linux infra. The custom extern model is compelling. The control plane CRUDXPS will tie nicely into P4Runtime and TDI. I have an application which needs to run purely in SW - no HW offload, so prior suggestions to wait for it to "approve" this is frustrating. I could use this yesterday. Furthermore, as an active contributor to sonic-dash, where we model the pipeline in P4, I can state that P4TC could be a compelling alternative to bmv2, which is slow, long in the tooth and lacks PNA support. I beseech the NACKers to take a deep breath, reevaluate any entrenched positions and consider how much goodness this will add, even if this is not your preference for implementing datapaths. It doesn't have to be. That can and should be decided by the larger community. This could open the door to thousands of creative developers who are comfortable in P4 but not adept in low-level networking code. P4 had a significant impact on democratizing network programming, and that was just on bmv2 and Tofino, which is EOL. Making performant and powerful P4TC ubiquitous on virtually any Linux server could have a similar effect, just like eBPF opened a lot of doors to non-kernel programmers to do interesting things. Be a part of that transformation!
On Wed, May 22, 2024 at 5:09 PM Chris Sommers <chris.sommers@keysight.com> wrote: > > > On Wed, May 22, 2024 at 6:19 PM Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > Hi Jamal! > > > > > > On Tue, 21 May 2024 08:35:07 -0400 Jamal Hadi Salim wrote: > > > > At that point(v16) i asked for the series to be applied despite the > > > > Nacks because, frankly, the Nacks have no merit. Paolo was not > > > > comfortable applying patches with Nacks and tried to mediate. In his > > > > mediation effort he asked if we could remove eBPF - and our answer was > > > > no because after all that time we have become dependent on it and > > > > frankly there was no technical reason not to use eBPF. > > > > > > I'm not fully clear on who you're appealing to, and I may be missing > > > some points. But maybe it will be more useful than hurtful if I clarify > > > my point of view. > > > > > > AFAIU BPF folks disagree with the use of their subsystem, and they > > > point out that P4 pipelines can be implemented using BPF in the first > > > place. > > > To which you reply that you like (a highly dated type of) a netlink > > > interface, and (handwavey) ability to configure the data path SW or > > > HW via the same interface. > > > > It's not what I "like" , rather it is a requirement to support both > > s/w and h/w offload. The TC model is the traditional approach to > > deploy these models. I addressed the same comment you are making above > > in #1a and #1b (https://urldefense.com/v3/__https://github.com/p4tc-dev/pushback-patches__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs0-w2zKP2A$). > > > > OTOH, "BPF folks disagree with the use of their subsystem" is a > > problematic statement. Is BPF infra for the kernel community or is it > > something the ebpf folks can decide, at their whim, to allow who they > > like to use or not. We are not changing any BPF code. And there's > > already a case where the interfaces are used exactly as we used them > > in the conntrack code i pointed to in the page (we literally copied > > that code). Why is it ok for conntrack code to use exactly the same > > approach but not us? > > > > > AFAICT there's some but not very strong support for P4TC, > > > > I dont agree. Paolo asked this question and afaik Intel, AMD (both > > build P4-native NICs) and the folks interested in the MS DASH project > > responded saying they are in support. Look at who is being Cced. A lot > > of these folks who attend biweekly discussion calls on P4TC. Sample: > > https://urldefense.com/v3/__https://lore.kernel.org/netdev/IA0PR17MB7070B51A955FB8595FFBA5FB965E2@IA0PR17MB7070.namprd17.prod.outlook.com/__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs09TFzoQBw$ > > > +1 > > > and it > > > doesn't benefit or solve any problems of the broader networking stack > > > (e.g. expressing or configuring parser graphs in general) > > > > > > > Huh? As a DSL, P4 has already been proven to be an extremely effective and popular way to express parse graphs, stack manipulation, and stateful programming. Yesterday, I used the P4TC dev branch to implement something in one sitting, which includes parsing RoCEv2 network stacks. I just cut and pasted P4 code originally written for a P4 ASIC into a working P4TC example to add functionality. It took mere seconds to compile and launch it, and a few minutes to test it. I know of no other workflow which provides such quick turnaround and is so accessible. 
I'd like it to be as ubiquitous as eBPF itself. Chris, When you say "it took mere seconds to compile and launch" are you taking into account the ramp up time that it takes to learn P4 and become proficient to do something interesting? Considering that P4 syntax is very different from the languages that networking programmers are typically familiar with, this ramp up time is non-zero. OTOH, eBPF is ubiquitous because it's primarily programmed in Restricted C-- this makes it easy for many programmers since they don't have to learn a completely new language and so the ramp up time for the average networking programmer is much less for using eBPF. This is really the fundamental problem with DSLs: they require specialized skill sets in a programming language for a narrow use case (and specialized compilers, tool chains, debugging, etc.)-- this means a DSL only makes sense if there is no other means to accomplish the same effects using a commodity language with perhaps a specialized library (it's not just in the networking realm, consider the advantages of using CUDA-C instead of a DSL for GPUs). Personally, I don't believe that P4 has been proven necessary for programming a datapath-- for instance we can program a parser in declarative representation in C, https://netdevconf.info/0x16/papers/11/High%20Performance%20Programmable%20Parsers.pdf (a rough C sketch of this idea follows after this message). So unless P4 is proven necessary, I'm doubtful it will ever be a ubiquitous way to program the kernel-- it seems much more likely that people will continue to use C and eBPF, and for those users that want to use P4 they can use a P4->eBPF compiler. Tom > > > I am not sure where the parser thing comes from - the parser is > > generated as eBPF. > > > > > So from my perspective, the submission is neither technically strong > > > enough, nor broadly useful enough to consider making questionable precedents > > > for, i.e. to override maintainers on how their subsystems are extended. > I disagree vehemently on the "broadly useful enough" comment. > > > > I believe as a community nobody should just have the power to nack > > things just because - as i stated in the page, not even Linus. That > > code doesnt touch anything to do with eBPF maintainers (meaning things > > they have to fix when an issue shows up) neither does it "extend" as > > you state any ebpf code and it is all part of the networking > > subsystem. Sure, anybody has the right to nack but I contend that > > nacks should be based on technical reasons. I have listed all the > > objections in that page and how i have responded to them over time. > > Someone needs to look at those objectively and say if they are valid. > > The arguement made so far(By Paolo and now by you) is "we cant > > override maintainers on how their subsystems are used" then we are in > > uncharted territory, thats why i am asking for arbitration. > > > > cheers, > > jamal > Maintainers: I am perplexed and dismayed that this is getting so much pushback. None of the objections, regardless of their merits (or not) seem to outweigh the potential benefits to end-users. I am extremely interested in using P4TC, it adds a lot of value and reuses so much existing Linux infra. The custom extern model is compelling. The control plane CRUDXPS will tie nicely into P4Runtime and TDI. I have an application which needs to run purely in SW - no HW offload, so prior suggestions to wait for it to "approve" this is frustrating. I could use this yesterday.
Furthermore, as an active contributor to sonic-dash, where we model the pipeline in P4, I can state that P4TC could be a compelling alternative to bmv2, which is slow, long in the tooth and lacks PNA support. > > I beseech the NACKers to take a deep breath, reevaluate any entrenched positions and consider how much goodness this will add, even if this is not your preference for implementing datapaths. It doesn't have to be. That can and should be decided by the larger community. This could open the door to thousands of creative developers who are comfortable in P4 but not adept in low-level networking code. P4 had a significant impact on democratizing network programming, and that was just on bmv2 and Tofino, which is EOL. Making performant and powerful P4TC ubiquitous on virtually any Linux server could have a similar effect, just like eBPF opened a lot of doors to non-kernel programmers to do interesting things. Be a part of that transformation!
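On the "parser as declarative representation in C" point a few messages up, here is a rough userspace sketch of the idea. It is not the API from the referenced netdevconf paper and not p4tc or kernel code; it only illustrates a parse graph expressed as const C data (nodes plus next-protocol tables) that a small generic engine walks. The Ethernet and IPv4 field offsets are the standard ones; everything else is made up for the example.

#include <stdint.h>
#include <stddef.h>

struct parse_node;

struct proto_table_entry {
	uint32_t value;                 /* e.g. ethertype or IP protocol  */
	const struct parse_node *node;  /* next node to visit             */
};

struct parse_node {
	const char *name;
	size_t min_len;                             /* minimum header length      */
	size_t (*hdr_len)(const uint8_t *hdr);      /* actual length (w/ options) */
	uint32_t (*next_proto)(const uint8_t *hdr); /* next-protocol field        */
	const struct proto_table_entry *table;      /* NULL for leaf nodes        */
	size_t table_size;
};

static size_t eth_len(const uint8_t *hdr)     { (void)hdr; return 14; }
static uint32_t eth_next(const uint8_t *hdr)  { return (hdr[12] << 8) | hdr[13]; }
static size_t ipv4_len(const uint8_t *hdr)    { return (hdr[0] & 0x0f) * 4; }
static uint32_t ipv4_next(const uint8_t *hdr) { return hdr[9]; }

static const struct parse_node ipv4_node = {
	.name = "ipv4", .min_len = 20, .hdr_len = ipv4_len, .next_proto = ipv4_next,
};

static const struct proto_table_entry eth_table[] = {
	{ 0x0800, &ipv4_node },	/* ETH_P_IP */
};

static const struct parse_node eth_node = {
	.name = "eth", .min_len = 14, .hdr_len = eth_len, .next_proto = eth_next,
	.table = eth_table, .table_size = 1,
};

/* Generic engine: walk the parse graph over a raw packet buffer. */
static void parse(const struct parse_node *node, const uint8_t *pkt, size_t len)
{
	while (node && len >= node->min_len) {
		size_t hlen = node->hdr_len(pkt);
		uint32_t proto = node->next_proto(pkt);
		const struct parse_node *next = NULL;

		if (hlen < node->min_len || hlen > len)
			break;
		for (size_t i = 0; node->table && i < node->table_size; i++)
			if (node->table[i].value == proto)
				next = node->table[i].node;
		pkt += hlen;
		len -= hlen;
		node = next;
	}
}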
On Wed, May 22, 2024 at 8:54 PM Tom Herbert <tom@sipanda.io> wrote: > > On Wed, May 22, 2024 at 5:09 PM Chris Sommers > <chris.sommers@keysight.com> wrote: > > > > > On Wed, May 22, 2024 at 6:19 PM Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > > > Hi Jamal! > > > > > > > > On Tue, 21 May 2024 08:35:07 -0400 Jamal Hadi Salim wrote: > > > > > At that point(v16) i asked for the series to be applied despite the > > > > > Nacks because, frankly, the Nacks have no merit. Paolo was not > > > > > comfortable applying patches with Nacks and tried to mediate. In his > > > > > mediation effort he asked if we could remove eBPF - and our answer was > > > > > no because after all that time we have become dependent on it and > > > > > frankly there was no technical reason not to use eBPF. > > > > > > > > I'm not fully clear on who you're appealing to, and I may be missing > > > > some points. But maybe it will be more useful than hurtful if I clarify > > > > my point of view. > > > > > > > > AFAIU BPF folks disagree with the use of their subsystem, and they > > > > point out that P4 pipelines can be implemented using BPF in the first > > > > place. > > > > To which you reply that you like (a highly dated type of) a netlink > > > > interface, and (handwavey) ability to configure the data path SW or > > > > HW via the same interface. > > > > > > It's not what I "like" , rather it is a requirement to support both > > > s/w and h/w offload. The TC model is the traditional approach to > > > deploy these models. I addressed the same comment you are making above > > > in #1a and #1b (https://urldefense.com/v3/__https://github.com/p4tc-dev/pushback-patches__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs0-w2zKP2A$). > > > > > > OTOH, "BPF folks disagree with the use of their subsystem" is a > > > problematic statement. Is BPF infra for the kernel community or is it > > > something the ebpf folks can decide, at their whim, to allow who they > > > like to use or not. We are not changing any BPF code. And there's > > > already a case where the interfaces are used exactly as we used them > > > in the conntrack code i pointed to in the page (we literally copied > > > that code). Why is it ok for conntrack code to use exactly the same > > > approach but not us? > > > > > > > AFAICT there's some but not very strong support for P4TC, > > > > > > I dont agree. Paolo asked this question and afaik Intel, AMD (both > > > build P4-native NICs) and the folks interested in the MS DASH project > > > responded saying they are in support. Look at who is being Cced. A lot > > > of these folks who attend biweekly discussion calls on P4TC. Sample: > > > https://urldefense.com/v3/__https://lore.kernel.org/netdev/IA0PR17MB7070B51A955FB8595FFBA5FB965E2@IA0PR17MB7070.namprd17.prod.outlook.com/__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs09TFzoQBw$ > > > > > +1 > > > > and it > > > > doesn't benefit or solve any problems of the broader networking stack > > > > (e.g. expressing or configuring parser graphs in general) > > > > > > > > > > > Huh? As a DSL, P4 has already been proven to be an extremely effective and popular way to express parse graphs, stack manipulation, and stateful programming. Yesterday, I used the P4TC dev branch to implement something in one sitting, which includes parsing RoCEv2 network stacks. I just cut and pasted P4 code originally written for a P4 ASIC into a working P4TC example to add functionality. 
It took mere seconds to compile and launch it, and a few minutes to test it. I know of no other workflow which provides such quick turnaround and is so accessible. I'd like it to be as ubiquitous as eBPF itself. > > Chris, > > When you say "it took mere seconds to compile and launch" are you > taking into account the ramp up time that it takes to learn P4 and > become proficient to do something interesting? Considering that P4 > syntax is very different from typical languages than networking > programmers are typically familiar with, this ramp up time is > non-zero. OTOH, eBPF is ubiquitous because it's primarily programmed > in Restricted C-- this makes it easy for many programmers since they > don't have to learn a completely new language and so the ramp up time > for the average networking programmer is much less for using eBPF. > > This is really the fundamental problem with DSLs, they require > specialized skill sets in a programming language for a narrow use case > (and specialized compilers, tool chains, debugging, etc)-- this means > a DSL only makes sense if there is no other means to accomplish the > same effects using a commodity language with perhaps a specialized > library (it's not just in the networking realm, consider the > advantages of using CUDA-C instead of a DLS for GPUs). Personally, I > don't believe that P4 has yet to be proven necessary for programming a > datapath-- for instance we can program a parser in declarative > representation in C, > https://netdevconf.info/0x16/papers/11/High%20Performance%20Programmable%20Parsers.pdf. > > So unless P4 is proven necessary, then I'm doubtful it will ever be a > ubiquitous way to program the kernel-- it seems much more likely that > people will continue to use C and eBPF, and for those users that want > to use P4 they can use P4->eBPF compiler. > Tom, I cant stop the distraction of this thread becoming a discussion on the merits of DSL vs a lower level language (and I know you are not a P4 fan) but please change the subject so we dont loose the main focus which is a discussion on the patches. I have done it for you. Chris if you wish to respond please respond under the new thread subject. cheers, jamal cheers, jamal
> On Wed, May 22, 2024 at 8:54 PM Tom Herbert <mailto:tom@sipanda.io> wrote: > > > > On Wed, May 22, 2024 at 5:09 PM Chris Sommers > > <mailto:chris.sommers@keysight.com> wrote: > > > > > > > On Wed, May 22, 2024 at 6:19 PM Jakub Kicinski <mailto:kuba@kernel.org> wrote: > > > > > > > > > > Hi Jamal! > > > > > > > > > > On Tue, 21 May 2024 08:35:07 -0400 Jamal Hadi Salim wrote: > > > > > > At that point(v16) i asked for the series to be applied despite the > > > > > > Nacks because, frankly, the Nacks have no merit. Paolo was not > > > > > > comfortable applying patches with Nacks and tried to mediate. In his > > > > > > mediation effort he asked if we could remove eBPF - and our answer was > > > > > > no because after all that time we have become dependent on it and > > > > > > frankly there was no technical reason not to use eBPF. > > > > > > > > > > I'm not fully clear on who you're appealing to, and I may be missing > > > > > some points. But maybe it will be more useful than hurtful if I clarify > > > > > my point of view. > > > > > > > > > > AFAIU BPF folks disagree with the use of their subsystem, and they > > > > > point out that P4 pipelines can be implemented using BPF in the first > > > > > place. > > > > > To which you reply that you like (a highly dated type of) a netlink > > > > > interface, and (handwavey) ability to configure the data path SW or > > > > > HW via the same interface. > > > > > > > > It's not what I "like" , rather it is a requirement to support both > > > > s/w and h/w offload. The TC model is the traditional approach to > > > > deploy these models. I addressed the same comment you are making above > > > > in #1a and #1b (https://urldefense.com/v3/__https://github.com/p4tc-dev/pushback-patches__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs0-w2zKP2A$). > >> > > > > > OTOH, "BPF folks disagree with the use of their subsystem" is a > > > > problematic statement. Is BPF infra for the kernel community or is it > > > > something the ebpf folks can decide, at their whim, to allow who they > > > > like to use or not. We are not changing any BPF code. And there's > > > > already a case where the interfaces are used exactly as we used them > > > > in the conntrack code i pointed to in the page (we literally copied > > > > that code). Why is it ok for conntrack code to use exactly the same > > > > approach but not us? > > > > > > > > > AFAICT there's some but not very strong support for P4TC, > > > > > > > > I dont agree. Paolo asked this question and afaik Intel, AMD (both > > > > build P4-native NICs) and the folks interested in the MS DASH project > > > > responded saying they are in support. Look at who is being Cced. A lot > > > > of these folks who attend biweekly discussion calls on P4TC. Sample: > > > > https://urldefense.com/v3/__https://lore.kernel.org/netdev/IA0PR17MB7070B51A955FB8595FFBA5FB965E2@IA0PR17MB7070.namprd17.prod.outlook.com/__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs09TFzoQBw$ > >> > > > > +1 > > > > > and it > > > > > doesn't benefit or solve any problems of the broader networking stack > > > > > (e.g. expressing or configuring parser graphs in general) > > > > > > > > > > > > > > > Huh? As a DSL, P4 has already been proven to be an extremely effective and popular way to express parse graphs, stack manipulation, and stateful programming. 
Yesterday, I used the P4TC dev branch to implement something in one sitting, which includes parsing RoCEv2 network stacks. I just cut and pasted P4 code originally written for a P4 ASIC into a working P4TC example to add functionality. It took mere seconds to compile and launch it, and a few minutes to test it. I know of no other workflow which provides such quick turnaround and is so accessible. I'd like it to be as ubiquitous as eBPF itself. > > > > Chris, > > > > When you say "it took mere seconds to compile and launch" are you > > taking into account the ramp up time that it takes to learn P4 and > > become proficient to do something interesting? Hi Tom, thanks for the dialog. To answer your question, it took seconds to compile and deploy, not learn P4. Adding the parsing for several headers took minutes. If you want to compare learning curve, learning to write P4 code and let the framework handle all the painful low-level Linux details is way easier than trying to learn how to write c code for Linux networking. It’s not even close. I’ve written C for 40 years, P4 for 7 years, and dabbled in eBPF so I can attest to the ease of learning and using P4. I’ve onboarded and mentored engineers who barely knew C, to develop complex networking products using P4, and built the automation APIs (REST, gRPC) to manage them. One person can develop an entire commercial product by themselves in months. P4 has expanded the reach of programmers such that both HW and SW engineers can easily learn P4 and become pretty adept at it. I would not expect even experienced c programmers to be able to master Linux internals very quickly. Writing a P4-TC program and injecting it via tc was like magic the first time. >> Considering that P4 > > syntax is very different from typical languages than networking > > programmers are typically familiar with, this ramp up time is > > non-zero. OTOH, eBPF is ubiquitous because it's primarily programmed > > in Restricted C-- this makes it easy for many programmers since they > > don't have to learn a completely new language and so the ramp up time > > for the average networking programmer is much less for using eBPF. I think your statement about “typical network programmers” overlooks the fact that since P4 was introduced, it has been taught in many universities to teach networking and possibly enabled a whole new breed of “network engineers” who can solve real problems without even knowing C programming. Without P4 they might never have gone this route. A class in network stack programming using c would have so many prerequisites to even get to parsing, compared to P4, where it could be demonstrated in one lesson. These “networking programmers” are not typical by your standards, but there are many such. They have just as much claim to the title "network programmer” as a C programmer. Similarly, an assembly language programmer is no less than a C or Python programmer. People writing P4 are usually focused on applications, and it is very useful and productive for that. Why should someone have to learn low-level C or eBPF to solve their problem? 
> > > > This is really the fundamental problem with DSLs, they require > > specialized skill sets in a programming language for a narrow use case > > (and specialized compilers, tool chains, debugging, etc)-- this means > > a DSL only makes sense if there is no other means to accomplish the > > same effects using a commodity language with perhaps a specialized > > library (it's not just in the networking realm, consider the > > advantages of using CUDA-C instead of a DLS for GPUs). A pretty strong opinion, but DSLs arise to fill a need and P4 did so. It's still going strong. >> Personally, I > > don't believe that P4 has yet to be proven necessary for programming a > > datapath-- for instance we can program a parser in declarative > > representation in C, > > https://urldefense.com/v3/__https://netdevconf.info/0x16/papers/11/High*20Performance*20Programmable*20Parsers.pdf__;JSUl!!I5pVk4LIGAfnvw!m9zrSDvddfzSt_sMBjOEvqw31RzAwWlEDM4ah5IJ2kqsmq6XtPIVJd-1_ZoGWBXKLyda77RYLvGR83Ginw$. CPL (slide11) looks like a DSL wrapped in JSON to me. “Solution: Common Parser Language (CPL); Parser representation in declarative .json” So I am confused. It is either a new language a.k.a. DSL, or it's not. Nothing against it, I'm sure it is great, but let's call it what it is. We already have parser representations in declarative p4. And it's used and known worldwide. And has a respectable specification, any users and working groups. And it's formally provable (https://github.com/verified-network-toolchain/petr4) > > > > So unless P4 is proven necessary, then I'm doubtful it will ever be a > > ubiquitous way to program the kernel-- it seems much more likely that > > people will continue to use C and eBPF, and for those users that want > > to use P4 they can use P4->eBPF compiler. “ubiquitous way to program the kernel” – is not my goal. I don’t even want to know about the kernel when I am writing p4 - it's just a means to an end. I want to manipulate packets on a Linux host. P4DPDK, P4-eBPF, P4-TC – all let me do that. I LOVE the fact that P4-TC would be available in every Linux distro once upstreamed. It would solve so many deployment issues, benefit from regression testing, etc. So much goodness. " and for those users that want to use P4 they can use P4->eBPF compiler." -I'd really like to choose for myself and not have someone make that choice for me. P4-TC checks all the boxes for me. Thanks for the point of view, it's healthy to debate. Cheers, Chris > > > > Tom, > I cant stop the distraction of this thread becoming a discussion on > the merits of DSL vs a lower level language (and I know you are not a > P4 fan) but please change the subject so we dont loose the main focus > which is a discussion on the patches. I have done it for you. Chris if > you wish to respond please respond under the new thread subject. > > cheers, > jamal
On Wed, May 22, 2024 at 7:30 PM Chris Sommers <chris.sommers@keysight.com> wrote: > > > On Wed, May 22, 2024 at 8:54 PM Tom Herbert <mailto:tom@sipanda.io> wrote: > > > > > > On Wed, May 22, 2024 at 5:09 PM Chris Sommers > > > <mailto:chris.sommers@keysight.com> wrote: > > > > > > > > > On Wed, May 22, 2024 at 6:19 PM Jakub Kicinski <mailto:kuba@kernel.org> wrote: > > > > > > > > > > > > Hi Jamal! > > > > > > > > > > > > On Tue, 21 May 2024 08:35:07 -0400 Jamal Hadi Salim wrote: > > > > > > > At that point(v16) i asked for the series to be applied despite the > > > > > > > Nacks because, frankly, the Nacks have no merit. Paolo was not > > > > > > > comfortable applying patches with Nacks and tried to mediate. In his > > > > > > > mediation effort he asked if we could remove eBPF - and our answer was > > > > > > > no because after all that time we have become dependent on it and > > > > > > > frankly there was no technical reason not to use eBPF. > > > > > > > > > > > > I'm not fully clear on who you're appealing to, and I may be missing > > > > > > some points. But maybe it will be more useful than hurtful if I clarify > > > > > > my point of view. > > > > > > > > > > > > AFAIU BPF folks disagree with the use of their subsystem, and they > > > > > > point out that P4 pipelines can be implemented using BPF in the first > > > > > > place. > > > > > > To which you reply that you like (a highly dated type of) a netlink > > > > > > interface, and (handwavey) ability to configure the data path SW or > > > > > > HW via the same interface. > > > > > > > > > > It's not what I "like" , rather it is a requirement to support both > > > > > s/w and h/w offload. The TC model is the traditional approach to > > > > > deploy these models. I addressed the same comment you are making above > > > > > in #1a and #1b (https://urldefense.com/v3/__https://github.com/p4tc-dev/pushback-patches__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs0-w2zKP2A$). > > >> > > > > > > OTOH, "BPF folks disagree with the use of their subsystem" is a > > > > > problematic statement. Is BPF infra for the kernel community or is it > > > > > something the ebpf folks can decide, at their whim, to allow who they > > > > > like to use or not. We are not changing any BPF code. And there's > > > > > already a case where the interfaces are used exactly as we used them > > > > > in the conntrack code i pointed to in the page (we literally copied > > > > > that code). Why is it ok for conntrack code to use exactly the same > > > > > approach but not us? > > > > > > > > > > > AFAICT there's some but not very strong support for P4TC, > > > > > > > > > > I dont agree. Paolo asked this question and afaik Intel, AMD (both > > > > > build P4-native NICs) and the folks interested in the MS DASH project > > > > > responded saying they are in support. Look at who is being Cced. A lot > > > > > of these folks who attend biweekly discussion calls on P4TC. Sample: > > > > > https://urldefense.com/v3/__https://lore.kernel.org/netdev/IA0PR17MB7070B51A955FB8595FFBA5FB965E2@IA0PR17MB7070.namprd17.prod.outlook.com/__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs09TFzoQBw$ > > >> > > > > > +1 > > > > > > and it > > > > > > doesn't benefit or solve any problems of the broader networking stack > > > > > > (e.g. expressing or configuring parser graphs in general) > > > > > > > > > > > > > > > > > > > Huh? 
As a DSL, P4 has already been proven to be an extremely effective and popular way to express parse graphs, stack manipulation, and stateful programming. Yesterday, I used the P4TC dev branch to implement something in one sitting, which includes parsing RoCEv2 network stacks. I just cut and pasted P4 code originally written for a P4 ASIC into a working P4TC example to add functionality. It took mere seconds to compile and launch it, and a few minutes to test it. I know of no other workflow which provides such quick turnaround and is so accessible. I'd like it to be as ubiquitous as eBPF itself. > > > > > > Chris, > > > > > > When you say "it took mere seconds to compile and launch" are you > > > taking into account the ramp up time that it takes to learn P4 and > > > become proficient to do something interesting? > > Hi Tom, thanks for the dialog. To answer your question, it took seconds to compile and deploy, not learn P4. Adding the parsing for several headers took minutes. If you want to compare learning curve, learning to write P4 code and let the framework handle all the painful low-level Linux details is way easier than trying to learn how to write c code for Linux networking. It’s not even close. I’ve written C for 40 years, P4 for 7 years, and dabbled in eBPF so I can attest to the ease of learning and using P4. I’ve onboarded and mentored engineers who barely knew C, to develop complex networking products using P4, and built the automation APIs (REST, gRPC) to manage them. One person can develop an entire commercial product by themselves in months. P4 has expanded the reach of programmers such that both HW and SW engineers can easily learn P4 and become pretty adept at it. I would not expect even experienced c programmers to be able to master Linux internals very quickly. Writing a P4-TC program and injecting it via tc was like magic the first time. > > >> Considering that P4 > > > syntax is very different from typical languages than networking > > > programmers are typically familiar with, this ramp up time is > > > non-zero. OTOH, eBPF is ubiquitous because it's primarily programmed > > > in Restricted C-- this makes it easy for many programmers since they > > > don't have to learn a completely new language and so the ramp up time > > > for the average networking programmer is much less for using eBPF. > > I think your statement about “typical network programmers” overlooks the fact that since P4 was introduced, it has been taught in many universities to teach networking and possibly enabled a whole new breed of “network engineers” who can solve real problems without even knowing C programming. Without P4 they might never have gone this route. A class in network stack programming using c would have so many prerequisites to even get to parsing, compared to P4, where it could be demonstrated in one lesson. These “networking programmers” are not typical by your standards, but there are many such. They have just as much claim to the title "network programmer” as a C programmer. Similarly, an assembly language programmer is no less than a C or Python programmer. People writing P4 are usually focused on applications, and it is very useful and productive for that. Why should someone have to learn low-level C or eBPF to solve their problem? Hio Chris, You're comparing learning a completely new language versus programming in a subset of an established language, they're really not comparable. When one programs in Restricted-C they just need to understand what features of C are supported. 
> > > > > > > This is really the fundamental problem with DSLs, they require > > > specialized skill sets in a programming language for a narrow use case > > > (and specialized compilers, tool chains, debugging, etc)-- this means > > > a DSL only makes sense if there is no other means to accomplish the > > > same effects using a commodity language with perhaps a specialized > > > library (it's not just in the networking realm, consider the > > > advantages of using CUDA-C instead of a DLS for GPUs). > > A pretty strong opinion, but DSLs arise to fill a need and P4 did so. It's still going strong. > > >> Personally, I > > > don't believe that P4 has yet to be proven necessary for programming a > > > datapath-- for instance we can program a parser in declarative > > > representation in C, > > > https://urldefense.com/v3/__https://netdevconf.info/0x16/papers/11/High*20Performance*20Programmable*20Parsers.pdf__;JSUl!!I5pVk4LIGAfnvw!m9zrSDvddfzSt_sMBjOEvqw31RzAwWlEDM4ah5IJ2kqsmq6XtPIVJd-1_ZoGWBXKLyda77RYLvGR83Ginw$. > > CPL (slide11) looks like a DSL wrapped in JSON to me. “Solution: Common Parser Language (CPL); Parser representation in declarative .json” So I am confused. It is either a new language a.k.a. DSL, or it's not. Nothing against it, I'm sure it is great, but let's call it what it is. Correct, it's not a new language. We've since renamed it Common Parser Representation. > We already have parser representations in declarative p4. And it's used and known worldwide. And has a respectable specification, any users and working groups. And it's formally provable (https://github.com/verified-network-toolchain/petr4) > > > > > > > So unless P4 is proven necessary, then I'm doubtful it will ever be a > > > ubiquitous way to program the kernel-- it seems much more likely that > > > people will continue to use C and eBPF, and for those users that want > > > to use P4 they can use P4->eBPF compiler. > > “ubiquitous way to program the kernel” – is not my goal. I don’t even want to know about the kernel when I am writing p4 - it's just a means to an end. I want to manipulate packets on a Linux host. P4DPDK, P4-eBPF, P4-TC – all let me do that. I LOVE the fact that P4-TC would be available in every Linux distro once upstreamed. It would solve so many deployment issues, benefit from regression testing, etc. So much goodness > > " and for those users that want to use P4 they can use P4->eBPF compiler." -I'd really like to choose for myself and not have someone make that choice for me. P4-TC checks all the boxes for me. Sure, but this is a lot of kernel code and that will require support and maintenance. It needs to be justified, and the fact that someone wants it just to have a choice is, frankly, not much of a justification. I think a justification needs to start with "Why isn't P4->eBPF sufficient?" (the question has been raised several times, but it still doesn't seem like there's a strong answer). Tom > > Thanks for the point of view, it's healthy to debate. > Cheers, > Chris > > > > > > > > Tom, > > I cant stop the distraction of this thread becoming a discussion on > > the merits of DSL vs a lower level language (and I know you are not a > > P4 fan) but please change the subject so we dont loose the main focus > > which is a discussion on the patches. I have done it for you. Chris if > > you wish to respond please respond under the new thread subject. > > > > cheers, > > jamal >
Hi Chris, P4 was created to support programming the hardware data path in high end routers, but P4-TC would enable the use of P4 across all Linux devices. Since this is potentially a lot of code going into the kernel to support it, I believe it's entirely fair for us to evaluate and give feedback on the P4 language and its suitability for the broader user community, including environments where there will never be a need for P4 hardware. Note that I am questioning the design decisions of P4 in the context of supporting a DSL in the kernel via P4-TC; if the P4->eBPF compiler is used then these concerns are less pertinent. Nevertheless, I would suggest that the P4 folks take the points being raised as constructive feedback on the language. I took a cursory look at several P4 programs including tutorials, switch code, firewalls, etc. I have particular interest in variable length headers, so I'll use https://github.com/jafingerhut/p4-guide/blob/master/checksum/checksum-ipv4-with-options.p4 as a reference. The first thing I noticed about P4 is that almost everything is expressed as a bit field. Like bit<8> and bit<32>. I suppose this arises from the fact that P4 was originally intended to run in non-CPU hardware where there's no inherent unit of data like bytes. But, CPUs don't work that way; CPUs work on ordinal types: bytes, half words, words, double words, etc. (__u8, __u16, __u32, __u64). That means that all mainstream computer languages fundamentally operate on ordinal types even if the variable types are explicitly declared. Someone programming in P4 needs to map ordinal types to bit fields, so if they want a __u32 they need to use a bit<32> in P4 (except they're not exactly equivalent, a __u32 in C is guaranteed to be byte aligned and I'm assuming in P4 bit<32> is not guaranteed to be byte aligned-- this seems like it might be susceptible to programming errors). I'd also point out that networking protocols are also defined using ordinal type fields; there are some exceptions, but for the most part protocol fields try to be in units of bytes (or octets if you want to be old school!). I believe life would be easier for the programmer if they could just define variables and fields with ordinal types; the fix here seems simple enough: just add typedefs to P4 like "typedef __u32 bit<32>". In the IP header definition there's "varbit<320> options;". It took me several seconds to decode this and realize this is space for forty bytes of IP options (i.e. 8 * 40 == 320). I suppose this follows the design of using bit fields for everything, but I think this is more than just an annoyance like the bit fields for ordinal types are. First off, it's not very readable. I've never heard anyone say that there's 320 bits of IP options, or seen an RFC specify that. Likewise, the standard Ethernet MTU is 1500 bytes, not 12,000 bits, which would seem to be how that would be expressed in P4. So this seems very unreadable to me and potentially prone to errors. The fix for this also seems easy: why not just add varbyte to P4 so we can do varbyte<40>, varbyte<87>, varbyte<123>, etc.? The next thing I notice about the P4 programs I surveyed is that all of them seem to define the protocol headers within the program. Every program seems to have "header ethernet_t" and "header ipv4_t" and other protocols that are used, and protocol constants like Ethertypes also seem to be spelled out in each program. Sometimes these are in include files within the program.
What I don't see is that P4 has a standard set of include files for defining protocol headers. For instance, in Linux C we would just do "#include <linux/if_ether.h>" and "#include <linux/ip.h>" to get the definitions of the Ethernet header and IPv4 header. In fact, if someone were to submit a patch to Netdev that included its own definition of Ethernet or an IP header structure they would almost certainly get pushback. It's a fundamental programming principle, not just in networking but pretty much everywhere, to not continuously redefine common and standard constructs-- just put common things in header files that can be shared by multiple programs (to do otherwise substantially increases the possibility of errors, bloats code, and reduces readability).

Marshalling up common definitions into header files that are common in the P4 development environment seems simple enough (maybe it's already done?), but I would also point out that Linux has include files that describe protocol formats and header structures for almost every protocol under the sun, and they are well tested. It would be great if we could somehow leverage that work. For instance, in the P4 samples I looked at srcAddr and dstAddr are defined for IP addresses, but in linux/ip.h saddr and daddr are the respective field names. Why not just base the P4 definition on the Linux one? Then when someone is porting code from Linux to P4 they can use the same field names-- this makes things a lot easier on the programmer! I'll also mention that we wrote a little Python script to generate P4 header and constant definitions from Linux headers. It almost worked; the snag we hit was that P4 has some limits on nesting structures and unions, so we couldn't translate some of the C structures to P4 (if you're interested I can provide the details on the problem we hit).

The IPv4 header checksum code was a real head scratcher for me. Do we really need to state each field in the IP header just to compute the checksum? (and not just do this once, but twice :-( ). See code below for verifyChecksum and updateChecksum.

In C, verifying and setting the IP header checksum is really easy:

    if (checksum(iphdr, 0, iphdr->ihl << 2))
            goto bad_csum;

    iphdr->csum = checksum(iphdr, 0, iphdr->ihl << 2);

Relative to the C code, the P4 code seems very convoluted to me and prone to errors. What if someone accidentally omits a field? What if fields become slightly out of order? Also, no one would ever describe the IPv4 checksum as taking the checksum over the IHL, diffserv, totalLen, ... That is *way* too complicated for an algorithm that is really simple-- from RFC791: "The checksum field is the 16 bit one's complement of the one's complement sum of all 16 bit words in the header." Reverse engineering the design, the clue seems to be HashAlgorithm.csum16. Maybe in P4 the IP checksum is just considered another form of hash, and I suspect the input to the hash computation is specified as a sort of data structure to make things generic (for instance, how we create a substructure in flow keys in flow_dissector to compute a SipHash over the TCP and UDP tuple). But the IPv4 checksum isn't just another hash-- on a host, we need to compute the checksum for *every* IPv4 packet. This has to be fast and simple; we can do this in as few as five instructions. So even if the code below is correct, I have to wonder how easy it is to emit an efficient executable. 
Would a compiler easily realize that all the fields in the pseudo structure are contiguous without holes such that it can omit those five instructions? I don't know how prevalent this method of listing all the fields in a data structure as arguments to a function is in P4, but, by almost any objective measure, I have to say that the code below is bad and bloated. Maybe there's a better way to do it in P4, but if there's not then this is a deficiency in the P4 language. Tom control verifyChecksum(inout headers hdr, inout metadata meta) { apply { // There is code similar to this in Github repo p4lang/p4c in // file testdata/p4_16_samples/flowlet_switching-bmv2.p4 // However in that file it is only for a fixed length IPv4 // header with no options. verify_checksum(true, { hdr.ipv4.version, hdr.ipv4.ihl, hdr.ipv4.diffserv, hdr.ipv4.totalLen, hdr.ipv4.identification, hdr.ipv4.flags, hdr.ipv4.fragOffset, hdr.ipv4.ttl, hdr.ipv4.protocol, hdr.ipv4.srcAddr, hdr.ipv4.dstAddr #ifdef ALLOW_IPV4_OPTIONS , hdr.ipv4.options #endif /* ALLOW_IPV4_OPTIONS */ }, hdr.ipv4.hdrChecksum, HashAlgorithm.csum16); } } control updateChecksum(inout headers hdr, inout metadata meta) { apply { update_checksum(true, { hdr.ipv4.version, hdr.ipv4.ihl, hdr.ipv4.diffserv, hdr.ipv4.totalLen, hdr.ipv4.identification, hdr.ipv4.flags, hdr.ipv4.fragOffset, hdr.ipv4.ttl, hdr.ipv4.protocol, hdr.ipv4.srcAddr, hdr.ipv4.dstAddr #ifdef ALLOW_IPV4_OPTIONS , hdr.ipv4.options #endif /* ALLOW_IPV4_OPTIONS */ }, hdr.ipv4.hdrChecksum, HashAlgorithm.csum16); } } On Wed, May 22, 2024 at 8:34 PM Tom Herbert <tom@sipanda.io> wrote: > > On Wed, May 22, 2024 at 7:30 PM Chris Sommers > <chris.sommers@keysight.com> wrote: > > > > > On Wed, May 22, 2024 at 8:54 PM Tom Herbert <mailto:tom@sipanda.io> wrote: > > > > > > > > On Wed, May 22, 2024 at 5:09 PM Chris Sommers > > > > <mailto:chris.sommers@keysight.com> wrote: > > > > > > > > > > > On Wed, May 22, 2024 at 6:19 PM Jakub Kicinski <mailto:kuba@kernel.org> wrote: > > > > > > > > > > > > > > Hi Jamal! > > > > > > > > > > > > > > On Tue, 21 May 2024 08:35:07 -0400 Jamal Hadi Salim wrote: > > > > > > > > At that point(v16) i asked for the series to be applied despite the > > > > > > > > Nacks because, frankly, the Nacks have no merit. Paolo was not > > > > > > > > comfortable applying patches with Nacks and tried to mediate. In his > > > > > > > > mediation effort he asked if we could remove eBPF - and our answer was > > > > > > > > no because after all that time we have become dependent on it and > > > > > > > > frankly there was no technical reason not to use eBPF. > > > > > > > > > > > > > > I'm not fully clear on who you're appealing to, and I may be missing > > > > > > > some points. But maybe it will be more useful than hurtful if I clarify > > > > > > > my point of view. > > > > > > > > > > > > > > AFAIU BPF folks disagree with the use of their subsystem, and they > > > > > > > point out that P4 pipelines can be implemented using BPF in the first > > > > > > > place. > > > > > > > To which you reply that you like (a highly dated type of) a netlink > > > > > > > interface, and (handwavey) ability to configure the data path SW or > > > > > > > HW via the same interface. > > > > > > > > > > > > It's not what I "like" , rather it is a requirement to support both > > > > > > s/w and h/w offload. The TC model is the traditional approach to > > > > > > deploy these models. 
I addressed the same comment you are making above > > > > > > in #1a and #1b (https://urldefense.com/v3/__https://github.com/p4tc-dev/pushback-patches__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs0-w2zKP2A$). > > > >> > > > > > > > OTOH, "BPF folks disagree with the use of their subsystem" is a > > > > > > problematic statement. Is BPF infra for the kernel community or is it > > > > > > something the ebpf folks can decide, at their whim, to allow who they > > > > > > like to use or not. We are not changing any BPF code. And there's > > > > > > already a case where the interfaces are used exactly as we used them > > > > > > in the conntrack code i pointed to in the page (we literally copied > > > > > > that code). Why is it ok for conntrack code to use exactly the same > > > > > > approach but not us? > > > > > > > > > > > > > AFAICT there's some but not very strong support for P4TC, > > > > > > > > > > > > I dont agree. Paolo asked this question and afaik Intel, AMD (both > > > > > > build P4-native NICs) and the folks interested in the MS DASH project > > > > > > responded saying they are in support. Look at who is being Cced. A lot > > > > > > of these folks who attend biweekly discussion calls on P4TC. Sample: > > > > > > https://urldefense.com/v3/__https://lore.kernel.org/netdev/IA0PR17MB7070B51A955FB8595FFBA5FB965E2@IA0PR17MB7070.namprd17.prod.outlook.com/__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs09TFzoQBw$ > > > >> > > > > > > +1 > > > > > > > and it > > > > > > > doesn't benefit or solve any problems of the broader networking stack > > > > > > > (e.g. expressing or configuring parser graphs in general) > > > > > > > > > > > > > > > > > > > > > > > Huh? As a DSL, P4 has already been proven to be an extremely effective and popular way to express parse graphs, stack manipulation, and stateful programming. Yesterday, I used the P4TC dev branch to implement something in one sitting, which includes parsing RoCEv2 network stacks. I just cut and pasted P4 code originally written for a P4 ASIC into a working P4TC example to add functionality. It took mere seconds to compile and launch it, and a few minutes to test it. I know of no other workflow which provides such quick turnaround and is so accessible. I'd like it to be as ubiquitous as eBPF itself. > > > > > > > > Chris, > > > > > > > > When you say "it took mere seconds to compile and launch" are you > > > > taking into account the ramp up time that it takes to learn P4 and > > > > become proficient to do something interesting? > > > > Hi Tom, thanks for the dialog. To answer your question, it took seconds to compile and deploy, not learn P4. Adding the parsing for several headers took minutes. If you want to compare learning curve, learning to write P4 code and let the framework handle all the painful low-level Linux details is way easier than trying to learn how to write c code for Linux networking. It’s not even close. I’ve written C for 40 years, P4 for 7 years, and dabbled in eBPF so I can attest to the ease of learning and using P4. I’ve onboarded and mentored engineers who barely knew C, to develop complex networking products using P4, and built the automation APIs (REST, gRPC) to manage them. One person can develop an entire commercial product by themselves in months. P4 has expanded the reach of programmers such that both HW and SW engineers can easily learn P4 and become pretty adept at it. 
I would not expect even experienced c programmers to be able to master Linux internals very quickly. Writing a P4-TC program and injecting it via tc was like magic the first time. > > > > >> Considering that P4 > > > > syntax is very different from typical languages than networking > > > > programmers are typically familiar with, this ramp up time is > > > > non-zero. OTOH, eBPF is ubiquitous because it's primarily programmed > > > > in Restricted C-- this makes it easy for many programmers since they > > > > don't have to learn a completely new language and so the ramp up time > > > > for the average networking programmer is much less for using eBPF. > > > > I think your statement about “typical network programmers” overlooks the fact that since P4 was introduced, it has been taught in many universities to teach networking and possibly enabled a whole new breed of “network engineers” who can solve real problems without even knowing C programming. Without P4 they might never have gone this route. A class in network stack programming using c would have so many prerequisites to even get to parsing, compared to P4, where it could be demonstrated in one lesson. These “networking programmers” are not typical by your standards, but there are many such. They have just as much claim to the title "network programmer” as a C programmer. Similarly, an assembly language programmer is no less than a C or Python programmer. People writing P4 are usually focused on applications, and it is very useful and productive for that. Why should someone have to learn low-level C or eBPF to solve their problem? > > Hio Chris, > > You're comparing learning a completely new language versus programming > in a subset of an established language, they're really not comparable. > When one programs in Restricted-C they just need to understand what > features of C are supported. > > > > > > > > > > > This is really the fundamental problem with DSLs, they require > > > > specialized skill sets in a programming language for a narrow use case > > > > (and specialized compilers, tool chains, debugging, etc)-- this means > > > > a DSL only makes sense if there is no other means to accomplish the > > > > same effects using a commodity language with perhaps a specialized > > > > library (it's not just in the networking realm, consider the > > > > advantages of using CUDA-C instead of a DLS for GPUs). > > > > A pretty strong opinion, but DSLs arise to fill a need and P4 did so. It's still going strong. > > > > >> Personally, I > > > > don't believe that P4 has yet to be proven necessary for programming a > > > > datapath-- for instance we can program a parser in declarative > > > > representation in C, > > > > https://urldefense.com/v3/__https://netdevconf.info/0x16/papers/11/High*20Performance*20Programmable*20Parsers.pdf__;JSUl!!I5pVk4LIGAfnvw!m9zrSDvddfzSt_sMBjOEvqw31RzAwWlEDM4ah5IJ2kqsmq6XtPIVJd-1_ZoGWBXKLyda77RYLvGR83Ginw$. > > > > CPL (slide11) looks like a DSL wrapped in JSON to me. “Solution: Common Parser Language (CPL); Parser representation in declarative .json” So I am confused. It is either a new language a.k.a. DSL, or it's not. Nothing against it, I'm sure it is great, but let's call it what it is. > > Correct, it's not a new language. We've since renamed it Common Parser > Representation. > > > We already have parser representations in declarative p4. And it's used and known worldwide. And has a respectable specification, any users and working groups. 
And it's formally provable (https://github.com/verified-network-toolchain/petr4) > > > > > > > > > > So unless P4 is proven necessary, then I'm doubtful it will ever be a > > > > ubiquitous way to program the kernel-- it seems much more likely that > > > > people will continue to use C and eBPF, and for those users that want > > > > to use P4 they can use P4->eBPF compiler. > > > > “ubiquitous way to program the kernel” – is not my goal. I don’t even want to know about the kernel when I am writing p4 - it's just a means to an end. I want to manipulate packets on a Linux host. P4DPDK, P4-eBPF, P4-TC – all let me do that. I LOVE the fact that P4-TC would be available in every Linux distro once upstreamed. It would solve so many deployment issues, benefit from regression testing, etc. So much goodness > > > > " and for those users that want to use P4 they can use P4->eBPF compiler." -I'd really like to choose for myself and not have someone make that choice for me. P4-TC checks all the boxes for me. > > Sure, but this is a lot of kernel code and that will require support > and maintenance. It needs to be justified, and the fact that someone > wants it just to have a choice is, frankly, not much of a > justification. I think a justification needs to start with "Why isn't > P4->eBPF sufficient?" (the question has been raised several times, but > it still doesn't seem like there's a strong answer). > > Tom > > > > Thanks for the point of view, it's healthy to debate. > > Cheers, > > Chris > > > > > > > > > > > > Tom, > > > I cant stop the distraction of this thread becoming a discussion on > > > the merits of DSL vs a lower level language (and I know you are not a > > > P4 fan) but please change the subject so we dont loose the main focus > > > which is a discussion on the patches. I have done it for you. Chris if > > > you wish to respond please respond under the new thread subject. > > > > > > cheers, > > > jamal > >
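[Illustrative sketch] The checksum discussion in the message above quotes RFC 791 ("the 16 bit one's complement of the one's complement sum of all 16 bit words in the header") and argues the C version is short. Below is a minimal, self-contained rendering of that computation. The names min_iphdr, csum16, ipv4_csum_ok and ipv4_csum_set are invented for illustration and are not kernel or P4TC APIs (in the kernel the comparable fast path is ip_fast_csum()); the struct also shows the linux/ip.h-style ordinal types and saddr/daddr naming that the message contrasts with P4's bit<N> fields.

    /*
     * Illustrative only: a portable rendering of the RFC 791 header checksum.
     * The struct is a simplified stand-in for linux/ip.h's struct iphdr and
     * omits the version/IHL bitfield details; none of this is kernel code.
     */
    #include <stdint.h>
    #include <stddef.h>

    struct min_iphdr {
            uint8_t  ver_ihl;          /* version in the high nibble, IHL in the low */
            uint8_t  tos;
            uint16_t tot_len, id, frag_off;
            uint8_t  ttl, protocol;
            uint16_t check;
            uint32_t saddr, daddr;     /* options, if present, follow the fixed header */
    };

    static uint16_t csum16(const void *data, size_t len)
    {
            const uint16_t *p = data;
            uint32_t sum = 0;

            for (; len > 1; len -= 2)  /* IPv4 header length is a multiple of 4 */
                    sum += *p++;
            while (sum >> 16)          /* fold carries back into the low 16 bits */
                    sum = (sum & 0xffff) + (sum >> 16);
            return (uint16_t)~sum;
    }

    static int ipv4_csum_ok(const struct min_iphdr *iph)
    {
            size_t hlen = (size_t)(iph->ver_ihl & 0x0f) * 4;   /* IHL counts 32-bit words */

            /* A header whose stored checksum is correct sums to 0xffff,
             * so the folded complement is 0. */
            return csum16(iph, hlen) == 0;
    }

    static void ipv4_csum_set(struct min_iphdr *iph)
    {
            size_t hlen = (size_t)(iph->ver_ihl & 0x0f) * 4;

            iph->check = 0;            /* the field must be zero while summing */
            iph->check = csum16(iph, hlen);
    }

Because a header that carries a correct checksum sums to 0xffff, verification reduces to checking that the folded complement is zero, which is what ipv4_csum_ok() does above.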
On Fri, May 24, 2024 at 12:50 PM Tom Herbert <tom@sipanda.io> wrote: > > Hi Chris, > > P4 was created to support programming the hardware data path in high > end routers, but P4-TC would enable the use of P4 across all Linux > devices. Since this is potentially a lot of code going into the kernel > to support it, I believe it's entirely fair for us to evaluate and > give feedback on the P4 language and its suitability for the broader > user community including environments where there will never be a need > for P4 hardware. Note that I am questioning the design decisions of P4 > in the context of supporting a DSL in the kernel via P4-TC, if the > P4->eBPF compiler is used then then these concerns are less pertinent. > Nevertheless, I would suggest that the P4 folks take the points being > raised as constructive feedback on the language. > A lot of misleading info there. The P4 PNA architecture is for end hosts, not routers. For some NIC vendors you can go as far as writing hardware GRO or TSO offload or variations of your liking using P4 (certainly not a middlebox feature). That notwithstanding, the idea of offloading match-action via TC is not new and has been widely used/adopted for end hosts. Tom, you want to perhaps disclose that you have a competing product? That will help provide better context on your angle. TBH, I am confused by what your end game is - is your view that a crusade against P4 will make you sell more of your product? I have 3 NICs here with me (from 2 vendors) that are P4 programmable. You can be as negative as you want about P4 but you are not going to make it go away, sorry. I will let Chris or whoever else on Cc respond to the P4 bits if they wish, because there's misunderstanding there as well. cheers, jamal > > I took a cursory look at several P4 programs including tutorials, > switch code, firewalls, etc. I have particular interest in variable > length headers, so I'll use > https://github.com/jafingerhut/p4-guide/blob/master/checksum/checksum-ipv4-with-options.p4 > as a reference. > > The first thing I noticed about P4 is that almost everything is > expressed as a bit field. Like bit<8> and bit<32>. I suppose this > arises from the fact that P4 was originally intended to run in non-CPU > hardware where there's no inherent unit of data like bytes. But, CPUs > don't work that way; CPUs work ordinal types of bytes, half words, > words, double words, etc. (__u8, __u16, __u32, __u64). That means that > all mainstream computer languages fundamentally operate on ordinal > types even if the variable types are explicitly declared. If someone > programming in P4 needs to map original types to bit fields in P4, so > if they want a __u32 they need to use a bit<32> in P4 (except they're > not exactly equivalent, a __u32 in C is guaranteed to be byte aligned > and I'm assuming in P4 bit<32> is not guaranteed to be byte aligned-- > this seems like it might be susceptible to programming errors). I'd > also point out that networking protocols are also defined using > ordinal type fields, there are some exceptions, but for the most part > protocol fields try to be in units of bytes (or octets if you want to > be old school!). I believe life would be easier for the programmer if > they could just define variables and fields with ordinal types, the > fix here seems simple enough just add typedefs to P4 like "typedef > __u32 bit<32>". > > In the IP header definition there's "varbit<320> options;". 
It took > me several seconds to decode this and realize this is space for forty > bytes of IP options (i.e. 8 * 40 == 320). I suppose this follows the > design of using bit fields for everything, but I think this is more > than just an annoyance like the bit fields for ordinal types are. > First off, it's not very readable. I've never heard anyone say that > there's 320 bits of IP options, or seen an RFC specify that. Likewise, > the standard Ethernet MTU is 1500 bytes, not 12,000 bits which would > seem to be how that would be expressed in P4. So this seems very > unreadable to me and potentially prone to errors. The fix for this > also seems easy, why not just add varbyte to P4 so we can do > varbyte<40>, varbyte<87>, varbyte<123>, etc.? > > The next thing I notice about the P4 programs I surveyed is that all > of them seem to define the protocol headers within the protocol. Every > program seems to have "header ethernet_t" and "header ipv4_t" and > other protocols that are used and protocol constants like Ethertypes > also seem to be spelled out in each program. Sometimes these are in > include files within the program. What I don't see is that P4 has a > standard set of include files for defining protocol headers. For > instance, in Linux C we would just do "#include <linux/if_ether.h>" > and "#include <linux/ip.h>" to get the definitions of the Ethernet > header and IPv4 header. In fact, if someone were to submit a patch to > Netdev that included its own definition of Ethernet or an IP header > structure they would almost certainly get pushback. It's a fundamental > programming principle, not just in networking but pretty much > everywhere, to not continuously redefine common and standard > constructs-- just put common things in header files that can be shared > by multiple programs (to do otherwise substantially increases the > possibility of errors, bloats code, and reduces readability). > > Marshalling up common definitions into header files that are common in > the P4 development environment seems simple enough (maybe it's already > done?), but I would also point out that Linux has included files that > describe protocol formats and header structures for almost every > protocol under the sun that are well tested. It would be great if > somehow we could somehow leverage that work. For instance, in the P4 > samples I looked at srcAddr and dstAddr are defined for IP addresses, > but in linux/ip.h their saddr and daddr are the respective field > names. Why not just base the P4 definition on the Linux one? Then when > someone is porting code from Linux to P4 they can use the same field > names-- this makes things a lot easier on the programmer! I'll also > mention that we wrote a little Python script to generate P4 header and > constant definitions from Linux headers. It almost worked, the snag we > hit was that P4 has some limits on nesting structures and unions so we > couldn't translate some of the C structures to P4 (if you're > interested I can provide the details on the problem we hit). > > The IPv4 header checksum code was a real head scratcher for me. Do we > really need to state each field in the IP header just to compute the > checksum? (and not just do this once, but twice :-( ). See code below > for verifyChecksum and updateChecksum. 
> > In C, verifying and setting the IP header checksum is really easy: > > if (checksum(iphdr, 0, iphdr->ihl << 4)) > goto bad_csum; > > ip->csum = checksum(iphdr, 0, iphdr->ihl << 4); > > Relative to the C code, the P4 code seems very convoluted to me and > prone to errors. What if someone accidentally omits a field? What if > fields become slightly out of order? Also, no one would ever describe > the IPv4 checksum as taking the checksum over the IHL, diffserv, > totalLen, ... That is *way* too complicated for an algorithm that is > really simple-- from RFC791: "The checksum field is the 16 bit one's > complement of the one's complement sum of all 16 bit words in the > header.". Reverse engineering the design, the clue seems to be > HashAlgorithm.csum16. Maybe in P4 the IP checksum is just considered > another form of hash, and I suspect the input to hash computation is > specified as sort of data structure to make things generic (for > instance, how we create a substructure in flow keys in flow_dissector > to compute a SipHash over the TCP and UDP tuple). But, the IPv4 > checksum isn't just another hash-- on a host, we need to compute the > checksum for *every* IPv4 packet. This has to be fast and simple, we > can do this in as few as five instructions or less. So even if the > code below is correct, I have to wonder how easy it is to emit an > efficient executable. Would a compiler easily realize that all the > fields in the pseudo structure are contiguous without holes such that > it can omit those five instructions? > > I don't know how prevalent this method of listing all the fields in a > data structure as arguments to a function is in P4, but, by almost any > objective measure, I have to say that the code below is bad and > bloated. Maybe there's a better way to do it in P4, but if there's not > then this is a deficiency in the P4 language. > > Tom > > control verifyChecksum(inout headers hdr, > inout metadata meta) > { > apply { > // There is code similar to this in Github repo p4lang/p4c in > // file testdata/p4_16_samples/flowlet_switching-bmv2.p4 > // However in that file it is only for a fixed length IPv4 > // header with no options. 
> verify_checksum(true, > { hdr.ipv4.version, > hdr.ipv4.ihl, > hdr.ipv4.diffserv, > hdr.ipv4.totalLen, > hdr.ipv4.identification, > hdr.ipv4.flags, > hdr.ipv4.fragOffset, > hdr.ipv4.ttl, > hdr.ipv4.protocol, > hdr.ipv4.srcAddr, > hdr.ipv4.dstAddr > #ifdef ALLOW_IPV4_OPTIONS > , hdr.ipv4.options > #endif /* ALLOW_IPV4_OPTIONS */ > }, > hdr.ipv4.hdrChecksum, HashAlgorithm.csum16); > } > } > > control updateChecksum(inout headers hdr, > inout metadata meta) > { > apply { > update_checksum(true, > { hdr.ipv4.version, > hdr.ipv4.ihl, > hdr.ipv4.diffserv, > hdr.ipv4.totalLen, > hdr.ipv4.identification, > hdr.ipv4.flags, > hdr.ipv4.fragOffset, > hdr.ipv4.ttl, > hdr.ipv4.protocol, > hdr.ipv4.srcAddr, > hdr.ipv4.dstAddr > #ifdef ALLOW_IPV4_OPTIONS > , hdr.ipv4.options > #endif /* ALLOW_IPV4_OPTIONS */ > }, > hdr.ipv4.hdrChecksum, HashAlgorithm.csum16); > } > } > > On Wed, May 22, 2024 at 8:34 PM Tom Herbert <tom@sipanda.io> wrote: > > > > On Wed, May 22, 2024 at 7:30 PM Chris Sommers > > <chris.sommers@keysight.com> wrote: > > > > > > > On Wed, May 22, 2024 at 8:54 PM Tom Herbert <mailto:tom@sipanda.io> wrote: > > > > > > > > > > On Wed, May 22, 2024 at 5:09 PM Chris Sommers > > > > > <mailto:chris.sommers@keysight.com> wrote: > > > > > > > > > > > > > On Wed, May 22, 2024 at 6:19 PM Jakub Kicinski <mailto:kuba@kernel.org> wrote: > > > > > > > > > > > > > > > > Hi Jamal! > > > > > > > > > > > > > > > > On Tue, 21 May 2024 08:35:07 -0400 Jamal Hadi Salim wrote: > > > > > > > > > At that point(v16) i asked for the series to be applied despite the > > > > > > > > > Nacks because, frankly, the Nacks have no merit. Paolo was not > > > > > > > > > comfortable applying patches with Nacks and tried to mediate. In his > > > > > > > > > mediation effort he asked if we could remove eBPF - and our answer was > > > > > > > > > no because after all that time we have become dependent on it and > > > > > > > > > frankly there was no technical reason not to use eBPF. > > > > > > > > > > > > > > > > I'm not fully clear on who you're appealing to, and I may be missing > > > > > > > > some points. But maybe it will be more useful than hurtful if I clarify > > > > > > > > my point of view. > > > > > > > > > > > > > > > > AFAIU BPF folks disagree with the use of their subsystem, and they > > > > > > > > point out that P4 pipelines can be implemented using BPF in the first > > > > > > > > place. > > > > > > > > To which you reply that you like (a highly dated type of) a netlink > > > > > > > > interface, and (handwavey) ability to configure the data path SW or > > > > > > > > HW via the same interface. > > > > > > > > > > > > > > It's not what I "like" , rather it is a requirement to support both > > > > > > > s/w and h/w offload. The TC model is the traditional approach to > > > > > > > deploy these models. I addressed the same comment you are making above > > > > > > > in #1a and #1b (https://urldefense.com/v3/__https://github.com/p4tc-dev/pushback-patches__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs0-w2zKP2A$). > > > > >> > > > > > > > > OTOH, "BPF folks disagree with the use of their subsystem" is a > > > > > > > problematic statement. Is BPF infra for the kernel community or is it > > > > > > > something the ebpf folks can decide, at their whim, to allow who they > > > > > > > like to use or not. We are not changing any BPF code. 
And there's > > > > > > > already a case where the interfaces are used exactly as we used them > > > > > > > in the conntrack code i pointed to in the page (we literally copied > > > > > > > that code). Why is it ok for conntrack code to use exactly the same > > > > > > > approach but not us? > > > > > > > > > > > > > > > AFAICT there's some but not very strong support for P4TC, > > > > > > > > > > > > > > I dont agree. Paolo asked this question and afaik Intel, AMD (both > > > > > > > build P4-native NICs) and the folks interested in the MS DASH project > > > > > > > responded saying they are in support. Look at who is being Cced. A lot > > > > > > > of these folks who attend biweekly discussion calls on P4TC. Sample: > > > > > > > https://urldefense.com/v3/__https://lore.kernel.org/netdev/IA0PR17MB7070B51A955FB8595FFBA5FB965E2@IA0PR17MB7070.namprd17.prod.outlook.com/__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs09TFzoQBw$ > > > > >> > > > > > > > +1 > > > > > > > > and it > > > > > > > > doesn't benefit or solve any problems of the broader networking stack > > > > > > > > (e.g. expressing or configuring parser graphs in general) > > > > > > > > > > > > > > > > > > > > > > > > > > > Huh? As a DSL, P4 has already been proven to be an extremely effective and popular way to express parse graphs, stack manipulation, and stateful programming. Yesterday, I used the P4TC dev branch to implement something in one sitting, which includes parsing RoCEv2 network stacks. I just cut and pasted P4 code originally written for a P4 ASIC into a working P4TC example to add functionality. It took mere seconds to compile and launch it, and a few minutes to test it. I know of no other workflow which provides such quick turnaround and is so accessible. I'd like it to be as ubiquitous as eBPF itself. > > > > > > > > > > Chris, > > > > > > > > > > When you say "it took mere seconds to compile and launch" are you > > > > > taking into account the ramp up time that it takes to learn P4 and > > > > > become proficient to do something interesting? > > > > > > Hi Tom, thanks for the dialog. To answer your question, it took seconds to compile and deploy, not learn P4. Adding the parsing for several headers took minutes. If you want to compare learning curve, learning to write P4 code and let the framework handle all the painful low-level Linux details is way easier than trying to learn how to write c code for Linux networking. It’s not even close. I’ve written C for 40 years, P4 for 7 years, and dabbled in eBPF so I can attest to the ease of learning and using P4. I’ve onboarded and mentored engineers who barely knew C, to develop complex networking products using P4, and built the automation APIs (REST, gRPC) to manage them. One person can develop an entire commercial product by themselves in months. P4 has expanded the reach of programmers such that both HW and SW engineers can easily learn P4 and become pretty adept at it. I would not expect even experienced c programmers to be able to master Linux internals very quickly. Writing a P4-TC program and injecting it via tc was like magic the first time. > > > > > > >> Considering that P4 > > > > > syntax is very different from typical languages than networking > > > > > programmers are typically familiar with, this ramp up time is > > > > > non-zero. 
OTOH, eBPF is ubiquitous because it's primarily programmed > > > > > in Restricted C-- this makes it easy for many programmers since they > > > > > don't have to learn a completely new language and so the ramp up time > > > > > for the average networking programmer is much less for using eBPF. > > > > > > I think your statement about “typical network programmers” overlooks the fact that since P4 was introduced, it has been taught in many universities to teach networking and possibly enabled a whole new breed of “network engineers” who can solve real problems without even knowing C programming. Without P4 they might never have gone this route. A class in network stack programming using c would have so many prerequisites to even get to parsing, compared to P4, where it could be demonstrated in one lesson. These “networking programmers” are not typical by your standards, but there are many such. They have just as much claim to the title "network programmer” as a C programmer. Similarly, an assembly language programmer is no less than a C or Python programmer. People writing P4 are usually focused on applications, and it is very useful and productive for that. Why should someone have to learn low-level C or eBPF to solve their problem? > > > > Hio Chris, > > > > You're comparing learning a completely new language versus programming > > in a subset of an established language, they're really not comparable. > > When one programs in Restricted-C they just need to understand what > > features of C are supported. > > > > > > > > > > > > > > > This is really the fundamental problem with DSLs, they require > > > > > specialized skill sets in a programming language for a narrow use case > > > > > (and specialized compilers, tool chains, debugging, etc)-- this means > > > > > a DSL only makes sense if there is no other means to accomplish the > > > > > same effects using a commodity language with perhaps a specialized > > > > > library (it's not just in the networking realm, consider the > > > > > advantages of using CUDA-C instead of a DLS for GPUs). > > > > > > A pretty strong opinion, but DSLs arise to fill a need and P4 did so. It's still going strong. > > > > > > >> Personally, I > > > > > don't believe that P4 has yet to be proven necessary for programming a > > > > > datapath-- for instance we can program a parser in declarative > > > > > representation in C, > > > > > https://urldefense.com/v3/__https://netdevconf.info/0x16/papers/11/High*20Performance*20Programmable*20Parsers.pdf__;JSUl!!I5pVk4LIGAfnvw!m9zrSDvddfzSt_sMBjOEvqw31RzAwWlEDM4ah5IJ2kqsmq6XtPIVJd-1_ZoGWBXKLyda77RYLvGR83Ginw$. > > > > > > CPL (slide11) looks like a DSL wrapped in JSON to me. “Solution: Common Parser Language (CPL); Parser representation in declarative .json” So I am confused. It is either a new language a.k.a. DSL, or it's not. Nothing against it, I'm sure it is great, but let's call it what it is. > > > > Correct, it's not a new language. We've since renamed it Common Parser > > Representation. > > > > > We already have parser representations in declarative p4. And it's used and known worldwide. And has a respectable specification, any users and working groups. 
And it's formally provable (https://github.com/verified-network-toolchain/petr4) > > > > > > > > > > > > > So unless P4 is proven necessary, then I'm doubtful it will ever be a > > > > > ubiquitous way to program the kernel-- it seems much more likely that > > > > > people will continue to use C and eBPF, and for those users that want > > > > > to use P4 they can use P4->eBPF compiler. > > > > > > “ubiquitous way to program the kernel” – is not my goal. I don’t even want to know about the kernel when I am writing p4 - it's just a means to an end. I want to manipulate packets on a Linux host. P4DPDK, P4-eBPF, P4-TC – all let me do that. I LOVE the fact that P4-TC would be available in every Linux distro once upstreamed. It would solve so many deployment issues, benefit from regression testing, etc. So much goodness > > > > > > " and for those users that want to use P4 they can use P4->eBPF compiler." -I'd really like to choose for myself and not have someone make that choice for me. P4-TC checks all the boxes for me. > > > > Sure, but this is a lot of kernel code and that will require support > > and maintenance. It needs to be justified, and the fact that someone > > wants it just to have a choice is, frankly, not much of a > > justification. I think a justification needs to start with "Why isn't > > P4->eBPF sufficient?" (the question has been raised several times, but > > it still doesn't seem like there's a strong answer). > > > > Tom > > > > > > Thanks for the point of view, it's healthy to debate. > > > Cheers, > > > Chris > > > > > > > > > > > > > > > > Tom, > > > > I cant stop the distraction of this thread becoming a discussion on > > > > the merits of DSL vs a lower level language (and I know you are not a > > > > P4 fan) but please change the subject so we dont loose the main focus > > > > which is a discussion on the patches. I have done it for you. Chris if > > > > you wish to respond please respond under the new thread subject. > > > > > > > > cheers, > > > > jamal > > >
Oops, resending as plaintext. Sigh... - > On Fri, May 24, 2024 at 12:50 PM Tom Herbert <tom@sipanda.io> wrote: > > > > Hi Chris, > > > > P4 was created to support programming the hardware data path in high > > end routers, but P4-TC would enable the use of P4 across all Linux > > devices. Since this is potentially a lot of code going into the kernel > > to support it, I believe it's entirely fair for us to evaluate and > > give feedback on the P4 language and its suitability for the broader > > user community including environments where there will never be a need > > for P4 hardware. Note that I am questioning the design decisions of P4 > > in the context of supporting a DSL in the kernel via P4-TC, if the > > P4->eBPF compiler is used then then these concerns are less pertinent. > > Nevertheless, I would suggest that the P4 folks take the points being > > raised as constructive feedback on the language. > > Hi Tom, RE: Your observations and feedback on P4 language and prevalent coding practices, the most constructive approach would be to attend P4 working group meetings where your opinions and ideas will be respectfully considered and your offer to help gratefully accepted. You can also file issues or pull requests on GitHub. The Language and Architecture working groups would probably be the best places to participate. We are an open-minded and welcoming group of volunteers from industry and academia who are always looking for new members. It sounds like you have lots of relevant experience and a different point of view which could add hybrid vigor. Chris Sommers Distinguished SW Engineer > > A lot of misleading info there. The P4 PNA architecture is for end > hosts not routers. For some NIC vendors you can go as far as writting > hardware GRO or TSO offload or variations of your liking using P4 > (cretainly not a middle feature). That notwithstanding the idea of > offloading match-action via TC is not new and has been widely > used/adopted for end hosts. > > Tom, you want to perhaps disclose that you have a competing product? > That will help provide better context on your angle. > TBH, I am confused by what your end game is - is your view that a > crusade against P4 will make you sell more of your product? I have 3 > NICs here with me (from 2 vendors) that are P4 programmable. You can > be as negative as you want about P4 but you are not going to make it > go away, sorry. > > I will let Chris or whoever else on Cc respond to the P4 bits if they > wishe because there's misunderstanding there as well. > > cheers, > jamal > > > > I took a cursory look at several P4 programs including tutorials, > > switch code, firewalls, etc. I have particular interest in variable > > length headers, so I'll use > > https://urldefense.com/v3/__https://github.com/jafingerhut/p4-guide/blob/master/checksum/checksum-ipv4-with-options.p4__;!!I5pVk4LIGAfnvw!juhSwk9UTheuI8-0mudbGTSZ_GBx3Z6hmcOAgiaAW14Ecter6K8iJ8DSzakN1d4GCE4uFJ05wkE81N6KNw$ > > as a reference. > > > > The first thing I noticed about P4 is that almost everything is > > expressed as a bit field. Like bit<8> and bit<32>. I suppose this > > arises from the fact that P4 was originally intended to run in non-CPU > > hardware where there's no inherent unit of data like bytes. But, CPUs > > don't work that way; CPUs work ordinal types of bytes, half words, > > words, double words, etc. (__u8, __u16, __u32, __u64). 
That means that > > all mainstream computer languages fundamentally operate on ordinal > > types even if the variable types are explicitly declared. If someone > > programming in P4 needs to map original types to bit fields in P4, so > > if they want a __u32 they need to use a bit<32> in P4 (except they're > > not exactly equivalent, a __u32 in C is guaranteed to be byte aligned > > and I'm assuming in P4 bit<32> is not guaranteed to be byte aligned-- > > this seems like it might be susceptible to programming errors). I'd > > also point out that networking protocols are also defined using > > ordinal type fields, there are some exceptions, but for the most part > > protocol fields try to be in units of bytes (or octets if you want to > > be old school!). I believe life would be easier for the programmer if > > they could just define variables and fields with ordinal types, the > > fix here seems simple enough just add typedefs to P4 like "typedef > > __u32 bit<32>". > > > > In the IP header definition there's "varbit<320> options;". It took > > me several seconds to decode this and realize this is space for forty > > bytes of IP options (i.e. 8 * 40 == 320). I suppose this follows the > > design of using bit fields for everything, but I think this is more > > than just an annoyance like the bit fields for ordinal types are. > > First off, it's not very readable. I've never heard anyone say that > > there's 320 bits of IP options, or seen an RFC specify that. Likewise, > > the standard Ethernet MTU is 1500 bytes, not 12,000 bits which would > > seem to be how that would be expressed in P4. So this seems very > > unreadable to me and potentially prone to errors. The fix for this > > also seems easy, why not just add varbyte to P4 so we can do > > varbyte<40>, varbyte<87>, varbyte<123>, etc.? > > > > The next thing I notice about the P4 programs I surveyed is that all > > of them seem to define the protocol headers within the protocol. Every > > program seems to have "header ethernet_t" and "header ipv4_t" and > > other protocols that are used and protocol constants like Ethertypes > > also seem to be spelled out in each program. Sometimes these are in > > include files within the program. What I don't see is that P4 has a > > standard set of include files for defining protocol headers. For > > instance, in Linux C we would just do "#include <linux/if_ether.h>" > > and "#include <linux/ip.h>" to get the definitions of the Ethernet > > header and IPv4 header. In fact, if someone were to submit a patch to > > Netdev that included its own definition of Ethernet or an IP header > > structure they would almost certainly get pushback. It's a fundamental > > programming principle, not just in networking but pretty much > > everywhere, to not continuously redefine common and standard > > constructs-- just put common things in header files that can be shared > > by multiple programs (to do otherwise substantially increases the > > possibility of errors, bloats code, and reduces readability). > > > > Marshalling up common definitions into header files that are common in > > the P4 development environment seems simple enough (maybe it's already > > done?), but I would also point out that Linux has included files that > > describe protocol formats and header structures for almost every > > protocol under the sun that are well tested. It would be great if > > somehow we could somehow leverage that work. 
For instance, in the P4 > > samples I looked at srcAddr and dstAddr are defined for IP addresses, > > but in linux/ip.h their saddr and daddr are the respective field > > names. Why not just base the P4 definition on the Linux one? Then when > > someone is porting code from Linux to P4 they can use the same field > > names-- this makes things a lot easier on the programmer! I'll also > > mention that we wrote a little Python script to generate P4 header and > > constant definitions from Linux headers. It almost worked, the snag we > > hit was that P4 has some limits on nesting structures and unions so we > > couldn't translate some of the C structures to P4 (if you're > > interested I can provide the details on the problem we hit). > > > > The IPv4 header checksum code was a real head scratcher for me. Do we > > really need to state each field in the IP header just to compute the > > checksum? (and not just do this once, but twice :-( ). See code below > > for verifyChecksum and updateChecksum. > > > > In C, verifying and setting the IP header checksum is really easy: > > > > if (checksum(iphdr, 0, iphdr->ihl << 4)) > > goto bad_csum; > > > > ip->csum = checksum(iphdr, 0, iphdr->ihl << 4); > > > > Relative to the C code, the P4 code seems very convoluted to me and > > prone to errors. What if someone accidentally omits a field? What if > > fields become slightly out of order? Also, no one would ever describe > > the IPv4 checksum as taking the checksum over the IHL, diffserv, > > totalLen, ... That is *way* too complicated for an algorithm that is > > really simple-- from RFC791: "The checksum field is the 16 bit one's > > complement of the one's complement sum of all 16 bit words in the > > header.". Reverse engineering the design, the clue seems to be > > HashAlgorithm.csum16. Maybe in P4 the IP checksum is just considered > > another form of hash, and I suspect the input to hash computation is > > specified as sort of data structure to make things generic (for > > instance, how we create a substructure in flow keys in flow_dissector > > to compute a SipHash over the TCP and UDP tuple). But, the IPv4 > > checksum isn't just another hash-- on a host, we need to compute the > > checksum for *every* IPv4 packet. This has to be fast and simple, we > > can do this in as few as five instructions or less. So even if the > > code below is correct, I have to wonder how easy it is to emit an > > efficient executable. Would a compiler easily realize that all the > > fields in the pseudo structure are contiguous without holes such that > > it can omit those five instructions? > > > > I don't know how prevalent this method of listing all the fields in a > > data structure as arguments to a function is in P4, but, by almost any > > objective measure, I have to say that the code below is bad and > > bloated. Maybe there's a better way to do it in P4, but if there's not > > then this is a deficiency in the P4 language. > > > > Tom > > > > control verifyChecksum(inout headers hdr, > > inout metadata meta) > > { > > apply { > > // There is code similar to this in Github repo p4lang/p4c in > > // file testdata/p4_16_samples/flowlet_switching-bmv2.p4 > > // However in that file it is only for a fixed length IPv4 > > // header with no options. 
> > verify_checksum(true, > > { hdr.ipv4.version, > > hdr.ipv4.ihl, > > hdr.ipv4.diffserv, > > hdr.ipv4.totalLen, > > hdr.ipv4.identification, > > hdr.ipv4.flags, > > hdr.ipv4.fragOffset, > > hdr.ipv4.ttl, > > hdr.ipv4.protocol, > > hdr.ipv4.srcAddr, > > hdr.ipv4.dstAddr > > #ifdef ALLOW_IPV4_OPTIONS > > , hdr.ipv4.options > > #endif /* ALLOW_IPV4_OPTIONS */ > > }, > > hdr.ipv4.hdrChecksum, HashAlgorithm.csum16); > > } > > } > > > > control updateChecksum(inout headers hdr, > > inout metadata meta) > > { > > apply { > > update_checksum(true, > > { hdr.ipv4.version, > > hdr.ipv4.ihl, > > hdr.ipv4.diffserv, > > hdr.ipv4.totalLen, > > hdr.ipv4.identification, > > hdr.ipv4.flags, > > hdr.ipv4.fragOffset, > > hdr.ipv4.ttl, > > hdr.ipv4.protocol, > > hdr.ipv4.srcAddr, > > hdr.ipv4.dstAddr > > #ifdef ALLOW_IPV4_OPTIONS > > , hdr.ipv4.options > > #endif /* ALLOW_IPV4_OPTIONS */ > > }, > > hdr.ipv4.hdrChecksum, HashAlgorithm.csum16); > > } > > } > > > > On Wed, May 22, 2024 at 8:34 PM Tom Herbert <tom@sipanda.io> wrote: > > > > > > On Wed, May 22, 2024 at 7:30 PM Chris Sommers > > > <chris.sommers@keysight.com> wrote: > > > > > > > > > On Wed, May 22, 2024 at 8:54 PM Tom Herbert <mailto:tom@sipanda.io> wrote: > > > > > > > > > > > > On Wed, May 22, 2024 at 5:09 PM Chris Sommers > > > > > > <mailto:chris.sommers@keysight.com> wrote: > > > > > > > > > > > > > > > On Wed, May 22, 2024 at 6:19 PM Jakub Kicinski <mailto:kuba@kernel.org> wrote: > > > > > > > > > > > > > > > > > > Hi Jamal! > > > > > > > > > > > > > > > > > > On Tue, 21 May 2024 08:35:07 -0400 Jamal Hadi Salim wrote: > > > > > > > > > > At that point(v16) i asked for the series to be applied despite the > > > > > > > > > > Nacks because, frankly, the Nacks have no merit. Paolo was not > > > > > > > > > > comfortable applying patches with Nacks and tried to mediate. In his > > > > > > > > > > mediation effort he asked if we could remove eBPF - and our answer was > > > > > > > > > > no because after all that time we have become dependent on it and > > > > > > > > > > frankly there was no technical reason not to use eBPF. > > > > > > > > > > > > > > > > > > I'm not fully clear on who you're appealing to, and I may be missing > > > > > > > > > some points. But maybe it will be more useful than hurtful if I clarify > > > > > > > > > my point of view. > > > > > > > > > > > > > > > > > > AFAIU BPF folks disagree with the use of their subsystem, and they > > > > > > > > > point out that P4 pipelines can be implemented using BPF in the first > > > > > > > > > place. > > > > > > > > > To which you reply that you like (a highly dated type of) a netlink > > > > > > > > > interface, and (handwavey) ability to configure the data path SW or > > > > > > > > > HW via the same interface. > > > > > > > > > > > > > > > > It's not what I "like" , rather it is a requirement to support both > > > > > > > > s/w and h/w offload. The TC model is the traditional approach to > > > > > > > > deploy these models. I addressed the same comment you are making above > > > > > > > > in #1a and #1b (https://urldefense.com/v3/__https://github.com/p4tc-dev/pushback-patches__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs0-w2zKP2A$). > >> > > >> > > > > > > > > > OTOH, "BPF folks disagree with the use of their subsystem" is a > > > > > > > > problematic statement. Is BPF infra for the kernel community or is it > > > > > > > > something the ebpf folks can decide, at their whim, to allow who they > > > > > > > > like to use or not. 
We are not changing any BPF code. And there's > > > > > > > > already a case where the interfaces are used exactly as we used them > > > > > > > > in the conntrack code i pointed to in the page (we literally copied > > > > > > > > that code). Why is it ok for conntrack code to use exactly the same > > > > > > > > approach but not us? > > > > > > > > > > > > > > > > > AFAICT there's some but not very strong support for P4TC, > > > > > > > > > > > > > > > > I dont agree. Paolo asked this question and afaik Intel, AMD (both > > > > > > > > build P4-native NICs) and the folks interested in the MS DASH project > > > > > > > > responded saying they are in support. Look at who is being Cced. A lot > > > > > > > > of these folks who attend biweekly discussion calls on P4TC. Sample: > > > > > > > > https://urldefense.com/v3/__https://lore.kernel.org/netdev/IA0PR17MB7070B51A955FB8595FFBA5FB965E2@IA0PR17MB7070.namprd17.prod.outlook.com/__;!!I5pVk4LIGAfnvw!kaZ6EmPxEqGLG8JMw-_L0BgYq48Pe25wj6pHMF6BVei5WsRgwMeLQupmvgvLyN-LgXacKBzzs09TFzoQBw$ > >> > > >> > > > > > > > > +1 > > > > > > > > > and it > > > > > > > > > doesn't benefit or solve any problems of the broader networking stack > > > > > > > > > (e.g. expressing or configuring parser graphs in general) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Huh? As a DSL, P4 has already been proven to be an extremely effective and popular way to express parse graphs, stack manipulation, and stateful programming. Yesterday, I used the P4TC dev branch to implement something in one sitting, which includes parsing RoCEv2 network stacks. I just cut and pasted P4 code originally written for a P4 ASIC into a working P4TC example to add functionality. It took mere seconds to compile and launch it, and a few minutes to test it. I know of no other workflow which provides such quick turnaround and is so accessible. I'd like it to be as ubiquitous as eBPF itself. > > > > > > > > > > > > Chris, > > > > > > > > > > > > When you say "it took mere seconds to compile and launch" are you > > > > > > taking into account the ramp up time that it takes to learn P4 and > > > > > > become proficient to do something interesting? > > > > > > > > Hi Tom, thanks for the dialog. To answer your question, it took seconds to compile and deploy, not learn P4. Adding the parsing for several headers took minutes. If you want to compare learning curve, learning to write P4 code and let the framework handle all the painful low-level Linux details is way easier than trying to learn how to write c code for Linux networking. It’s not even close. I’ve written C for 40 years, P4 for 7 years, and dabbled in eBPF so I can attest to the ease of learning and using P4. I’ve onboarded and mentored engineers who barely knew C, to develop complex networking products using P4, and built the automation APIs (REST, gRPC) to manage them. One person can develop an entire commercial product by themselves in months. P4 has expanded the reach of programmers such that both HW and SW engineers can easily learn P4 and become pretty adept at it. I would not expect even experienced c programmers to be able to master Linux internals very quickly. Writing a P4-TC program and injecting it via tc was like magic the first time. > > > > > > > > >> Considering that P4 > > > > > > syntax is very different from typical languages than networking > > > > > > programmers are typically familiar with, this ramp up time is > > > > > > non-zero. 
OTOH, eBPF is ubiquitous because it's primarily programmed > > > > > > in Restricted C-- this makes it easy for many programmers since they > > > > > > don't have to learn a completely new language and so the ramp up time > > > > > > for the average networking programmer is much less for using eBPF. > > > > > > > > I think your statement about “typical network programmers” overlooks the fact that since P4 was introduced, it has been taught in many universities to teach networking and possibly enabled a whole new breed of “network engineers” who can solve real problems without even knowing C programming. Without P4 they might never have gone this route. A class in network stack programming using c would have so many prerequisites to even get to parsing, compared to P4, where it could be demonstrated in one lesson. These “networking programmers” are not typical by your standards, but there are many such. They have just as much claim to the title "network programmer” as a C programmer. Similarly, an assembly language programmer is no less than a C or Python programmer. People writing P4 are usually focused on applications, and it is very useful and productive for that. Why should someone have to learn low-level C or eBPF to solve their problem? > > > > > > Hio Chris, > > > > > > You're comparing learning a completely new language versus programming > > > in a subset of an established language, they're really not comparable. > > > When one programs in Restricted-C they just need to understand what > > > features of C are supported. > > > > > > > > > > > > > > > > > > > This is really the fundamental problem with DSLs, they require > > > > > > specialized skill sets in a programming language for a narrow use case > > > > > > (and specialized compilers, tool chains, debugging, etc)-- this means > > > > > > a DSL only makes sense if there is no other means to accomplish the > > > > > > same effects using a commodity language with perhaps a specialized > > > > > > library (it's not just in the networking realm, consider the > > > > > > advantages of using CUDA-C instead of a DLS for GPUs). > > > > > > > > A pretty strong opinion, but DSLs arise to fill a need and P4 did so. It's still going strong. > > > > > > > > >> Personally, I > > > > > > don't believe that P4 has yet to be proven necessary for programming a > > > > > > datapath-- for instance we can program a parser in declarative > > > > > > representation in C, > > > > > > https://urldefense.com/v3/__https://netdevconf.info/0x16/papers/11/High*20Performance*20Programmable*20Parsers.pdf__;JSUl!!I5pVk4LIGAfnvw!m9zrSDvddfzSt_sMBjOEvqw31RzAwWlEDM4ah5IJ2kqsmq6XtPIVJd-1_ZoGWBXKLyda77RYLvGR83Ginw$. > >> > > > > > CPL (slide11) looks like a DSL wrapped in JSON to me. “Solution: Common Parser Language (CPL); Parser representation in declarative .json” So I am confused. It is either a new language a.k.a. DSL, or it's not. Nothing against it, I'm sure it is great, but let's call it what it is. > > > > > > Correct, it's not a new language. We've since renamed it Common Parser > > > Representation. > > > > > > > We already have parser representations in declarative p4. And it's used and known worldwide. And has a respectable specification, any users and working groups. 
And it's formally provable (https://urldefense.com/v3/__https://github.com/verified-network-toolchain/petr4__;!!I5pVk4LIGAfnvw!juhSwk9UTheuI8-0mudbGTSZ_GBx3Z6hmcOAgiaAW14Ecter6K8iJ8DSzakN1d4GCE4uFJ05wkE9mvn6Vw$) > > > > > > > > > > > > > > > > So unless P4 is proven necessary, then I'm doubtful it will ever be a > > > > > > ubiquitous way to program the kernel-- it seems much more likely that > > > > > > people will continue to use C and eBPF, and for those users that want > > > > > > to use P4 they can use P4->eBPF compiler. > > > > > > > > “ubiquitous way to program the kernel” – is not my goal. I don’t even want to know about the kernel when I am writing p4 - it's just a means to an end. I want to manipulate packets on a Linux host. P4DPDK, P4-eBPF, P4-TC – all let me do that. I LOVE the fact that P4-TC would be available in every Linux distro once upstreamed. It would solve so many deployment issues, benefit from regression testing, etc. So much goodness > > > > > > > > " and for those users that want to use P4 they can use P4->eBPF compiler." -I'd really like to choose for myself and not have someone make that choice for me. P4-TC checks all the boxes for me. > > > > > > Sure, but this is a lot of kernel code and that will require support > > > and maintenance. It needs to be justified, and the fact that someone > > > wants it just to have a choice is, frankly, not much of a > > > justification. I think a justification needs to start with "Why isn't > > > P4->eBPF sufficient?" (the question has been raised several times, but > > > it still doesn't seem like there's a strong answer). > > > > > > Tom > > > > > > > > Thanks for the point of view, it's healthy to debate. > > > > Cheers, > > > > Chris > > > > > > > > > > > > > > > > > > > > Tom, > > > > > I cant stop the distraction of this thread becoming a discussion on > > > > > the merits of DSL vs a lower level language (and I know you are not a > > > > > P4 fan) but please change the subject so we dont loose the main focus > > > > > which is a discussion on the patches. I have done it for you. Chris if > > > > > you wish to respond please respond under the new thread subject. > > > > > > > > > > cheers, > > > > > jamal > > > >
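For readers following the DSL-versus-Restricted-C argument above, the contrast being drawn is roughly this: a P4 parser is declared as a state graph, while the eBPF equivalent is hand-written, bounds-checked pointer walking. A minimal sketch of the latter, purely for illustration and not taken from the P4TC patches (program and section names are arbitrary):

/* Illustrative only: a hand-written Ethernet -> IPv4 parse step in the
 * "Restricted C" style discussed above. Not from the P4TC series. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("tc")
int parse_demo(struct __sk_buff *skb)
{
	void *data = (void *)(long)skb->data;
	void *data_end = (void *)(long)skb->data_end;
	struct ethhdr *eth = data;
	struct iphdr *iph;

	/* Every header access needs an explicit bounds check for the verifier. */
	if ((void *)(eth + 1) > data_end)
		return TC_ACT_OK;
	if (eth->h_proto != bpf_htons(ETH_P_IP))
		return TC_ACT_OK;

	iph = (void *)(eth + 1);
	if ((void *)(iph + 1) > data_end)
		return TC_ACT_OK;

	/* ...parsed fields (iph->protocol, addresses, etc.) feed match-action logic... */
	return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";

A P4 front end (or the P4TC pipeline) ends up emitting something of this shape for every parser state; that repetitive, verifier-driven plumbing is roughly what the DSL hides from the program author.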
[AMD Official Use Only - AMD Internal Distribution Only] My apologies, earlier email used html and was blocked by the list... My response at the bottom as "VJ>"
Jain, Vipin wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > > My apologies, earlier email used html and was blocked by the list... > My response at the bottom as "VJ>" > > ________________________________________ > From: Jain, Vipin <Vipin.Jain@amd.com> > Sent: Friday, May 24, 2024 2:28 PM > To: Singhai, Anjali <anjali.singhai@intel.com>; Hadi Salim, Jamal <jhs@mojatatu.com>; Jakub Kicinski <kuba@kernel.org> > Cc: Paolo Abeni <pabeni@redhat.com>; Alexei Starovoitov <alexei.starovoitov@gmail.com>; Network Development <netdev@vger.kernel.org>; Chatterjee, Deb <deb.chatterjee@intel.com>; Limaye, Namrata <namrata.limaye@intel.com>; tom Herbert <tom@sipanda.io>; Marcelo Ricardo Leitner <mleitner@redhat.com>; Shirshyad, Mahesh <Mahesh.Shirshyad@amd.com>; Osinski, Tomasz <tomasz.osinski@intel.com>; Jiri Pirko <jiri@resnulli.us>; Cong Wang <xiyou.wangcong@gmail.com>; David S. Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Vlad Buslov <vladbu@nvidia.com>; Simon Horman <horms@kernel.org>; Khalid Manaa <khalidm@nvidia.com>; Toke Høiland-Jørgensen <toke@redhat.com>; Victor Nogueira <victor@mojatatu.com>; Tammela, Pedro <pctammela@mojatatu.com>; Daly, Dan <dan.daly@intel.com>; Andy Fingerhut <andy.fingerhut@gmail.com>; Sommers, Chris <chris.sommers@keysight.com>; Matty Kadosh <mattyk@nvidia.com>; bpf <bpf@vger.kernel.org>; lwn@lwn.net <lwn@lwn.net> > Subject: Re: On the NACKs on P4TC patches > > [AMD Official Use Only - AMD Internal Distribution Only] > > > I can ascertain (from AMD) that we have stated interest in, and are in full support of P4TC. > > Happy to elaborate more if needed. > > Thank you, > Vipin Jain > Sr Fellow Engineer, AMD > ________________________________________ > From: Singhai, Anjali <anjali.singhai@intel.com> > Sent: Wednesday, May 22, 2024 5:30 PM > To: Hadi Salim, Jamal <jhs@mojatatu.com>; Jakub Kicinski <kuba@kernel.org> > Cc: Paolo Abeni <pabeni@redhat.com>; Alexei Starovoitov <alexei.starovoitov@gmail.com>; Network Development <netdev@vger.kernel.org>; Chatterjee, Deb <deb.chatterjee@intel.com>; Limaye, Namrata <namrata.limaye@intel.com>; tom Herbert <tom@sipanda.io>; Marcelo Ricardo Leitner <mleitner@redhat.com>; Shirshyad, Mahesh <Mahesh.Shirshyad@amd.com>; Osinski, Tomasz <tomasz.osinski@intel.com>; Jiri Pirko <jiri@resnulli.us>; Cong Wang <xiyou.wangcong@gmail.com>; David S. Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Vlad Buslov <vladbu@nvidia.com>; Simon Horman <horms@kernel.org>; Khalid Manaa <khalidm@nvidia.com>; Toke Høiland-Jørgensen <toke@redhat.com>; Victor Nogueira <victor@mojatatu.com>; Tammela, Pedro <pctammela@mojatatu.com>; Jain, Vipin <Vipin.Jain@amd.com>; Daly, Dan <dan.daly@intel.com>; Andy Fingerhut <andy.fingerhut@gmail.com>; Sommers, Chris <chris.sommers@keysight.com>; Matty Kadosh <mattyk@nvidia.com>; bpf <bpf@vger.kernel.org>; lwn@lwn.net <lwn@lwn.net> > Subject: RE: On the NACKs on P4TC patches > > Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding. > > > On Wed, May 22, 2024 at 6:19 PM Jakub Kicinski <kuba@kernel.org> wrote: > > >> AFAICT there's some but not very strong support for P4TC, > > On Wed, May 22, 2024 at 4:04 PM Jamal Hadi Salim <jhs@mojatatu.com > wrote: > >I dont agree. Paolo asked this question and afaik Intel, AMD (both build P4-native NICs) and the folks interested in the MS DASH project >responded saying they are in support. Look at who is being Cced. 
A lot of these folks who attend biweekly discussion calls on P4TC. >Sample: > >https://lore.kernel.org/netdev/IA0PR17MB7070B51A955FB8595FFBA5FB965E2@IA0PR17MB7070.namprd17.prod.outlook.com/ > > FWIW, Intel is in full support of P4TC as we have stated several times in the past. > VJ> I can ascertain (from AMD) that we have stated interest in, and are in full support of P4TC. Happy to elaborate more if needed. > VJ> Thanks, Vipin Anjali and Vipin is your support for HW support of P4 or a Linux SW implementation of P4. If its for HW support what drivers would we want to support? Can you describe how to program these devices? At the moment there hasn't been any movement on Linux hardware P4 support side as far as I can tell. Yes there are some SDKs and build kits floating around for FPGAs. For example maybe start with what drivers in kernel tree run the DPUs that have this support? I think this would be a productive direction to go if we in fact have hardware support in the works. If you want a SW implementation in Linux my opinion is still pushing a DSL into the kernel datapath via qdisc/tc is the wrong direction. Mapping P4 onto hardware blocks is fundamentally different architecture from mapping P4 onto general purpose CPU and registers. My opinion -- to handle this you need a per architecture backend/JIT to compile the P4 to native instructions. This will give you the most flexibility to define new constructs, best performance, and lowest overhead runtime. We have a P4 BPF backend already and JITs for most architectures I don't see the need for P4TC in this context. If the end goal is a hardware offload control plane I'm skeptical we even need something specific just for SW datapath. I would propose a devlink or new infra to program the device directly vs overhead and complexity of abstracting through 'tc'. If you want to emulate your device use BPF or user space datapath. .John
>From: John Fastabend <john.fastabend@gmail.com> >Sent: Tuesday, May 28, 2024 1:17 PM >Jain, Vipin wrote: >> [AMD Official Use Only - AMD Internal Distribution Only] >> >> My apologies, earlier email used html and was blocked by the list... >> My response at the bottom as "VJ>" >> >> ________________________________________ >Anjali and Vipin is your support for HW support of P4 or a Linux SW implementation of P4. If its for HW support what drivers would we want to support? Can you describe how to program >these devices? >At the moment there hasn't been any movement on Linux hardware P4 support side as far as I can tell. Yes there are some SDKs and build kits floating around for FPGAs. For example >maybe start with what drivers in kernel tree run the DPUs that have this support? I think this would be a productive direction to go if we in fact have hardware support in the works. >If you want a SW implementation in Linux my opinion is still pushing a DSL into the kernel datapath via qdisc/tc is the wrong direction. Mapping P4 onto hardware blocks is fundamentally >different architecture from mapping >P4 onto general purpose CPU and registers. My opinion -- to handle this you need a per architecture backend/JIT to compile the P4 to native instructions. >This will give you the most flexibility to define new constructs, best performance, and lowest overhead runtime. We have a P4 BPF backend already and JITs for most architectures I don't >see the need for P4TC in this context. >If the end goal is a hardware offload control plane I'm skeptical we even need something specific just for SW datapath. I would propose a devlink or new infra to program the device directly >vs overhead and complexity of abstracting through 'tc'. If you want to emulate your device use BPF or user space datapath. >.John John, Let me start by saying production hardware exists i think Jamal posted some links but i can point you to our hardware. The hardware devices under discussion are capable of being abstracted using the P4 match-action paradigm so that's why we chose TC. These devices are programmed using the TC/netlink interface i.e the standard TC control-driver ops apply. While it is clear to us that the P4TC abstraction suffices, we are currently discussing details that will cater for all vendors in our biweekly meetings. One big requirement is we want to avoid the flower trap - we dont want to be changing kernel/user/driver code every time we add new datapaths. We feel P4TC approach is the path to add Linux kernel support. The s/w path is needed as well for several reasons. We need the same P4 program to run either in software or hardware or in both using skip_sw/skip_hw. It could be either in split mode or as an exception path as it is done today in flower or u32. Also it is common now in the P4 community that people define their datapath using their program and will write a control application that works for both hardware and software datapaths. They could be using the software datapath for testing as you said but also for the split/exception path. Chris can probably add more comments on the software datapath. Anjali
On Tue, May 28, 2024 at 3:17 PM Singhai, Anjali <anjali.singhai@intel.com> wrote: > > >From: John Fastabend <john.fastabend@gmail.com> > >Sent: Tuesday, May 28, 2024 1:17 PM > > >Jain, Vipin wrote: > >> [AMD Official Use Only - AMD Internal Distribution Only] > >> > >> My apologies, earlier email used html and was blocked by the list... > >> My response at the bottom as "VJ>" > >> > >> ________________________________________ > > >Anjali and Vipin is your support for HW support of P4 or a Linux SW implementation of P4. If its for HW support what drivers would we want to support? Can you describe how to program >these devices? > > >At the moment there hasn't been any movement on Linux hardware P4 support side as far as I can tell. Yes there are some SDKs and build kits floating around for FPGAs. For example >maybe start with what drivers in kernel tree run the DPUs that have this support? I think this would be a productive direction to go if we in fact have hardware support in the works. > > >If you want a SW implementation in Linux my opinion is still pushing a DSL into the kernel datapath via qdisc/tc is the wrong direction. Mapping P4 onto hardware blocks is fundamentally >different architecture from mapping > >P4 onto general purpose CPU and registers. My opinion -- to handle this you need a per architecture backend/JIT to compile the P4 to native instructions. > >This will give you the most flexibility to define new constructs, best performance, and lowest overhead runtime. We have a P4 BPF backend already and JITs for most architectures I don't >see the need for P4TC in this context. > > >If the end goal is a hardware offload control plane I'm skeptical we even need something specific just for SW datapath. I would propose a devlink or new infra to program the device directly >vs overhead and complexity of abstracting through 'tc'. If you want to emulate your device use BPF or user space datapath. > > >.John > > > John, > Let me start by saying production hardware exists i think Jamal posted some links but i can point you to our hardware. > The hardware devices under discussion are capable of being abstracted using the P4 match-action paradigm so that's why we chose TC. > These devices are programmed using the TC/netlink interface i.e the standard TC control-driver ops apply. While it is clear to us that the P4TC abstraction suffices, we are currently discussing details that will cater for all vendors in our biweekly meetings. > One big requirement is we want to avoid the flower trap - we dont want to be changing kernel/user/driver code every time we add new datapaths. > We feel P4TC approach is the path to add Linux kernel support. > > The s/w path is needed as well for several reasons. > We need the same P4 program to run either in software or hardware or in both using skip_sw/skip_hw. It could be either in split mode or as an exception path as it is done today in flower or u32. Also it is common now in the P4 community that people define their datapath using their program and will write a control application that works for both hardware and software datapaths. They could be using the software datapath for testing as you said but also for the split/exception path. Chris can probably add more comments on the software datapath. Hi Anjali, Are there any use cases of P4-TC that don't involve P4 hardware? If someone wanted to write one off datapath code for their deployment and they didn't have P4 hardware would you suggest that they write they're code in P4-TC? 
The reason I ask is that I'm concerned about the performance of P4-TC. Like John said, this is mapping code that was intended to run in specialized hardware onto a CPU, and it's also interpreted execution in TC. The performance numbers in https://github.com/p4tc-dev/docs/blob/main/p4-conference-2023/2023P4WorkshopP4TC.pdf seem to show that P4-TC has about half the performance of XDP. Even with a lot of work, it's going to be difficult to substantially close that gap. The risk if we allow this into the kernel is that a vendor might be tempted to point to P4-TC performance as a baseline to justify to customers that they need to buy specialized hardware to get performance, whereas if XDP were used maybe they wouldn't need the performance and cost of the hardware. Note, this scenario already happened once before: when DPDK joined the LF they made bogus claims of a 100x performance gain over the kernel-- had they put even the slightest effort into tuning the kernel, that delta would have dropped by an order of magnitude, and since then we've pretty much closed the gap (in fact, this is precisely what motivated the creation of XDP, so that story had a happy ending!). There are circumstances where hardware offload may be warranted, but it needs to be honestly justified by comparison with an optimized software solution-- so in the case of P4, it should be compared to well-written XDP code, for instance, not to P4-TC. Tom > > > > > Anjali >
> On Tue, May 28, 2024 at 3:17 PM Singhai, Anjali > <anjali.singhai@intel.com> wrote: > > > > >From: John Fastabend <john.fastabend@gmail.com> > > >Sent: Tuesday, May 28, 2024 1:17 PM > > > > >Jain, Vipin wrote: > > >> [AMD Official Use Only - AMD Internal Distribution Only] > > >> > > >> My apologies, earlier email used html and was blocked by the list... > > >> My response at the bottom as "VJ>" > > >> > > >> ________________________________________ > > > > >Anjali and Vipin is your support for HW support of P4 or a Linux SW implementation of P4. If its for HW support what drivers would we want to support? Can you describe how to program >these devices? > > > > >At the moment there hasn't been any movement on Linux hardware P4 support side as far as I can tell. Yes there are some SDKs and build kits floating around for FPGAs. For example >maybe start with what drivers in kernel tree run the DPUs that have this support? I think this would be a productive direction to go if we in fact have hardware support in the works. > > > > >If you want a SW implementation in Linux my opinion is still pushing a DSL into the kernel datapath via qdisc/tc is the wrong direction. Mapping P4 onto hardware blocks is fundamentally >different architecture from mapping > > >P4 onto general purpose CPU and registers. My opinion -- to handle this you need a per architecture backend/JIT to compile the P4 to native instructions. > > >This will give you the most flexibility to define new constructs, best performance, and lowest overhead runtime. We have a P4 BPF backend already and JITs for most architectures I don't >see the need for P4TC in this context. > > > > >If the end goal is a hardware offload control plane I'm skeptical we even need something specific just for SW datapath. I would propose a devlink or new infra to program the device directly >vs overhead and complexity of abstracting through 'tc'. If you want to emulate your device use BPF or user space datapath. > > > > >.John > > > > > > John, > > Let me start by saying production hardware exists i think Jamal posted some links but i can point you to our hardware. > > The hardware devices under discussion are capable of being abstracted using the P4 match-action paradigm so that's why we chose TC. > > These devices are programmed using the TC/netlink interface i.e the standard TC control-driver ops apply. While it is clear to us that the P4TC abstraction suffices, we are currently discussing details that will cater for all vendors in our biweekly meetings. > > One big requirement is we want to avoid the flower trap - we dont want to be changing kernel/user/driver code every time we add new datapaths. > > We feel P4TC approach is the path to add Linux kernel support. > > > > The s/w path is needed as well for several reasons. > > We need the same P4 program to run either in software or hardware or in both using skip_sw/skip_hw. It could be either in split mode or as an exception path as it is done today in flower or u32. Also it is common now in the P4 community that people define their datapath using their program and will write a control application that works for both hardware and software datapaths. They could be using the software datapath for testing as you said but also for the split/exception path. Chris can probably add more comments on the software datapath. Anjali, thanks for asking. Agreed, I like the flexibility of accommodating a variety of platforms depending upon performance requirements and intended target system. 
For me, flexibility is important. Some solutions need an inline filter and P4-TC makes it so easy. The fact I will be able to get HW offload means I'm not performance bound. Some other solutions might need DPDK implementation, so P4-DPDK is a choice there as well, and there are acceleration options. Keeping much of the dataplane design in one language (P4) makes it easier for more developers to create products without having to be platform-level experts. As someone who's worked with P4 Tofino, P4-TC, bmv2, etc. I can authoritatively state that all have their proper place. > > Hi Anjali, > > Are there any use cases of P4-TC that don't involve P4 hardware? If > someone wanted to write one off datapath code for their deployment and > they didn't have P4 hardware would you suggest that they write they're > code in P4-TC? The reason I ask is because I'm concerned about the > performance of P4-TC. Like John said, this is mapping code that is > intended to run in specialized hardware into a CPU, and it's also > interpreted execution in TC. The performance numbers in > https://urldefense.com/v3/__https://github.com/p4tc-dev/docs/blob/main/p4-conference-2023/2023P4WorkshopP4TC.pdf__;!!I5pVk4LIGAfnvw!mHilz4xBMimnfapDG8BEgqOuPw_Mn-KiMHb-aNbl8nB8TwfOfSleeIANiNRFQtTc5zfR0aK1TE2J8lT2Fg$ > seem to show that P4-TC has about half the performance of XDP. Even > with a lot of work, it's going to be difficult to substantially close > that gap. AFAIK P4-TC can emit XDP or eBPF code depending upon the situation, someone more knowledgeable should chime in. However, I don't agree that comparing the speeds of XDP vs. P4-TC should even be a deciding factor. If P4-TC is good enough for a lot of applications, that is fine by me and over time it'll only get better. If we held back every innovation because it was slower than something else, progress would suffer. > > The risk if we allow this into the kernel is that a vendor might be > tempted to point to P4-TC performance as a baseline to justify to > customers that they need to buy specialized hardware to get > performance, whereas if XDP was used maybe they don't need the > performance and cost of hardware. I really don't buy this argument, it's FUD. Let's judge P4-TC on its merits, not prejudge it as a ploy to sell vendor hardware. > Note, this scenario already happened > once before, when the DPDK joined LF they made bogus claims that they > got a 100x performance over the kernel-- had they put at least the > slightest effort into tuning the kernel that would have dropped the > delta by an order of magnitude, and since then we've pretty much > closed the gap (actually, this is precisely what motivated the > creation of XDP so I guess that story had a happy ending!) . There are > circumstances where hardware offload may be warranted, but it needs to > be honestly justified by comparing it to an optimized software > solution-- so in the case of P4, it should be compared to well written > XDP code for instance, not P4-TC. I strongly disagree that it "it needs to be honestly justified by comparing it to an optimized software solution." Says who? This is no more factual than saying "C or golang need to be judged by comparing it to assembly language." Today the gap between C and assembly is small, but way back in my career, C was way slower. Over time optimizing compilers have closed the gap. Who's to say P4 technologies won't do the same? P4-TC can be judged on its own merits for its utility and productivity. 
I can't stress enough that P4 is very productive when applied to certain problems. Note, P4-BMv2 has been used by thousands of developers, researchers and students, and it is relatively slow. Yet that doesn't deter users. There is a Google Summer of Code project to add PNA support, which is rather ambitious. However, P4-TC already partially supports PNA and the gap is closing. I feel like P4-TC could replace the use of BMv2 in a lot of applications, and if it were upstreamed it'd eventually be available on all Linux machines. The ability to write custom externs is very compelling. Eventual HW offload using the same code will be game-changing. BMv2 is a big C++ program and somewhat intimidating to dig into to make enhancements, especially at the architectural level. There is no HW offload path, and it's not really fast, so it remains mainly a research tool and will stay that way. P4-TC could span the needs from research to production in SW, and performant production with HW offload. > > Tom > > > > > Anjali >
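Since custom externs keep coming up: in the series under discussion the eBPF software datapath reaches kernel-side P4TC objects (tables, externs) through kfuncs, which is what the nacked patch 14 is about. Purely to illustrate the generic kfunc registration mechanism, not the P4TC extern API itself (function and set names are hypothetical, and the macro names track recent kernels), a module-side sketch might look like:

/* Hypothetical "extern" exposed to TC eBPF programs as a kfunc. */
#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/btf_ids.h>
#include <linux/module.h>

__bpf_kfunc_start_defs();

__bpf_kfunc u32 bpf_demo_extern_counter_bump(u32 index)
{
	/* ...stateful extern logic lives in ordinary kernel code... */
	return index;
}

__bpf_kfunc_end_defs();

BTF_KFUNCS_START(demo_extern_kfuncs)
BTF_ID_FLAGS(func, bpf_demo_extern_counter_bump)
BTF_KFUNCS_END(demo_extern_kfuncs)

static const struct btf_kfunc_id_set demo_extern_set = {
	.owner = THIS_MODULE,
	.set   = &demo_extern_kfuncs,
};

static int __init demo_extern_init(void)
{
	/* make the kfunc callable from TC (sched_cls) BPF programs */
	return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS,
					 &demo_extern_set);
}
module_init(demo_extern_init);
MODULE_LICENSE("GPL");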
Singhai, Anjali wrote: > >From: John Fastabend <john.fastabend@gmail.com> > >Sent: Tuesday, May 28, 2024 1:17 PM > > >Jain, Vipin wrote: > >> [AMD Official Use Only - AMD Internal Distribution Only] > >> > >> My apologies, earlier email used html and was blocked by the list... > >> My response at the bottom as "VJ>" > >> > >> ________________________________________ > > >Anjali and Vipin is your support for HW support of P4 or a Linux SW implementation of P4. If its for HW support what drivers would we want to support? Can you describe how to program >these devices? > > >At the moment there hasn't been any movement on Linux hardware P4 support side as far as I can tell. Yes there are some SDKs and build kits floating around for FPGAs. For example >maybe start with what drivers in kernel tree run the DPUs that have this support? I think this would be a productive direction to go if we in fact have hardware support in the works. > > >If you want a SW implementation in Linux my opinion is still pushing a DSL into the kernel datapath via qdisc/tc is the wrong direction. Mapping P4 onto hardware blocks is fundamentally >different architecture from mapping > >P4 onto general purpose CPU and registers. My opinion -- to handle this you need a per architecture backend/JIT to compile the P4 to native instructions. > >This will give you the most flexibility to define new constructs, best performance, and lowest overhead runtime. We have a P4 BPF backend already and JITs for most architectures I don't >see the need for P4TC in this context. > > >If the end goal is a hardware offload control plane I'm skeptical we even need something specific just for SW datapath. I would propose a devlink or new infra to program the device directly >vs overhead and complexity of abstracting through 'tc'. If you want to emulate your device use BPF or user space datapath. > > >.John > > > John, > Let me start by saying production hardware exists i think Jamal posted some links but i can point you to our hardware. Maybe more direct what Linux drivers support this? That would be a good first place to start IMO. Similarly what AMD hardware driver supports this. If I have two drivers from two vendors with P4 support this is great. For Intel I assume this is idpf? To be concrete can we start with Linux driver A and P4 program P. Modprobe driver A and push P4 program P so that it does something very simple, and drop a CIDR/Port range into a table. Perhaps this is so obvious in your community the trouble is in the context of a Linux driver its not immediately obvious to me and I would suspect its not obvious to many others. I really think walking through the key steps here would really help? 1. $ p4IntelCompiler p4-dos.p4 -o myp4 2. $ modprobe idpf 3. $ ping -i eth0 10.0.0.1 // good 4. $ p4Load p4-dos.p4 5. -- load cidr into the hardware somehow -- p4rt-ctrl? 6. $ ping -i eth0 10.0.0.1 // dropped This is an honest attempt to help fwiw. Questions would be. For compilation do we need an artifact from Intel it seems so from docs. But maybe a typo not sure. I'm not overly stuck on it but worth mentioning if folks try to follow your docs. For 2 I assume this is just normal every day module load nothing to see. Does it pop something up in /proc or in firmware or...? How do I know its P4 ready? For 4. How does this actually work? Is it a file in a directory the driver pushes into firmware? How does the firmware know I've done this? Does the Linux driver already support this? For 5 (most interesting) how does this work today. 
How are you currently talking to the driver/firmware to insert rules and discover the tables? And does the idpf driver do this already? Some side channel I guess? This is p4rt-ctrl? I've seen docs for above in ipdk, but they are a bit hard to follow if I'm honest. I assume IPDK is the source folks talk to when we mention there is hardware somewhere. Also it seems there is an IPDK BPF support as well which is interesting. And do you know how the DPDK implementation works? Can we learn from them is it just on top of Flow API which we could easily use in devlink or some other *link I suspect. > The hardware devices under discussion are capable of being abstracted using the P4 match-action paradigm so that's why we chose TC. > These devices are programmed using the TC/netlink interface i.e the standard TC control-driver ops apply. While it is clear to us that the P4TC abstraction suffices, we are currently discussing details that will cater for all vendors in our biweekly meetings. > One big requirement is we want to avoid the flower trap - we dont want to be changing kernel/user/driver code every time we add new datapaths. I think many 1st order and important points have been skipped. How do you program the device is it a firmware blob, a set of firmware commands, something that comes to you on device so only vendor sees this? Maybe I can infer this from some docs and some examples (by the way I ran through some of your DPU docs and such) but its unclear how these map onto Linux networking. Jiri started into this earlier and was cut off because p4tc was not for hardware offload. Now it is apparently. P4 is a good DSL for this sure and it has a runtime already specified which is great. This is not a qdisc/tc its an entire hardware pipeline I don't see the reason to put it in TC at all. > We feel P4TC approach is the path to add Linux kernel support. I disagree with your implementation not your goals to support flexible hardware. > > The s/w path is needed as well for several reasons. > We need the same P4 program to run either in software or hardware or in both using skip_sw/skip_hw. It could be either in split mode or as an exception path as it is done today in flower or u32. Also it is common now in the P4 community that people define their datapath using their program and will write a control application that works for both hardware and software datapaths. They could be using the software datapath for testing as you said but also for the split/exception path. Chris can probably add more comments on the software datapath. None of above requires P4TC. For different architectures you build optimal backend compilers. You have a Xilenx backend, an Intel backend, and a Linux CPU based backend. I see no reason to constrain the software case to map to a pipeline model for example. Software running on a CPU has very different characteristics from something running on a TOR, or FPGA. Trying to push all these into one backend "model" will result in suboptimal result for every target. At the end of the day my .02$, P4 is a DSL it needs a target dependent compiler in front of it. I want to optimize my software pipeline the compiler should compress tables as much as possible and search for a O(1) lookup even if getting that key is somewhat expensive. Conversely a TCAM changes the game. An FPGA is going to be flexible and make lots of tradeoffs here of which I'm not an expert. Also by avoiding loading the DSL into the kernel you leave room for others to build new/better/worse DSLs as they please. 
The P4 community writes control applications on top of the runtime spec, right? p4rt-ctl being the thing I found. This should abstract the endpoint away so it works with hardware or software or FPGA or anything else. .John
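For concreteness on "the standard TC control-driver ops apply": this is the plumbing offload-capable drivers already implement for flower and friends, i.e. the driver registers a block callback via ndo_setup_tc and translates each rule it is handed into device/firmware state. A skeleton sketch (driver name and callback body are hypothetical, and the P4TC-specific driver interface is not part of this series):

/* Sketch of today's TC offload driver plumbing, not P4TC code. */
#include <linux/netdevice.h>
#include <net/flow_offload.h>
#include <net/pkt_cls.h>

static LIST_HEAD(demo_block_cb_list);

static int demo_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
				  void *cb_priv)
{
	switch (type) {
	case TC_SETUP_CLSFLOWER:
		/* translate the rule into device/firmware table entries here */
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}

static int demo_ndo_setup_tc(struct net_device *dev, enum tc_setup_type type,
			     void *type_data)
{
	switch (type) {
	case TC_SETUP_BLOCK:
		return flow_block_cb_setup_simple(type_data,
						  &demo_block_cb_list,
						  demo_setup_tc_block_cb,
						  dev, dev, true);
	default:
		return -EOPNOTSUPP;
	}
}

John's questions 4 and 5 are essentially about what the body of the block callback does for a P4-capable device (firmware commands, table discovery, p4rt-ctl side channels), which is the part the sketch deliberately leaves blank.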
Inline as <VJ2>... (was html, sorry)
> None of above requires P4TC. For different architectures you > build optimal backend compilers. You have a Xilenx backend, > an Intel backend, and a Linux CPU based backend. I see no > reason to constrain the software case to map to a pipeline > model for example. Software running on a CPU has very different > characteristics from something running on a TOR, or FPGA. > Trying to push all these into one backend "model" will result > in suboptimal result for every target. At the end of the > day my .02$, P4 is a DSL it needs a target dependent compiler > in front of it. I want to optimize my software pipeline the > compiler should compress tables as much as possible and > search for a O(1) lookup even if getting that key is somewhat > expensive. Conversely a TCAM changes the game. An FPGA is > going to be flexible and make lots of tradeoffs here of which > I'm not an expert. Also by avoiding loading the DSL into the kernel > you leave room for others to build new/better/worse DSLs as they > please. > I think the general ask here is to define an Intermediate Representation that describes a programmed data path where it's a combination of declarative and imperative elements (parsers and table descriptions are better in declarative representation, functional logic seems more imperative). We also want references to accelerators with dynamic runtime binding to hardware (there are some interesting tricks we can do in the loader for a CPU target-- will talk about at Netdev). With a good IR we can decouple the frontend from the backend target which enables mixing and matching programming languages with arbitrary HW or SW targets. So a good IR potentially enables a lot of flexibility and freedom on both sides of the equation. An IR also facilitates reasonable kernel offload via signing images with a hash of the IR. So for instance, a frontend compiler could compile a P4 program into the IR. That code could then be compiled into a SW target, say eBPF, and maybe P4 hardware. Each image has the hash of the IR. At runtime, the eBPF code could be loaded into the kernel. The hardware image can be loaded into the device using a side band mechanism. To offload, we would query the device-- if the hash reported by the device matches the hash in the eBPF then we know that the offload is viable. No jits, no pushing firmware bits through the kernel, no need for device capabilities flags, and avoids the pitfalls of TC flower. There is one challenge here in how to deal with offloads that are already integrated into the kernel. I think GRO is a great example. GRO has been especially elusive as an offload since it requires a device to autonomously parse packets on input. We really want a GRO offload that parses the same exact protocols the kernel does (including encapsulations), but also implements the exact same logic in timers and pushing reassembled segments. So this needs to be programmable. The problem with the technique I described is that GRO is integrated into the kernel so we have no basis for a hash. I think the answer here is to start replacing fixed kernel C code with eBPF even in the critical path (we already talked about replacing flow dissector with eBPF). Anyway, we have been working on this. There's Common Parser Representation in json (formerly known CPL that we talked about at Netdev). For execution logic, LLVM IR seems fine (btrw, MLIR is really useful by the way!). We're just starting to look at tables (probably also json). If there's interest I could share more... Tom
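The offload-viability check Tom sketches reduces to comparing two digests of the same IR: one carried by (or alongside) the loaded software datapath, one reported by the device. A toy illustration, with every name and the digest scheme purely hypothetical:

/* Toy "match the IR hash" check; how the hashes are produced and reported is
 * out of scope and entirely assumed. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IR_DIGEST_LEN 32   /* e.g. a SHA-256 digest of the IR */

struct ir_digest {
	uint8_t bytes[IR_DIGEST_LEN];
};

static bool offload_viable(const struct ir_digest *sw_image,
			   const struct ir_digest *hw_image)
{
	/* Same IR -> same datapath semantics -> safe to let the device take
	 * over; otherwise keep traffic in the software path. */
	return memcmp(sw_image->bytes, hw_image->bytes, IR_DIGEST_LEN) == 0;
}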
Not sure why my email was tagged as html and blocked, but here goes again: On Tue, May 28, 2024 at 7:43 PM Chris Sommers <chris.sommers@keysight.com> wrote: > > > On Tue, May 28, 2024 at 3:17 PM Singhai, Anjali > > <anjali.singhai@intel.com> wrote: > > > > > > >From: John Fastabend <john.fastabend@gmail.com> > > > >Sent: Tuesday, May 28, 2024 1:17 PM > > > > > > >Jain, Vipin wrote: > > > >> [AMD Official Use Only - AMD Internal Distribution Only] > > > >> > > > >> My apologies, earlier email used html and was blocked by the list... > > > >> My response at the bottom as "VJ>" > > > >> > > > >> ________________________________________ > > > > > > >Anjali and Vipin is your support for HW support of P4 or a Linux SW implementation of P4. If its for HW support what drivers would we want to support? Can you describe how to program >these devices? > > > > > > >At the moment there hasn't been any movement on Linux hardware P4 support side as far as I can tell. Yes there are some SDKs and build kits floating around for FPGAs. For example >maybe start with what drivers in kernel tree run the DPUs that have this support? I think this would be a productive direction to go if we in fact have hardware support in the works. > > > > > > >If you want a SW implementation in Linux my opinion is still pushing a DSL into the kernel datapath via qdisc/tc is the wrong direction. Mapping P4 onto hardware blocks is fundamentally >different architecture from mapping > > > >P4 onto general purpose CPU and registers. My opinion -- to handle this you need a per architecture backend/JIT to compile the P4 to native instructions. > > > >This will give you the most flexibility to define new constructs, best performance, and lowest overhead runtime. We have a P4 BPF backend already and JITs for most architectures I don't >see the need for P4TC in this context. > > > > > > >If the end goal is a hardware offload control plane I'm skeptical we even need something specific just for SW datapath. I would propose a devlink or new infra to program the device directly >vs overhead and complexity of abstracting through 'tc'. If you want to emulate your device use BPF or user space datapath. > > > > > > >.John > > > > > > > > > John, > > > Let me start by saying production hardware exists i think Jamal posted some links but i can point you to our hardware. > > > The hardware devices under discussion are capable of being abstracted using the P4 match-action paradigm so that's why we chose TC. > > > These devices are programmed using the TC/netlink interface i.e the standard TC control-driver ops apply. While it is clear to us that the P4TC abstraction suffices, we are currently discussing details that will cater for all vendors in our biweekly meetings. > > > One big requirement is we want to avoid the flower trap - we dont want to be changing kernel/user/driver code every time we add new datapaths. > > > We feel P4TC approach is the path to add Linux kernel support. > > > > > > The s/w path is needed as well for several reasons. > > > We need the same P4 program to run either in software or hardware or in both using skip_sw/skip_hw. It could be either in split mode or as an exception path as it is done today in flower or u32. Also it is common now in the P4 community that people define their datapath using their program and will write a control application that works for both hardware and software datapaths. They could be using the software datapath for testing as you said but also for the split/exception path. 
Chris can probably add more comments on the software datapath. > > Anjali, thanks for asking. Agreed, I like the flexibility of accommodating a variety of platforms depending upon performance requirements and intended target system. For me, flexibility is important. Some solutions need an inline filter and P4-TC makes it so easy. The fact I will be able to get HW offload means I'm not performance bound. Some other solutions might need DPDK implementation, so P4-DPDK is a choice there as well, and there are acceleration options. Keeping much of the dataplane design in one language (P4) makes it easier for more developers to create products without having to be platform-level experts. As someone who's worked with P4 Tofino, P4-TC, bmv2, etc. I can authoritatively state that all have their proper place. > > > > Hi Anjali, > > > > Are there any use cases of P4-TC that don't involve P4 hardware? If > > someone wanted to write one off datapath code for their deployment and > > they didn't have P4 hardware would you suggest that they write they're > > code in P4-TC? The reason I ask is because I'm concerned about the > > performance of P4-TC. Like John said, this is mapping code that is > > intended to run in specialized hardware into a CPU, and it's also > > interpreted execution in TC. The performance numbers in > > https://urldefense.com/v3/__https://github.com/p4tc-dev/docs/blob/main/p4-conference-2023/2023P4WorkshopP4TC.pdf__;!!I5pVk4LIGAfnvw!mHilz4xBMimnfapDG8BEgqOuPw_Mn-KiMHb-aNbl8nB8TwfOfSleeIANiNRFQtTc5zfR0aK1TE2J8lT2Fg$ > > seem to show that P4-TC has about half the performance of XDP. Even > > with a lot of work, it's going to be difficult to substantially close > > that gap. > > AFAIK P4-TC can emit XDP or eBPF code depending upon the situation, someone more knowledgeable should chime in. > However, I don't agree that comparing the speeds of XDP vs. P4-TC should even be a deciding factor. > If P4-TC is good enough for a lot of applications, that is fine by me and over time it'll only get better. > If we held back every innovation because it was slower than something else, progress would suffer. Yes, XDP can be emitted based on compiler options (and was a motivation factor in considering use of eBPF). Tom's comment above seems to confuse the fact that XDP tends to be faster than TC with eBPF as the fault of P4TC. In any case this statement falls under: https://github.com/p4tc-dev/pushback-patches?tab=readme-ov-file#2b-comment-but--it-is-not-performant On Tom's theory that the vendors are going to push inferior s/w for the sake of selling h/w - I would argues that we are not in the 90s anymore and I dont believe there's any vendor conspiracy theory here ;-> a single port can do 100s of Gbps, and of course if you want to do high speed you need to offload, no general purpose CPU will save you. And really the arguement that "offload=evil" holds no water anymore. cheers, jamal
On Tue, May 28, 2024 at 7:45 PM John Fastabend <john.fastabend@gmail.com> wrote: > > Singhai, Anjali wrote: > > >From: John Fastabend <john.fastabend@gmail.com> > > >Sent: Tuesday, May 28, 2024 1:17 PM > > > > >Jain, Vipin wrote: > > >> [AMD Official Use Only - AMD Internal Distribution Only] > > >> > > >> My apologies, earlier email used html and was blocked by the list... > > >> My response at the bottom as "VJ>" > > >> > > >> ________________________________________ > > > > >Anjali and Vipin is your support for HW support of P4 or a Linux SW implementation of P4. If its for HW support what drivers would we want to support? Can you describe how to program >these devices? > > > > >At the moment there hasn't been any movement on Linux hardware P4 support side as far as I can tell. Yes there are some SDKs and build kits floating around for FPGAs. For example >maybe start with what drivers in kernel tree run the DPUs that have this support? I think this would be a productive direction to go if we in fact have hardware support in the works. > > > > >If you want a SW implementation in Linux my opinion is still pushing a DSL into the kernel datapath via qdisc/tc is the wrong direction. Mapping P4 onto hardware blocks is fundamentally >different architecture from mapping > > >P4 onto general purpose CPU and registers. My opinion -- to handle this you need a per architecture backend/JIT to compile the P4 to native instructions. > > >This will give you the most flexibility to define new constructs, best performance, and lowest overhead runtime. We have a P4 BPF backend already and JITs for most architectures I don't >see the need for P4TC in this context. > > > > >If the end goal is a hardware offload control plane I'm skeptical we even need something specific just for SW datapath. I would propose a devlink or new infra to program the device directly >vs overhead and complexity of abstracting through 'tc'. If you want to emulate your device use BPF or user space datapath. > > > > >.John > > > > > > John, > > Let me start by saying production hardware exists i think Jamal posted some links but i can point you to our hardware. > > Maybe more direct what Linux drivers support this? That would be > a good first place to start IMO. Similarly what AMD hardware > driver supports this. If I have two drivers from two vendors > with P4 support this is great. > > For Intel I assume this is idpf? > > To be concrete can we start with Linux driver A and P4 program > P. Modprobe driver A and push P4 program P so that it does > something very simple, and drop a CIDR/Port range into a table. > Perhaps this is so obvious in your community the trouble is in > the context of a Linux driver its not immediately obvious to me > and I would suspect its not obvious to many others. > > I really think walking through the key steps here would > really help? > > 1. $ p4IntelCompiler p4-dos.p4 -o myp4 > 2. $ modprobe idpf > 3. $ ping -i eth0 10.0.0.1 // good > 4. $ p4Load p4-dos.p4 > 5. -- load cidr into the hardware somehow -- p4rt-ctrl? > 6. $ ping -i eth0 10.0.0.1 // dropped > > This is an honest attempt to help fwiw. Questions would be. > > For compilation do we need an artifact from Intel it seems > so from docs. But maybe a typo not sure. I'm not overly stuck > on it but worth mentioning if folks try to follow your docs. > > For 2 I assume this is just normal every day module load nothing > to see. Does it pop something up in /proc or in firmware or...? > How do I know its P4 ready? > > For 4. 
How does this actually work? Is it a file in a directory > the driver pushes into firmware? How does the firmware know > I've done this? Does the Linux driver already support this? > > For 5 (most interesting) how does this work today. How are > you currently talking to the driver/firmware to insert rules > and discover the tables? And does the idpf driver do this > already? Some side channel I guess? This is p4rt-ctrl? > > I've seen docs for above in ipdk, but they are a bit hard > to follow if I'm honest. > > I assume IPDK is the source folks talk to when we mention there > is hardware somewhere. Also it seems there is an IPDK BPF support > as well which is interesting. > > And do you know how the DPDK implementation works? Can we > learn from them is it just on top of Flow API which we > could easily use in devlink or some other *link I suspect. > > > The hardware devices under discussion are capable of being abstracted using the P4 match-action paradigm so that's why we chose TC. > > These devices are programmed using the TC/netlink interface i.e the standard TC control-driver ops apply. While it is clear to us that the P4TC abstraction suffices, we are currently discussing details that will cater for all vendors in our biweekly meetings. > > One big requirement is we want to avoid the flower trap - we dont want to be changing kernel/user/driver code every time we add new datapaths. > > I think many 1st order and important points have been skipped. How do you > program the device is it a firmware blob, a set of firmware commands, > something that comes to you on device so only vendor sees this? Maybe > I can infer this from some docs and some examples (by the way I ran > through some of your DPU docs and such) but its unclear how these > map onto Linux networking. Jiri started into this earlier and was > cut off because p4tc was not for hardware offload. Now it is apparently. > > P4 is a good DSL for this sure and it has a runtime already specified > which is great. > > This is not a qdisc/tc its an entire hardware pipeline I don't see > the reason to put it in TC at all. > > > We feel P4TC approach is the path to add Linux kernel support. > > I disagree with your implementation not your goals to support > flexible hardware. > > > > > The s/w path is needed as well for several reasons. > > We need the same P4 program to run either in software or hardware or in both using skip_sw/skip_hw. It could be either in split mode or as an exception path as it is done today in flower or u32. Also it is common now in the P4 community that people define their datapath using their program and will write a control application that works for both hardware and software datapaths. They could be using the software datapath for testing as you said but also for the split/exception path. Chris can probably add more comments on the software datapath. > > None of above requires P4TC. For different architectures you > build optimal backend compilers. You have a Xilenx backend, > an Intel backend, and a Linux CPU based backend. I see no > reason to constrain the software case to map to a pipeline > model for example. Software running on a CPU has very different > characteristics from something running on a TOR, or FPGA. > Trying to push all these into one backend "model" will result > in suboptimal result for every target. At the end of the > day my .02$, P4 is a DSL it needs a target dependent compiler > in front of it. 
I want to optimize my software pipeline the > compiler should compress tables as much as possible and > search for a O(1) lookup even if getting that key is somewhat > expensive. Conversely a TCAM changes the game. An FPGA is > going to be flexible and make lots of tradeoffs here of which > I'm not an expert. Also by avoiding loading the DSL into the kernel > you leave room for others to build new/better/worse DSLs as they > please. > > The P4 community writes control applicatoins on top of the > runtime spec right? p4rt-ctl being the thing I found. This > should abstract the endpoint away to work with hardware or > software or FPGA or anything else. > For the record, _every single patchset we have posted_ specified our requirements as being s/w + h/w. A simpler version of the requirements is listed here: https://github.com/p4tc-dev/pushback-patches?tab=readme-ov-file#summary-of-our-requirements John's content variant above is described in: https://github.com/p4tc-dev/pushback-patches?tab=readme-ov-file#summary-of-our-requirements According to him we should not bother with the kernel at all. It's what is commonly referred to as a monday-morning quarterbacking or arm-chair lawyering "lets just do it my way and it will all be great". It's 90% of these discussions and one of the reasons I put up that page. cheers, jamal
On Wed, May 29, 2024 at 7:21 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote: > > On Tue, May 28, 2024 at 7:45 PM John Fastabend <john.fastabend@gmail.com> wrote: > > > > Singhai, Anjali wrote: > > > >From: John Fastabend <john.fastabend@gmail.com> > > > >Sent: Tuesday, May 28, 2024 1:17 PM > > > > > > >Jain, Vipin wrote: > > > >> [AMD Official Use Only - AMD Internal Distribution Only] > > > >> > > > >> My apologies, earlier email used html and was blocked by the list... > > > >> My response at the bottom as "VJ>" > > > >> > > > >> ________________________________________ > > > > > > >Anjali and Vipin is your support for HW support of P4 or a Linux SW implementation of P4. If its for HW support what drivers would we want to support? Can you describe how to program >these devices? > > > > > > >At the moment there hasn't been any movement on Linux hardware P4 support side as far as I can tell. Yes there are some SDKs and build kits floating around for FPGAs. For example >maybe start with what drivers in kernel tree run the DPUs that have this support? I think this would be a productive direction to go if we in fact have hardware support in the works. > > > > > > >If you want a SW implementation in Linux my opinion is still pushing a DSL into the kernel datapath via qdisc/tc is the wrong direction. Mapping P4 onto hardware blocks is fundamentally >different architecture from mapping > > > >P4 onto general purpose CPU and registers. My opinion -- to handle this you need a per architecture backend/JIT to compile the P4 to native instructions. > > > >This will give you the most flexibility to define new constructs, best performance, and lowest overhead runtime. We have a P4 BPF backend already and JITs for most architectures I don't >see the need for P4TC in this context. > > > > > > >If the end goal is a hardware offload control plane I'm skeptical we even need something specific just for SW datapath. I would propose a devlink or new infra to program the device directly >vs overhead and complexity of abstracting through 'tc'. If you want to emulate your device use BPF or user space datapath. > > > > > > >.John > > > > > > > > > John, > > > Let me start by saying production hardware exists i think Jamal posted some links but i can point you to our hardware. > > > > Maybe more direct what Linux drivers support this? That would be > > a good first place to start IMO. Similarly what AMD hardware > > driver supports this. If I have two drivers from two vendors > > with P4 support this is great. > > > > For Intel I assume this is idpf? > > > > To be concrete can we start with Linux driver A and P4 program > > P. Modprobe driver A and push P4 program P so that it does > > something very simple, and drop a CIDR/Port range into a table. > > Perhaps this is so obvious in your community the trouble is in > > the context of a Linux driver its not immediately obvious to me > > and I would suspect its not obvious to many others. > > > > I really think walking through the key steps here would > > really help? > > > > 1. $ p4IntelCompiler p4-dos.p4 -o myp4 > > 2. $ modprobe idpf > > 3. $ ping -i eth0 10.0.0.1 // good > > 4. $ p4Load p4-dos.p4 > > 5. -- load cidr into the hardware somehow -- p4rt-ctrl? > > 6. $ ping -i eth0 10.0.0.1 // dropped > > > > This is an honest attempt to help fwiw. Questions would be. > > > > For compilation do we need an artifact from Intel it seems > > so from docs. But maybe a typo not sure. I'm not overly stuck > > on it but worth mentioning if folks try to follow your docs. 
> > > > For 2 I assume this is just normal every day module load nothing > > to see. Does it pop something up in /proc or in firmware or...? > > How do I know its P4 ready? > > > > For 4. How does this actually work? Is it a file in a directory > > the driver pushes into firmware? How does the firmware know > > I've done this? Does the Linux driver already support this? > > > > For 5 (most interesting) how does this work today. How are > > you currently talking to the driver/firmware to insert rules > > and discover the tables? And does the idpf driver do this > > already? Some side channel I guess? This is p4rt-ctrl? > > > > I've seen docs for above in ipdk, but they are a bit hard > > to follow if I'm honest. > > > > I assume IPDK is the source folks talk to when we mention there > > is hardware somewhere. Also it seems there is an IPDK BPF support > > as well which is interesting. > > > > And do you know how the DPDK implementation works? Can we > > learn from them is it just on top of Flow API which we > > could easily use in devlink or some other *link I suspect. > > > > > The hardware devices under discussion are capable of being abstracted using the P4 match-action paradigm so that's why we chose TC. > > > These devices are programmed using the TC/netlink interface i.e the standard TC control-driver ops apply. While it is clear to us that the P4TC abstraction suffices, we are currently discussing details that will cater for all vendors in our biweekly meetings. > > > One big requirement is we want to avoid the flower trap - we dont want to be changing kernel/user/driver code every time we add new datapaths. > > > > I think many 1st order and important points have been skipped. How do you > > program the device is it a firmware blob, a set of firmware commands, > > something that comes to you on device so only vendor sees this? Maybe > > I can infer this from some docs and some examples (by the way I ran > > through some of your DPU docs and such) but its unclear how these > > map onto Linux networking. Jiri started into this earlier and was > > cut off because p4tc was not for hardware offload. Now it is apparently. > > > > P4 is a good DSL for this sure and it has a runtime already specified > > which is great. > > > > This is not a qdisc/tc its an entire hardware pipeline I don't see > > the reason to put it in TC at all. > > > > > We feel P4TC approach is the path to add Linux kernel support. > > > > I disagree with your implementation not your goals to support > > flexible hardware. > > > > > > > > The s/w path is needed as well for several reasons. > > > We need the same P4 program to run either in software or hardware or in both using skip_sw/skip_hw. It could be either in split mode or as an exception path as it is done today in flower or u32. Also it is common now in the P4 community that people define their datapath using their program and will write a control application that works for both hardware and software datapaths. They could be using the software datapath for testing as you said but also for the split/exception path. Chris can probably add more comments on the software datapath. > > > > None of above requires P4TC. For different architectures you > > build optimal backend compilers. You have a Xilenx backend, > > an Intel backend, and a Linux CPU based backend. I see no > > reason to constrain the software case to map to a pipeline > > model for example. Software running on a CPU has very different > > characteristics from something running on a TOR, or FPGA. 
> > Trying to push all these into one backend "model" will result > > in suboptimal result for every target. At the end of the > > day my .02$, P4 is a DSL it needs a target dependent compiler > > in front of it. I want to optimize my software pipeline the > > compiler should compress tables as much as possible and > > search for a O(1) lookup even if getting that key is somewhat > > expensive. Conversely a TCAM changes the game. An FPGA is > > going to be flexible and make lots of tradeoffs here of which > > I'm not an expert. Also by avoiding loading the DSL into the kernel > > you leave room for others to build new/better/worse DSLs as they > > please. > > > > The P4 community writes control applicatoins on top of the > > runtime spec right? p4rt-ctl being the thing I found. This > > should abstract the endpoint away to work with hardware or > > software or FPGA or anything else. > > > > For the record, _every single patchset we have posted_ specified our > requirements as being s/w + h/w. A simpler version of the requirements > is listed here: > https://github.com/p4tc-dev/pushback-patches?tab=readme-ov-file#summary-of-our-requirements > > John's content variant above is described in: > https://github.com/p4tc-dev/pushback-patches?tab=readme-ov-file#summary-of-our-requirements Correction: https://github.com/p4tc-dev/pushback-patches?tab=readme-ov-file#3-comment-but-you-did-it-wrong-heres-how-you-do-it cheers, jamal > According to him we should not bother with the kernel at all. It's > what is commonly referred to as a monday-morning quarterbacking or > arm-chair lawyering "lets just do it my way and it will all be great". > It's 90% of these discussions and one of the reasons I put up that > page. > > cheers, > jamal
On Wed, May 29, 2024 at 4:01 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote: > > > > On Tue, May 28, 2024 at 7:43 PM Chris Sommers <chris.sommers@keysight.com> wrote: >> >> > On Tue, May 28, 2024 at 3:17 PM Singhai, Anjali >> > <anjali.singhai@intel.com> wrote: >> > > >> > > >From: John Fastabend <john.fastabend@gmail.com> >> > > >Sent: Tuesday, May 28, 2024 1:17 PM >> > > >> > > >Jain, Vipin wrote: >> > > >> [AMD Official Use Only - AMD Internal Distribution Only] >> > > >> >> > > >> My apologies, earlier email used html and was blocked by the list... >> > > >> My response at the bottom as "VJ>" >> > > >> >> > > >> ________________________________________ >> > > >> > > >Anjali and Vipin is your support for HW support of P4 or a Linux SW implementation of P4. If its for HW support what drivers would we want to support? Can you describe how to program >these devices? >> > > >> > > >At the moment there hasn't been any movement on Linux hardware P4 support side as far as I can tell. Yes there are some SDKs and build kits floating around for FPGAs. For example >maybe start with what drivers in kernel tree run the DPUs that have this support? I think this would be a productive direction to go if we in fact have hardware support in the works. >> > > >> > > >If you want a SW implementation in Linux my opinion is still pushing a DSL into the kernel datapath via qdisc/tc is the wrong direction. Mapping P4 onto hardware blocks is fundamentally >different architecture from mapping >> > > >P4 onto general purpose CPU and registers. My opinion -- to handle this you need a per architecture backend/JIT to compile the P4 to native instructions. >> > > >This will give you the most flexibility to define new constructs, best performance, and lowest overhead runtime. We have a P4 BPF backend already and JITs for most architectures I don't >see the need for P4TC in this context. >> > > >> > > >If the end goal is a hardware offload control plane I'm skeptical we even need something specific just for SW datapath. I would propose a devlink or new infra to program the device directly >vs overhead and complexity of abstracting through 'tc'. If you want to emulate your device use BPF or user space datapath. >> > > >> > > >.John >> > > >> > > >> > > John, >> > > Let me start by saying production hardware exists i think Jamal posted some links but i can point you to our hardware. >> > > The hardware devices under discussion are capable of being abstracted using the P4 match-action paradigm so that's why we chose TC. >> > > These devices are programmed using the TC/netlink interface i.e the standard TC control-driver ops apply. While it is clear to us that the P4TC abstraction suffices, we are currently discussing details that will cater for all vendors in our biweekly meetings. >> > > One big requirement is we want to avoid the flower trap - we dont want to be changing kernel/user/driver code every time we add new datapaths. >> > > We feel P4TC approach is the path to add Linux kernel support. >> > > >> > > The s/w path is needed as well for several reasons. >> > > We need the same P4 program to run either in software or hardware or in both using skip_sw/skip_hw. It could be either in split mode or as an exception path as it is done today in flower or u32. Also it is common now in the P4 community that people define their datapath using their program and will write a control application that works for both hardware and software datapaths. 
They could be using the software datapath for testing as you said but also for the split/exception path. Chris can probably add more comments on the software datapath. >> >> Anjali, thanks for asking. Agreed, I like the flexibility of accommodating a variety of platforms depending upon performance requirements and intended target system. For me, flexibility is important. Some solutions need an inline filter and P4-TC makes it so easy. The fact I will be able to get HW offload means I'm not performance bound. Some other solutions might need DPDK implementation, so P4-DPDK is a choice there as well, and there are acceleration options. Keeping much of the dataplane design in one language (P4) makes it easier for more developers to create products without having to be platform-level experts. As someone who's worked with P4 Tofino, P4-TC, bmv2, etc. I can authoritatively state that all have their proper place. >> > >> > Hi Anjali, >> > >> > Are there any use cases of P4-TC that don't involve P4 hardware? If >> > someone wanted to write one off datapath code for their deployment and >> > they didn't have P4 hardware would you suggest that they write they're >> > code in P4-TC? The reason I ask is because I'm concerned about the >> > performance of P4-TC. Like John said, this is mapping code that is >> > intended to run in specialized hardware into a CPU, and it's also >> > interpreted execution in TC. The performance numbers in >> > https://urldefense.com/v3/__https://github.com/p4tc-dev/docs/blob/main/p4-conference-2023/2023P4WorkshopP4TC.pdf__;!!I5pVk4LIGAfnvw!mHilz4xBMimnfapDG8BEgqOuPw_Mn-KiMHb-aNbl8nB8TwfOfSleeIANiNRFQtTc5zfR0aK1TE2J8lT2Fg$ >> > seem to show that P4-TC has about half the performance of XDP. Even >> > with a lot of work, it's going to be difficult to substantially close >> > that gap. >> >> AFAIK P4-TC can emit XDP or eBPF code depending upon the situation, someone more knowledgeable should chime in. >> However, I don't agree that comparing the speeds of XDP vs. P4-TC should even be a deciding factor. >> If P4-TC is good enough for a lot of applications, that is fine by me and over time it'll only get better. >> If we held back every innovation because it was slower than something else, progress would suffer. >> > > > > Yes, XDP can be emitted based on compiler options (and was a motivation factor in considering use of eBPF). Tom's comment above seems to confuse the fact that XDP tends to be faster than TC with eBPF as the fault of P4TC. > In any case this statement falls under: https://github.com/p4tc-dev/pushback-patches?tab=readme-ov-file#2b-comment-but--it-is-not-performant Jamal, From that: "My response has always consistently been: performance is a lower priority to P4 correctness and expressibility." That might be true for P4, but not for the kernel. CPU performance is important, and your statement below that justifies offloads on the basis that "no general purpose CPU will save you" confirms that. Please be more upfront about what the performance is like including performance numbers in the cover letter for the next patch set. This is the best way to avoid confusion and rampant speculation, and if performance isn't stellar being open about it in the community is the best way to figure out how to improve it. 
> > On Tom's theory that the vendors are going to push inferior s/w for the sake of selling h/w: we are not in the 90s anymore and there's no vendor conspiracy theory here: a single port can do 100s of Gbps, and of course if you want to do high speed you need to offload, no general purpose CPU will save you. Let's not pretend that offloads are a magic bullet that just makes everything better, if that were true then we'd all be using TOE by now! There are a myriad of factors to consider whether offloading is worth it. What is "high speed", is this small packets or big packets, are we terminating TCP, are we doing some sort of fast/slow path split which might work great in the lab but on the Internet can become a DOS vector? What's the application? Are we just trying to offload parts of the datapath, TCP, RDMA, memcached, ML reduce operations? Are we trying to do line rate encryption, compression, trying to do a billion PCB lookups a second? Are we taking into account continuing advancements in the CPU that have in the past made offloads obsolete (for instance, AES instructions pretty much obsoleted initial attempts to obsolete IPsec)? How simple is the programming model, how debuggable is it, what's the TCO? I do believe offload is part of the solution. And the good news is that programmable devices facilitate that. IMO, our challenge is to create a facility in the kernel to kernel offloads in a much better way (I don't believe there's disagreement with these points). Tom > > cheers, > jamal > >> >> > The risk if we allow this into the kernel is that a vendor might be >> > tempted to point to P4-TC performance as a baseline to justify to >> > customers that they need to buy specialized hardware to get >> > performance, whereas if XDP was used maybe they don't need the >> > performance and cost of hardware. >> >> I really don't buy this argument, it's FUD. Let's judge P4-TC on its merits, not prejudge it as a ploy to sell vendor hardware. >> >> > Note, this scenario already happened >> > once before, when the DPDK joined LF they made bogus claims that they >> > got a 100x performance over the kernel-- had they put at least the >> > slightest effort into tuning the kernel that would have dropped the >> > delta by an order of magnitude, and since then we've pretty much >> > closed the gap (actually, this is precisely what motivated the >> > creation of XDP so I guess that story had a happy ending!) . There are >> > circumstances where hardware offload may be warranted, but it needs to >> > be honestly justified by comparing it to an optimized software >> > solution-- so in the case of P4, it should be compared to well written >> > XDP code for instance, not P4-TC. >> >> I strongly disagree that it "it needs to be honestly justified by comparing it to an optimized software solution." >> Says who? This is no more factual than saying "C or golang need to be judged by comparing it to assembly language." >> Today the gap between C and assembly is small, but way back in my career, C was way slower. >> Over time optimizing compilers have closed the gap. Who's to say P4 technologies won't do the same? >> P4-TC can be judged on its own merits for its utility and productivity. I can't stress enough that P4 is very productive when applied to certain problems. >> >> Note, P4-BMv2 has been used by thousands of developers, researchers and students and it is relatively slow. Yet that doesn't deter users. >> There is a Google Summer of Code project to add PNA support, rather ambitious. 
However, P4-TC already partially supports PNA and the gap is closing. >> I feel like P4-TC could replace the use of BMv2 in a lot of applications and if it were upstreamed, it'd eventually be available on all Linux machines. The ability to write custom externs >> is very compelling. Eventual HW offload using the same code will be game-changing. Bmv2 is a big c++ program and somewhat intimidating to dig into to make enhancements, especially at the architectural level. >> There is no HW offload path, and it's not really fast, so it remains mainly a researchy-thing and will stay that way. P4-TC could span the needs from research to production in SW, and performant production with HW offload. >> > >> > Tom >> > >> > > >> > > >> > > Anjali >> >
On Wed, May 29, 2024 at 10:46 AM Tom Herbert <tom@sipanda.io> wrote: > > On Wed, May 29, 2024 at 4:01 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote: > > > > > > > > On Tue, May 28, 2024 at 7:43 PM Chris Sommers <chris.sommers@keysight.com> wrote: > >> > >> > On Tue, May 28, 2024 at 3:17 PM Singhai, Anjali > >> > <anjali.singhai@intel.com> wrote: > >> > > > >> > > >From: John Fastabend <john.fastabend@gmail.com> > >> > > >Sent: Tuesday, May 28, 2024 1:17 PM > >> > > > >> > > >Jain, Vipin wrote: > >> > > >> [AMD Official Use Only - AMD Internal Distribution Only] > >> > > >> > >> > > >> My apologies, earlier email used html and was blocked by the list... > >> > > >> My response at the bottom as "VJ>" > >> > > >> > >> > > >> ________________________________________ > >> > > > >> > > >Anjali and Vipin is your support for HW support of P4 or a Linux SW implementation of P4. If its for HW support what drivers would we want to support? Can you describe how to program >these devices? > >> > > > >> > > >At the moment there hasn't been any movement on Linux hardware P4 support side as far as I can tell. Yes there are some SDKs and build kits floating around for FPGAs. For example >maybe start with what drivers in kernel tree run the DPUs that have this support? I think this would be a productive direction to go if we in fact have hardware support in the works. > >> > > > >> > > >If you want a SW implementation in Linux my opinion is still pushing a DSL into the kernel datapath via qdisc/tc is the wrong direction. Mapping P4 onto hardware blocks is fundamentally >different architecture from mapping > >> > > >P4 onto general purpose CPU and registers. My opinion -- to handle this you need a per architecture backend/JIT to compile the P4 to native instructions. > >> > > >This will give you the most flexibility to define new constructs, best performance, and lowest overhead runtime. We have a P4 BPF backend already and JITs for most architectures I don't >see the need for P4TC in this context. > >> > > > >> > > >If the end goal is a hardware offload control plane I'm skeptical we even need something specific just for SW datapath. I would propose a devlink or new infra to program the device directly >vs overhead and complexity of abstracting through 'tc'. If you want to emulate your device use BPF or user space datapath. > >> > > > >> > > >.John > >> > > > >> > > > >> > > John, > >> > > Let me start by saying production hardware exists i think Jamal posted some links but i can point you to our hardware. > >> > > The hardware devices under discussion are capable of being abstracted using the P4 match-action paradigm so that's why we chose TC. > >> > > These devices are programmed using the TC/netlink interface i.e the standard TC control-driver ops apply. While it is clear to us that the P4TC abstraction suffices, we are currently discussing details that will cater for all vendors in our biweekly meetings. > >> > > One big requirement is we want to avoid the flower trap - we dont want to be changing kernel/user/driver code every time we add new datapaths. > >> > > We feel P4TC approach is the path to add Linux kernel support. > >> > > > >> > > The s/w path is needed as well for several reasons. > >> > > We need the same P4 program to run either in software or hardware or in both using skip_sw/skip_hw. It could be either in split mode or as an exception path as it is done today in flower or u32. 
Also it is common now in the P4 community that people define their datapath using their program and will write a control application that works for both hardware and software datapaths. They could be using the software datapath for testing as you said but also for the split/exception path. Chris can probably add more comments on the software datapath. > >> > >> Anjali, thanks for asking. Agreed, I like the flexibility of accommodating a variety of platforms depending upon performance requirements and intended target system. For me, flexibility is important. Some solutions need an inline filter and P4-TC makes it so easy. The fact I will be able to get HW offload means I'm not performance bound. Some other solutions might need DPDK implementation, so P4-DPDK is a choice there as well, and there are acceleration options. Keeping much of the dataplane design in one language (P4) makes it easier for more developers to create products without having to be platform-level experts. As someone who's worked with P4 Tofino, P4-TC, bmv2, etc. I can authoritatively state that all have their proper place. > >> > > >> > Hi Anjali, > >> > > >> > Are there any use cases of P4-TC that don't involve P4 hardware? If > >> > someone wanted to write one off datapath code for their deployment and > >> > they didn't have P4 hardware would you suggest that they write they're > >> > code in P4-TC? The reason I ask is because I'm concerned about the > >> > performance of P4-TC. Like John said, this is mapping code that is > >> > intended to run in specialized hardware into a CPU, and it's also > >> > interpreted execution in TC. The performance numbers in > >> > https://urldefense.com/v3/__https://github.com/p4tc-dev/docs/blob/main/p4-conference-2023/2023P4WorkshopP4TC.pdf__;!!I5pVk4LIGAfnvw!mHilz4xBMimnfapDG8BEgqOuPw_Mn-KiMHb-aNbl8nB8TwfOfSleeIANiNRFQtTc5zfR0aK1TE2J8lT2Fg$ > >> > seem to show that P4-TC has about half the performance of XDP. Even > >> > with a lot of work, it's going to be difficult to substantially close > >> > that gap. > >> > >> AFAIK P4-TC can emit XDP or eBPF code depending upon the situation, someone more knowledgeable should chime in. > >> However, I don't agree that comparing the speeds of XDP vs. P4-TC should even be a deciding factor. > >> If P4-TC is good enough for a lot of applications, that is fine by me and over time it'll only get better. > >> If we held back every innovation because it was slower than something else, progress would suffer. > >> > > > > > > > Yes, XDP can be emitted based on compiler options (and was a motivation factor in considering use of eBPF). Tom's comment above seems to confuse the fact that XDP tends to be faster than TC with eBPF as the fault of P4TC. > > In any case this statement falls under: https://github.com/p4tc-dev/pushback-patches?tab=readme-ov-file#2b-comment-but--it-is-not-performant > > Jamal, > > From that: "My response has always consistently been: performance is a > lower priority to P4 correctness and expressibility." That might be > true for P4, but not for the kernel. CPU performance is important, and > your statement below that justifies offloads on the basis that "no > general purpose CPU will save you" confirms that. Please be more > upfront about what the performance is like including performance > numbers in the cover letter for the next patch set. This is the best > way to avoid confusion and rampant speculation, and if performance > isn't stellar being open about it in the community is the best way to > figure out how to improve it. 
I believe you are misreading those graphs or maybe you are mixing it with the original u32/pedit script approach? The tests are run at TC and XDP layers. Pay particular attention to the results of the handcoded/tuned eBPF datapath at TC and at XDP compared to analogous ones generated by the compiler. You will notice +/-5% or so differences. That is with the current compiler-generated code. We are looking to improve that - but do note that is generated code, nothing to do with the kernel. As the P4 program becomes more complex (many tables, longer keys, more entries, more complex actions) we become compute bound, so there is no real difference. Now having said that: yes - s/w performance is certainly _not our highest priority feature_. That is not to say we don't care, but as the text said: if I am getting 2Mpps using handcoding vs 1.84Mpps using generated code (per those graphs), and I can generate the code and execute it in 5 minutes (Chris, who is knowledgeable in P4, was able to do it in less time), then _I pick the code generation any day of the week_. Tooling, tooling, tooling. To reiterate, the most important requirement is the abstraction, meaning: I can take the same P4 program I am running in s/w, generate an AMD or Intel offload equivalent using a different backend, and get several orders of magnitude improvement in performance because it is now running in h/w. I still get to use the same application controlling s/w and/or hardware, etc. TBH, I am indifferent and could add some numbers, but that misses the emphasis of what we are trying to achieve; the cover letter is already half a novel, and with the short attention span most people have it would just muddy the waters. > > > > On Tom's theory that the vendors are going to push inferior s/w for the sake of selling h/w: we are not in the 90s anymore and there's no vendor conspiracy theory here: a single port can do 100s of Gbps, and of course if you want to do high speed you need to offload, no general purpose CPU will save you. > > Let's not pretend that offloads are a magic bullet that just makes > everything better, if that were true then we'd all be using TOE by > now! There are a myriad of factors to consider whether offloading is > worth it. What is "high speed", is this small packets or big packets, > are we terminating TCP, are we doing some sort of fast/slow path split > which might work great in the lab but on the Internet can become a DOS > vector? What's the application? Are we just trying to offload parts of > the datapath, TCP, RDMA, memcached, ML reduce operations? Are we > trying to do line rate encryption, compression, trying to do a billion > PCB lookups a second? Are we taking into account continuing > advancements in the CPU that have in the past made offloads obsolete > (for instance, AES instructions pretty much obsoleted initial attempts > to obsolete IPsec)? How simple is the programming model, how > debuggable is it, what's the TCO? > > I do believe offload is part of the solution. And the good news is > that programmable devices facilitate that. IMO, our challenge is to > create a facility in the kernel to kernel offloads in a much better > way (I don't believe there's disagreement with these points). > This is about a MAT (match-action table) model whose offloads are covered via TC; it is well understood and very specific. We are not trying to solve "the world of offloads", which includes TOEs. P4-aware NICs are on the market and AFAIK those ASICs are not solving TOE.
I thought you understand the scope but if not start by reading this: https://github.com/p4tc-dev/docs/blob/main/why-p4tc.md cheers, jamal > Tom > > > > > > > > > cheers, > > jamal > > > >> > >> > The risk if we allow this into the kernel is that a vendor might be > >> > tempted to point to P4-TC performance as a baseline to justify to > >> > customers that they need to buy specialized hardware to get > >> > performance, whereas if XDP was used maybe they don't need the > >> > performance and cost of hardware. > >> > >> I really don't buy this argument, it's FUD. Let's judge P4-TC on its merits, not prejudge it as a ploy to sell vendor hardware. > >> > >> > Note, this scenario already happened > >> > once before, when the DPDK joined LF they made bogus claims that they > >> > got a 100x performance over the kernel-- had they put at least the > >> > slightest effort into tuning the kernel that would have dropped the > >> > delta by an order of magnitude, and since then we've pretty much > >> > closed the gap (actually, this is precisely what motivated the > >> > creation of XDP so I guess that story had a happy ending!) . There are > >> > circumstances where hardware offload may be warranted, but it needs to > >> > be honestly justified by comparing it to an optimized software > >> > solution-- so in the case of P4, it should be compared to well written > >> > XDP code for instance, not P4-TC. > >> > >> I strongly disagree that it "it needs to be honestly justified by comparing it to an optimized software solution." > >> Says who? This is no more factual than saying "C or golang need to be judged by comparing it to assembly language." > >> Today the gap between C and assembly is small, but way back in my career, C was way slower. > >> Over time optimizing compilers have closed the gap. Who's to say P4 technologies won't do the same? > >> P4-TC can be judged on its own merits for its utility and productivity. I can't stress enough that P4 is very productive when applied to certain problems. > >> > >> Note, P4-BMv2 has been used by thousands of developers, researchers and students and it is relatively slow. Yet that doesn't deter users. > >> There is a Google Summer of Code project to add PNA support, rather ambitious. However, P4-TC already partially supports PNA and the gap is closing. > >> I feel like P4-TC could replace the use of BMv2 in a lot of applications and if it were upstreamed, it'd eventually be available on all Linux machines. The ability to write custom externs > >> is very compelling. Eventual HW offload using the same code will be game-changing. Bmv2 is a big c++ program and somewhat intimidating to dig into to make enhancements, especially at the architectural level. > >> There is no HW offload path, and it's not really fast, so it remains mainly a researchy-thing and will stay that way. P4-TC could span the needs from research to production in SW, and performant production with HW offload. > >> > > >> > Tom > >> > > >> > > > >> > > > >> > > Anjali > >> >
On Thu, May 30, 2024 at 9:59 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote: > > On Wed, May 29, 2024 at 10:46 AM Tom Herbert <tom@sipanda.io> wrote: > > > > On Wed, May 29, 2024 at 4:01 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote: > > > > > > > > > > > > On Tue, May 28, 2024 at 7:43 PM Chris Sommers <chris.sommers@keysight.com> wrote: > > >> > > >> > On Tue, May 28, 2024 at 3:17 PM Singhai, Anjali > > >> > <anjali.singhai@intel.com> wrote: > > >> > > > > >> > > >From: John Fastabend <john.fastabend@gmail.com> > > >> > > >Sent: Tuesday, May 28, 2024 1:17 PM > > >> > > > > >> > > >Jain, Vipin wrote: > > >> > > >> [AMD Official Use Only - AMD Internal Distribution Only] > > >> > > >> > > >> > > >> My apologies, earlier email used html and was blocked by the list... > > >> > > >> My response at the bottom as "VJ>" > > >> > > >> > > >> > > >> ________________________________________ > > >> > > > > >> > > >Anjali and Vipin is your support for HW support of P4 or a Linux SW implementation of P4. If its for HW support what drivers would we want to support? Can you describe how to program >these devices? > > >> > > > > >> > > >At the moment there hasn't been any movement on Linux hardware P4 support side as far as I can tell. Yes there are some SDKs and build kits floating around for FPGAs. For example >maybe start with what drivers in kernel tree run the DPUs that have this support? I think this would be a productive direction to go if we in fact have hardware support in the works. > > >> > > > > >> > > >If you want a SW implementation in Linux my opinion is still pushing a DSL into the kernel datapath via qdisc/tc is the wrong direction. Mapping P4 onto hardware blocks is fundamentally >different architecture from mapping > > >> > > >P4 onto general purpose CPU and registers. My opinion -- to handle this you need a per architecture backend/JIT to compile the P4 to native instructions. > > >> > > >This will give you the most flexibility to define new constructs, best performance, and lowest overhead runtime. We have a P4 BPF backend already and JITs for most architectures I don't >see the need for P4TC in this context. > > >> > > > > >> > > >If the end goal is a hardware offload control plane I'm skeptical we even need something specific just for SW datapath. I would propose a devlink or new infra to program the device directly >vs overhead and complexity of abstracting through 'tc'. If you want to emulate your device use BPF or user space datapath. > > >> > > > > >> > > >.John > > >> > > > > >> > > > > >> > > John, > > >> > > Let me start by saying production hardware exists i think Jamal posted some links but i can point you to our hardware. > > >> > > The hardware devices under discussion are capable of being abstracted using the P4 match-action paradigm so that's why we chose TC. > > >> > > These devices are programmed using the TC/netlink interface i.e the standard TC control-driver ops apply. While it is clear to us that the P4TC abstraction suffices, we are currently discussing details that will cater for all vendors in our biweekly meetings. > > >> > > One big requirement is we want to avoid the flower trap - we dont want to be changing kernel/user/driver code every time we add new datapaths. > > >> > > We feel P4TC approach is the path to add Linux kernel support. > > >> > > > > >> > > The s/w path is needed as well for several reasons. > > >> > > We need the same P4 program to run either in software or hardware or in both using skip_sw/skip_hw. 
It could be either in split mode or as an exception path as it is done today in flower or u32. Also it is common now in the P4 community that people define their datapath using their program and will write a control application that works for both hardware and software datapaths. They could be using the software datapath for testing as you said but also for the split/exception path. Chris can probably add more comments on the software datapath. > > >> > > >> Anjali, thanks for asking. Agreed, I like the flexibility of accommodating a variety of platforms depending upon performance requirements and intended target system. For me, flexibility is important. Some solutions need an inline filter and P4-TC makes it so easy. The fact I will be able to get HW offload means I'm not performance bound. Some other solutions might need DPDK implementation, so P4-DPDK is a choice there as well, and there are acceleration options. Keeping much of the dataplane design in one language (P4) makes it easier for more developers to create products without having to be platform-level experts. As someone who's worked with P4 Tofino, P4-TC, bmv2, etc. I can authoritatively state that all have their proper place. > > >> > > > >> > Hi Anjali, > > >> > > > >> > Are there any use cases of P4-TC that don't involve P4 hardware? If > > >> > someone wanted to write one off datapath code for their deployment and > > >> > they didn't have P4 hardware would you suggest that they write they're > > >> > code in P4-TC? The reason I ask is because I'm concerned about the > > >> > performance of P4-TC. Like John said, this is mapping code that is > > >> > intended to run in specialized hardware into a CPU, and it's also > > >> > interpreted execution in TC. The performance numbers in > > >> > https://urldefense.com/v3/__https://github.com/p4tc-dev/docs/blob/main/p4-conference-2023/2023P4WorkshopP4TC.pdf__;!!I5pVk4LIGAfnvw!mHilz4xBMimnfapDG8BEgqOuPw_Mn-KiMHb-aNbl8nB8TwfOfSleeIANiNRFQtTc5zfR0aK1TE2J8lT2Fg$ > > >> > seem to show that P4-TC has about half the performance of XDP. Even > > >> > with a lot of work, it's going to be difficult to substantially close > > >> > that gap. > > >> > > >> AFAIK P4-TC can emit XDP or eBPF code depending upon the situation, someone more knowledgeable should chime in. > > >> However, I don't agree that comparing the speeds of XDP vs. P4-TC should even be a deciding factor. > > >> If P4-TC is good enough for a lot of applications, that is fine by me and over time it'll only get better. > > >> If we held back every innovation because it was slower than something else, progress would suffer. > > >> > > > > > > > > > > Yes, XDP can be emitted based on compiler options (and was a motivation factor in considering use of eBPF). Tom's comment above seems to confuse the fact that XDP tends to be faster than TC with eBPF as the fault of P4TC. > > > In any case this statement falls under: https://github.com/p4tc-dev/pushback-patches?tab=readme-ov-file#2b-comment-but--it-is-not-performant > > > > Jamal, > > > > From that: "My response has always consistently been: performance is a > > lower priority to P4 correctness and expressibility." That might be > > true for P4, but not for the kernel. CPU performance is important, and > > your statement below that justifies offloads on the basis that "no > > general purpose CPU will save you" confirms that. Please be more > > upfront about what the performance is like including performance > > numbers in the cover letter for the next patch set. 
This is the best > > way to avoid confusion and rampant speculation, and if performance > > isn't stellar being open about it in the community is the best way to > > figure out how to improve it. > > I believe you are misreading those graphs or maybe you are mixing it > with the original u32/pedit script approach? The tests are run at TC > and XDP layers. Pay particular attention to the results of the > handcoded/tuned eBPF datapath at TC and at XDP compared to analogous > ones generated by the compiler. You will notice +/-5% or so > differences. That is with the current compiler generated code. We are > looking to improve that - but do note that is generated code, nothing > to do with the kernel. As the P4 program becomes more complex (many > tables, longer keys, more entries, more complex actions) then we > become compute bound, so no difference really. > > Now having said that: yes - s/w performance is certainly _not our > highest priority feature_ and that is not saying we dont care but as > the text said If i am getting 2Mpps using handcoding vs 1.84Mpps using > generated code(per those graphs) and i can generate code and execute > it in 5 minutes (Chris who is knowledgeable in P4 was able to do it in > less time), then _i pick the code generation any day of the week_. > Tooling, tooling, tooling. > To re-iterate, the most important requirement is the abstraction, meaning: > I can take the same P4 program I am running in s/w and generate using > a different backend for AMD or Intel offload equivalent and get > several magnitude improvements in performance because it is now > running in h/w. I still get to use the same application controlling > either s/w and/or hardware, etc Jamal, I believe you're making contradictory points here. On one hand you're saying that performance isn't a high priority and that it's enough to get the abstraction right. On the other hand you seem to be making the argument that we need hardware offload because performance of software in a CPU is so bad. I can't rectify these statements. Also, when you claim that hardware is going to deliver "several magnitude improvements in performance" over an implementation that has not been optimized for performance in a CPU, then you are heading down the path of justifying hardware offload on the basis that it performs better than baseline software which has not been at all optimized. IMO, that is not valid justification and I believe it would be a disservice to our users if they buy into hardware where a software solution would have been sufficient had someone put in the effort to optimize it. > > TBH, I am indifferent and could add some numbers but it is missing the > emphasis of what we are trying to achieve, the cover letter is already > half a novel - with the short attention span most people have it will > be just muddying the waters. This is putting code in the kernel that runs in the Linux networking data path. It shouldn't be any surprise that we're asking for some quantification and analysis of performance in the patch description. Tom > > > > > > > On Tom's theory that the vendors are going to push inferior s/w for the sake of selling h/w: we are not in the 90s anymore and there's no vendor conspiracy theory here: a single port can do 100s of Gbps, and of course if you want to do high speed you need to offload, no general purpose CPU will save you. > > > > Let's not pretend that offloads are a magic bullet that just makes > > everything better, if that were true then we'd all be using TOE by > > now! 
There are a myriad of factors to consider whether offloading is > > worth it. What is "high speed", is this small packets or big packets, > > are we terminating TCP, are we doing some sort of fast/slow path split > > which might work great in the lab but on the Internet can become a DOS > > vector? What's the application? Are we just trying to offload parts of > > the datapath, TCP, RDMA, memcached, ML reduce operations? Are we > > trying to do line rate encryption, compression, trying to do a billion > > PCB lookups a second? Are we taking into account continuing > > advancements in the CPU that have in the past made offloads obsolete > > (for instance, AES instructions pretty much obsoleted initial attempts > > to obsolete IPsec)? How simple is the programming model, how > > debuggable is it, what's the TCO? > > > > I do believe offload is part of the solution. And the good news is > > that programmable devices facilitate that. IMO, our challenge is to > > create a facility in the kernel to kernel offloads in a much better > > way (I don't believe there's disagreement with these points). > > > > This is about a MAT(match-action table) model whose offloads are > covered via TC and is well understood and is very specific. > We are not trying to solve "the world of offloads" which includes > TOEs. P4 aware NICs are in the market and afaik those ASICs are not > solving TOE. I thought you understand the scope but if not start by > reading this: https://github.com/p4tc-dev/docs/blob/main/why-p4tc.md > > cheers, > jamal > > > Tom > > > > > > > > > > > > > > > > cheers, > > > jamal > > > > > >> > > >> > The risk if we allow this into the kernel is that a vendor might be > > >> > tempted to point to P4-TC performance as a baseline to justify to > > >> > customers that they need to buy specialized hardware to get > > >> > performance, whereas if XDP was used maybe they don't need the > > >> > performance and cost of hardware. > > >> > > >> I really don't buy this argument, it's FUD. Let's judge P4-TC on its merits, not prejudge it as a ploy to sell vendor hardware. > > >> > > >> > Note, this scenario already happened > > >> > once before, when the DPDK joined LF they made bogus claims that they > > >> > got a 100x performance over the kernel-- had they put at least the > > >> > slightest effort into tuning the kernel that would have dropped the > > >> > delta by an order of magnitude, and since then we've pretty much > > >> > closed the gap (actually, this is precisely what motivated the > > >> > creation of XDP so I guess that story had a happy ending!) . There are > > >> > circumstances where hardware offload may be warranted, but it needs to > > >> > be honestly justified by comparing it to an optimized software > > >> > solution-- so in the case of P4, it should be compared to well written > > >> > XDP code for instance, not P4-TC. > > >> > > >> I strongly disagree that it "it needs to be honestly justified by comparing it to an optimized software solution." > > >> Says who? This is no more factual than saying "C or golang need to be judged by comparing it to assembly language." > > >> Today the gap between C and assembly is small, but way back in my career, C was way slower. > > >> Over time optimizing compilers have closed the gap. Who's to say P4 technologies won't do the same? > > >> P4-TC can be judged on its own merits for its utility and productivity. I can't stress enough that P4 is very productive when applied to certain problems. 
> > >> > > >> Note, P4-BMv2 has been used by thousands of developers, researchers and students and it is relatively slow. Yet that doesn't deter users. > > >> There is a Google Summer of Code project to add PNA support, rather ambitious. However, P4-TC already partially supports PNA and the gap is closing. > > >> I feel like P4-TC could replace the use of BMv2 in a lot of applications and if it were upstreamed, it'd eventually be available on all Linux machines. The ability to write custom externs > > >> is very compelling. Eventual HW offload using the same code will be game-changing. Bmv2 is a big c++ program and somewhat intimidating to dig into to make enhancements, especially at the architectural level. > > >> There is no HW offload path, and it's not really fast, so it remains mainly a researchy-thing and will stay that way. P4-TC could span the needs from research to production in SW, and performant production with HW offload. > > >> > > > >> > Tom > > >> > > > >> > > > > >> > > > > >> > > Anjali > > >> >
Since the inevitable LWN article has been written, let me put more detail into what I already mentioned here: https://lore.kernel.org/all/20240301090020.7c9ebc1d@kernel.org/ for the benefit of non-networking people. On Wed, 10 Apr 2024 10:01:26 -0400 Jamal Hadi Salim wrote: > P4TC builds on top of many years of Linux TC experiences of a netlink > control path interface coupled with a software datapath with an equivalent > offloadable hardware datapath. The point of having a SW datapath is to provide a blueprint for the behavior. This is completely moot for P4, which comes as a standard. Besides, we already have 5 (or more) flow offloads; we don't need a 6th, completely disconnected from the existing ones, leaving users guessing which one to use and how they interact. In my opinion, a reasonable way to implement a programmable parser for Linux is: 1. User writes their parser in whatever DSL they want 2. User compiles the parser in user space 2.1 Compiler embeds a representation of the graph in the blob 3. User puts the blob in /lib/firmware 4. devlink dev $dev reload action parser-fetch $filename 5. devlink loads the file, parses it to extract the representation from 2.1, and passes the blob to the driver 5.1 driver/fw reinitializes the HW parser 5.2 user can inspect the graph by dumping the common representation from 2.1 (via something like devlink dpipe, perhaps) 6. The parser tables are annotated with Linux offload targets (routes, classic ntuple, nftables, flower, etc.) with some tables being left as "raw"* (* better name would be great) 7. ethtool ntuple is extended to support insertion of arbitrary rules into the "raw" tables 8. The other tables can only be inserted into using the subsystem they are annotated for This builds on how some devices _already_ operate. It gives the benefit of expressing parser information, and the ability to insert rules for uncommon protocols, also for devices which are not programmable. And it uses ethtool ntuple, which SW people actually want to use. Before the tin foil hats gather - we have no use for any of this at Meta; I'm not trying to twist the design to fit the use cases of big bad hyperscalers.
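(To make the eight steps above concrete, the user-visible flow might look roughly like the sketch below. The compiler invocation and file names are placeholders; "parser-fetch" is the devlink extension proposed in step 4 and does not exist today, and the "raw" table support for ethtool in step 7 is likewise only proposed. The dpipe and ntuple commands shown are the existing ones the proposal builds on.)

    # steps 1-3: compile the parser with whatever DSL/compiler the user prefers (placeholder command and files)
    my-dsl-compiler my_parser.p4 -o my_parser.bin
    cp my_parser.bin /lib/firmware/
    # steps 4-5: proposed devlink extension to fetch the blob, extract the graph and hand it to the driver/firmware
    devlink dev pci/0000:03:00.0 reload action parser-fetch my_parser.bin
    # step 5.2: inspect the extracted parse graph, e.g. via the existing dpipe tables
    devlink dpipe table show pci/0000:03:00.0
    # step 7: today's ethtool ntuple insertion, which the proposal would extend to target the "raw" tables
    ethtool -N eth0 flow-type tcp4 dst-port 80 action 2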
On Tue, Jun 11, 2024 at 10:21 AM Jakub Kicinski <kuba@kernel.org> wrote: > > Since the inevitable LWN article has been written, let me put more > detail into what I already mentioned here: > > https://lore.kernel.org/all/20240301090020.7c9ebc1d@kernel.org/ > > for the benefit of non-networking people. > > On Wed, 10 Apr 2024 10:01:26 -0400 Jamal Hadi Salim wrote: > > P4TC builds on top of many years of Linux TC experiences of a netlink > > control path interface coupled with a software datapath with an equivalent > > offloadable hardware datapath. > > The point of having SW datapath is to provide a blueprint for the > behavior. This is completely moot for P4 which comes as a standard. > > Besides we already have 5 (or more) flow offloads, we don't need > a 6th, completely disconnected from the existing ones. Leaving > users guessing which one to use, and how they interact. > > In my opinion, reasonable way to implement programmable parser for You have mentioned "parser" before - are you referring to the DDP patches earlier from Intel? In P4 the parser is just one of the objects. > Linux is: > > 1. User writes their parser in whatever DSL they want > 2. User compiles the parser in user space > 2.1 Compiler embeds a representation of the graph in the blob > 3. User puts the blob in /lib/firmware > 4. devlink dev $dev reload action parser-fetch $filename > 5. devlink loads the file, parses it to extract the representation > from 2.1, and passes the blob to the driver > 5.1 driver/fw reinitializes the HW parser > 5.2 user can inspect the graph by dumping the common representation > from 2.1 (via something like devlink dpipe, perhaps) > 6. The parser tables are annotated with Linux offload targets (routes, > classic ntuple, nftables, flower etc.) with some tables being left > as "raw"* (* better name would be great) > 7. ethtool ntuple is extended to support insertion of arbitrary rules > into the "raw" tables > 8. The other tables can only be inserted into using the subsystem they > are annotated for > > This builds on how some devices _already_ operate. Gives the benefits > of expressing parser information and ability to insert rules for > uncommon protocols also for devices which are not programmable. > And it uses ethtool ntuple, which SW people actually want to use. > > Before the tin foil hats gather - we have no use for any of this at > Meta, I'm not trying to twist the design to fit the use cases of big > bad hyperscalers. The scope is much bigger than just parsers though, it is about P4 in which the parser is but one object. Limiting what we can do just to fit a narrow definition of "offload" is not the right direction. P4 is well understood, hardware exists for P4 and is used to specify hardware specs and is deployed(See Vipin's comment). cheers, jamal
On Tue, 11 Jun 2024 11:10:35 -0400 Jamal Hadi Salim wrote: > > Before the tin foil hats gather - we have no use for any of this at > > Meta, I'm not trying to twist the design to fit the use cases of big > > bad hyperscalers. > > The scope is much bigger than just parsers though, it is about P4 in > which the parser is but one object. For me it's very much not "about P4". I don't care what DSL the user prefers, or whether the device the offload targets is built by a P4 vendor. > Limiting what we can do just to fit a narrow definition of "offload" > is not the right direction. This is how Linux development works. You implement a small, useful slice which helps the overall project. Then you implement the next, and another. On the technical level, putting the code into devlink rather than TC does not impose any meaningful limitations. But I really don't want you to lift and shift the entire pile of code at once. > P4 is well understood, hardware exists for P4 and is used to specify > hardware specs and is deployed(See Vipin's comment). "Hardware exists for P4" is about as meaningful as "hardware exists for C++".
On Tue, Jun 11, 2024 at 11:33 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Tue, 11 Jun 2024 11:10:35 -0400 Jamal Hadi Salim wrote: > > > Before the tin foil hats gather - we have no use for any of this at > > > Meta, I'm not trying to twist the design to fit the use cases of big > > > bad hyperscalers. > > > > The scope is much bigger than just parsers though, it is about P4 in > > which the parser is but one object. > > For me it's very much not "about P4". I don't care what DSL user prefers > and whether the device the offloads targets is built by a P4 vendor. > I think it is an important detail though. You wouldnt say PSP shouldnt start small by first taking care of TLS or IPSec because it is not the target. > > Limiting what we can do just to fit a narrow definition of "offload" > > is not the right direction. > > This is how Linux development works. You implement small, useful slice > which helps the overall project. Then you implement the next, and > another. > > On the technical level, putting the code into devlink rather than TC > does not impose any meaningful limitations. But I really don't want > you to lift and shift the entire pile of code at once. > Yes, the binary blob is going via devlink or some other scheme. > > P4 is well understood, hardware exists for P4 and is used to specify > > hardware specs and is deployed(See Vipin's comment). > > "Hardware exists for P4" is about as meaningful as "hardware exists > for C++". We'll have to agree to disagree. Take a look at this for example. https://www.servethehome.com/pensando-distributed-services-architecture-smartnic/ cheers, jamal
On Tue, Jun 11, 2024 at 8:53 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote: > > On Tue, Jun 11, 2024 at 11:33 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > > On Tue, 11 Jun 2024 11:10:35 -0400 Jamal Hadi Salim wrote: > > > > Before the tin foil hats gather - we have no use for any of this at > > > > Meta, I'm not trying to twist the design to fit the use cases of big > > > > bad hyperscalers. > > > > > > The scope is much bigger than just parsers though, it is about P4 in > > > which the parser is but one object. > > > > For me it's very much not "about P4". I don't care what DSL user prefers > > and whether the device the offloads targets is built by a P4 vendor. > > > > I think it is an important detail though. > You wouldnt say PSP shouldnt start small by first taking care of TLS > or IPSec because it is not the target. > > > > Limiting what we can do just to fit a narrow definition of "offload" > > > is not the right direction. Jamal, I think you might be missing Jakub's point. His plan wouldn't narrow the definition of "offload", but actually would increase applicability and use cases of offload. The best way to do an offload is allow flexibility on both sides of the equation: Let the user write their data path code in whatever language they want, and allow them offload to arbitrary software or programmable hardware targets. For example, if a user already has P4 hardware for their high end server then by all means they should write their datapath in P4. But, there might also be a user that wants to offload TCP keepalive to a lower powered CPU on a Smartphone; in this case a simple C program maybe running in eBPF on the CPU should do the trick-- forcing them to write their program in P4 or even worse force them to put P4 hardware into their smartphone is not good. We should be able to define a common offload infrastructure to be both language and target agnostic that would handle both these use cases of offload and everything in between. P4 could certainly be one option for both programming language and offload target, but it shouldn't be the only option. Tom > > > > This is how Linux development works. You implement small, useful slice > > which helps the overall project. Then you implement the next, and > > another. > > > > On the technical level, putting the code into devlink rather than TC > > does not impose any meaningful limitations. But I really don't want > > you to lift and shift the entire pile of code at once. > > > > Yes, the binary blob is going via devlink or some other scheme. > > > > P4 is well understood, hardware exists for P4 and is used to specify > > > hardware specs and is deployed(See Vipin's comment). > > > > "Hardware exists for P4" is about as meaningful as "hardware exists > > for C++". > > We'll have to agree to disagree. Take a look at this for example. > https://www.servethehome.com/pensando-distributed-services-architecture-smartnic/ > > cheers, > jamal
Tom Herbert wrote: > On Tue, Jun 11, 2024 at 8:53 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote: > > > > On Tue, Jun 11, 2024 at 11:33 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > On Tue, 11 Jun 2024 11:10:35 -0400 Jamal Hadi Salim wrote: > > > > > Before the tin foil hats gather - we have no use for any of this at > > > > > Meta, I'm not trying to twist the design to fit the use cases of big > > > > > bad hyperscalers. > > > > > > > > The scope is much bigger than just parsers though, it is about P4 in > > > > which the parser is but one object. > > > > > > For me it's very much not "about P4". I don't care what DSL user prefers > > > and whether the device the offloads targets is built by a P4 vendor. > > > > > > > I think it is an important detail though. > > You wouldnt say PSP shouldnt start small by first taking care of TLS > > or IPSec because it is not the target. > > > > > > Limiting what we can do just to fit a narrow definition of "offload" > > > > is not the right direction. > > Jamal, > > I think you might be missing Jakub's point. His plan wouldn't narrow > the definition of "offload", but actually would increase applicability > and use cases of offload. The best way to do an offload is allow > flexibility on both sides of the equation: Let the user write their > data path code in whatever language they want, and allow them offload > to arbitrary software or programmable hardware targets. +1. > > For example, if a user already has P4 hardware for their high end > server then by all means they should write their datapath in P4. But, > there might also be a user that wants to offload TCP keepalive to a > lower powered CPU on a Smartphone; in this case a simple C program > maybe running in eBPF on the CPU should do the trick-- forcing them to > write their program in P4 or even worse force them to put P4 hardware > into their smartphone is not good. We should be able to define a > common offload infrastructure to be both language and target agnostic > that would handle both these use cases of offload and everything in > between. P4 could certainly be one option for both programming > language and offload target, but it shouldn't be the only option. Agree major benefit of proposal here is it doesn't dictate the language. My DSL preference is P4 but no need to push that here. > > Tom My $.02 Jakub's proposal is a very pragmatic way to get support for P4 enabled hardware I'm all for it. I can't actually think up anything in the P4 hardware side that couldn't go through the table notion in (7). We might want bulk updates and the likes at some point, but starting with basics should be good enough. > > > > > > > This is how Linux development works. You implement small, useful slice > > > which helps the overall project. Then you implement the next, and > > > another. +1. > > > > > > On the technical level, putting the code into devlink rather than TC > > > does not impose any meaningful limitations. But I really don't want > > > you to lift and shift the entire pile of code at once. > > > devlink or an improved n_tuple (n_table?) mechanism would be great. Happy to help here. > > > > Yes, the binary blob is going via devlink or some other scheme. > > > > > > P4 is well understood, hardware exists for P4 and is used to specify > > > > hardware specs and is deployed(See Vipin's comment). > > > > > > "Hardware exists for P4" is about as meaningful as "hardware exists > > > for C++". > > > > We'll have to agree to disagree. Take a look at this for example. 
> > https://www.servethehome.com/pensando-distributed-services-architecture-smartnic/ > > > > cheers, > > jamal >
On Tue, 11 Jun 2024 11:53:28 -0400 Jamal Hadi Salim wrote: > > For me it's very much not "about P4". I don't care what DSL user prefers > > and whether the device the offloads targets is built by a P4 vendor. > > I think it is an important detail though. > You wouldnt say PSP shouldnt start small by first taking care of TLS > or IPSec because it is not the target. I really don't see any parallel with PSP. And it _is_ small, 4kLoC. First you complain that the community is "political" and doesn't give you technical feedback, and then when you get technical feedback you attack the work of the maintainer helping you. Do you not see how these kinds of retaliatory responses are exactly the reason why people were afraid to give you clear feedback earlier? Maybe one of the upcoming conferences should give out mirrors instead of t-shirts as swag.
On Tue, Jun 11, 2024 at 1:53 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Tue, 11 Jun 2024 11:53:28 -0400 Jamal Hadi Salim wrote: > > > For me it's very much not "about P4". I don't care what DSL user prefers > > > and whether the device the offloads targets is built by a P4 vendor. > > > > I think it is an important detail though. > > You wouldnt say PSP shouldnt start small by first taking care of TLS > > or IPSec because it is not the target. > > I really don't see any parallel with PSP. And it _is_ small, 4kLoC. > > First you complain that community is "political" and doesn't give you > technical feedback, and then when you get technical feedback you attack > the work of the maintainer helping you. > You made a proposal saying it was a "start small" approach. I responded saying that it doesn't really cover our requirements and pointed to a sample h/w to show why. I only used PSP to illustrate why "start small" doesn't work for what we are targeting. I was not in any way attacking your work. We are not trying to cover the whole world of offloads. It is a very specific niche - P4 - which uses the existing tc model because that's how match-action tables are offloaded today. The actions and tables are dynamically defined by the user's P4 program, whereas in flower they are hardcoded in the kernel. I don't see any other way to achieve these goals with flower or other existing approaches. Flower, for example, could be written as a single P4 program; the goal here is to support a wider range of programs without making kernel changes. cheers, jamal