Message ID | 20210117080223.2107288-1-idosch@idosch.org (mailing list archive) |
---|---|
Headers | show |
Series | mlxsw: Add support for RED qevent "mark" | expand |
On Sun, 17 Jan 2021 10:02:18 +0200 Ido Schimmel wrote: > From: Ido Schimmel <idosch@nvidia.com> > > The RED qdisc currently supports two qevents: "early_drop" and "mark". The > filters added to the block bound to the "early_drop" qevent are executed on > packets for which the RED algorithm decides that they should be > early-dropped. The "mark" filters are similarly executed on ECT packets > that are marked as ECN-CE (Congestion Encountered). > > A previous patchset has offloaded "early_drop" filters on Spectrum-2 and > later, provided that the classifier used is "matchall", that the action > used is either "trap" or "mirred", and a handful or further limitations. For early_drop trap or mirred makes obvious sense, no explanation needed. But for marked as a user I'd like to see a _copy_ of the packet, while the original continues on its marry way to the destination. I'd venture to say that e.g. for a DCTCP deployment mark+trap is unusable, at least for tracing, because it distorts the operation by effectively dropping instead of marking. Am I reading this right? If that is the case and you really want to keep the mark+trap functionality - I feel like at least better documentation is needed. The current two liner should also be rewritten, quoting from patch 1: > * - ``ecn_mark`` > - ``drop`` > - Traps ECN-capable packets that were marked with CE (Congestion > Encountered) code point by RED algorithm instead of being dropped That needs to say that the trap is for datagrams trapped by a qevent. Otherwise "Traps ... instead of being dropped" is too much of a thought-shortcut, marked packets are not dropped. (I'd also think that trap is better documented next to early_drop, let's look at it from the reader's perspective)
On Tue, Jan 19, 2021 at 02:22:55PM -0800, Jakub Kicinski wrote: > On Sun, 17 Jan 2021 10:02:18 +0200 Ido Schimmel wrote: > > From: Ido Schimmel <idosch@nvidia.com> > > > > The RED qdisc currently supports two qevents: "early_drop" and "mark". The > > filters added to the block bound to the "early_drop" qevent are executed on > > packets for which the RED algorithm decides that they should be > > early-dropped. The "mark" filters are similarly executed on ECT packets > > that are marked as ECN-CE (Congestion Encountered). > > > > A previous patchset has offloaded "early_drop" filters on Spectrum-2 and > > later, provided that the classifier used is "matchall", that the action > > used is either "trap" or "mirred", and a handful or further limitations. > > For early_drop trap or mirred makes obvious sense, no explanation > needed. > > But for marked as a user I'd like to see a _copy_ of the packet, > while the original continues on its marry way to the destination. > I'd venture to say that e.g. for a DCTCP deployment mark+trap is > unusable, at least for tracing, because it distorts the operation > by effectively dropping instead of marking. > > Am I reading this right? You get a copy of the packet as otherwise it will create a lot of problems (like you wrote). > > If that is the case and you really want to keep the mark+trap > functionality - I feel like at least better documentation is needed. > The current two liner should also be rewritten, quoting from patch 1: > > > * - ``ecn_mark`` > > - ``drop`` > > - Traps ECN-capable packets that were marked with CE (Congestion > > Encountered) code point by RED algorithm instead of being dropped > > That needs to say that the trap is for datagrams trapped by a qevent. > Otherwise "Traps ... instead of being dropped" is too much of a > thought-shortcut, marked packets are not dropped. > > (I'd also think that trap is better documented next to early_drop, > let's look at it from the reader's perspective) How about: "Traps a copy of ECN-capable packets that were marked with CE (Congestion Encountered) code point by RED algorithm instead of being dropped. The trap is enabled by attaching a filter with action 'trap' to the 'mark' qevent of the RED qdisc." In addition, this output: $ devlink trap show pci/0000:06:00.0 trap ecn_mark pci/0000:06:00.0: name ecn_mark type drop generic true action trap group buffer_drops Can be converted to: $ devlink trap show pci/0000:06:00.0 trap ecn_mark pci/0000:06:00.0: name ecn_mark type drop generic true action mirror group buffer_drops "mirror: The packet is forwarded by the underlying device and a copy is sent to the CPU." In this case the action is static and you cannot change it.
On Wed, 20 Jan 2021 11:14:37 +0200 Ido Schimmel wrote: > On Tue, Jan 19, 2021 at 02:22:55PM -0800, Jakub Kicinski wrote: > > On Sun, 17 Jan 2021 10:02:18 +0200 Ido Schimmel wrote: > > > From: Ido Schimmel <idosch@nvidia.com> > > > > > > The RED qdisc currently supports two qevents: "early_drop" and "mark". The > > > filters added to the block bound to the "early_drop" qevent are executed on > > > packets for which the RED algorithm decides that they should be > > > early-dropped. The "mark" filters are similarly executed on ECT packets > > > that are marked as ECN-CE (Congestion Encountered). > > > > > > A previous patchset has offloaded "early_drop" filters on Spectrum-2 and > > > later, provided that the classifier used is "matchall", that the action > > > used is either "trap" or "mirred", and a handful or further limitations. > > > > For early_drop trap or mirred makes obvious sense, no explanation > > needed. > > > > But for marked as a user I'd like to see a _copy_ of the packet, > > while the original continues on its marry way to the destination. > > I'd venture to say that e.g. for a DCTCP deployment mark+trap is > > unusable, at least for tracing, because it distorts the operation > > by effectively dropping instead of marking. > > > > Am I reading this right? > > You get a copy of the packet as otherwise it will create a lot of > problems (like you wrote). Hm, so am I missing some background on semantics on TC_ACT_TRAP? Or perhaps you use a different action code? AFAICT the code in the kernel is: struct sk_buff *tcf_qevent_handle(... case TC_ACT_STOLEN: case TC_ACT_QUEUED: case TC_ACT_TRAP: __qdisc_drop(skb, to_free); *ret = __NET_XMIT_STOLEN; return NULL; Having TRAP mean DROP makes sense for filters, but in case of qevents shouldn't they be a no-op? Looking at sch_red looks like TRAP being a no-op would actually give us the expected behavior. > > If that is the case and you really want to keep the mark+trap > > functionality - I feel like at least better documentation is needed. > > The current two liner should also be rewritten, quoting from patch 1: > > > > > * - ``ecn_mark`` > > > - ``drop`` > > > - Traps ECN-capable packets that were marked with CE (Congestion > > > Encountered) code point by RED algorithm instead of being dropped > > > > That needs to say that the trap is for datagrams trapped by a qevent. > > Otherwise "Traps ... instead of being dropped" is too much of a > > thought-shortcut, marked packets are not dropped. > > > > (I'd also think that trap is better documented next to early_drop, > > let's look at it from the reader's perspective) > > How about: > > "Traps a copy of ECN-capable packets that were marked with CE I think "Traps copies" or "Traps the copy of .. packet"? I'm not a native speaker but there seems to be a grammatical mix here. > (Congestion Encountered) code point by RED algorithm instead of being > dropped. The trap is enabled by attaching a filter with action 'trap' to ... instead of those copies being dropped. > the 'mark' qevent of the RED qdisc." > > In addition, this output: > > $ devlink trap show pci/0000:06:00.0 trap ecn_mark > pci/0000:06:00.0: > name ecn_mark type drop generic true action trap group buffer_drops > > Can be converted to: > > $ devlink trap show pci/0000:06:00.0 trap ecn_mark > pci/0000:06:00.0: > name ecn_mark type drop generic true action mirror group buffer_drops > > "mirror: The packet is forwarded by the underlying device and a copy is sent to > the CPU." > > In this case the action is static and you cannot change it. Oh yes, that's nice, I thought mirror in traps means mirror to another port. Are there already traps which implement the mirroring / trapping a clone? Quick grep yields nothing of substance.
On Wed, Jan 20, 2021 at 04:45:08PM -0800, Jakub Kicinski wrote: > On Wed, 20 Jan 2021 11:14:37 +0200 Ido Schimmel wrote: > > On Tue, Jan 19, 2021 at 02:22:55PM -0800, Jakub Kicinski wrote: > > > On Sun, 17 Jan 2021 10:02:18 +0200 Ido Schimmel wrote: > > > > From: Ido Schimmel <idosch@nvidia.com> > > > > > > > > The RED qdisc currently supports two qevents: "early_drop" and "mark". The > > > > filters added to the block bound to the "early_drop" qevent are executed on > > > > packets for which the RED algorithm decides that they should be > > > > early-dropped. The "mark" filters are similarly executed on ECT packets > > > > that are marked as ECN-CE (Congestion Encountered). > > > > > > > > A previous patchset has offloaded "early_drop" filters on Spectrum-2 and > > > > later, provided that the classifier used is "matchall", that the action > > > > used is either "trap" or "mirred", and a handful or further limitations. > > > > > > For early_drop trap or mirred makes obvious sense, no explanation > > > needed. > > > > > > But for marked as a user I'd like to see a _copy_ of the packet, > > > while the original continues on its marry way to the destination. > > > I'd venture to say that e.g. for a DCTCP deployment mark+trap is > > > unusable, at least for tracing, because it distorts the operation > > > by effectively dropping instead of marking. > > > > > > Am I reading this right? > > > > You get a copy of the packet as otherwise it will create a lot of > > problems (like you wrote). > > Hm, so am I missing some background on semantics on TC_ACT_TRAP? > Or perhaps you use a different action code? Well, to make it really clear, we can add TC_ACT_TRAP_MIRROR. TC_ACT_TRAP: Sole copy goes to the CPU TC_ACT_TRAP_MIRROR: The packet is forwarded by the underlying device and a copy is sent to the CPU And only allow (in mlxsw) attaching filters with TC_ACT_TRAP_MIRROR to the "mark" qevent. > > AFAICT the code in the kernel is: > > struct sk_buff *tcf_qevent_handle(... > > case TC_ACT_STOLEN: > case TC_ACT_QUEUED: > case TC_ACT_TRAP: > __qdisc_drop(skb, to_free); > *ret = __NET_XMIT_STOLEN; > return NULL; > > Having TRAP mean DROP makes sense for filters, but in case of qevents > shouldn't they be a no-op? > > Looking at sch_red looks like TRAP being a no-op would actually give us > the expected behavior. I'm not sure it makes sense to try to interpret these actions in software (I expect they will be used with "skip_sw" filters), but TC_ACT_TRAP_MIRROR can be a no-op like you suggested. > > > > If that is the case and you really want to keep the mark+trap > > > functionality - I feel like at least better documentation is needed. > > > The current two liner should also be rewritten, quoting from patch 1: > > > > > > > * - ``ecn_mark`` > > > > - ``drop`` > > > > - Traps ECN-capable packets that were marked with CE (Congestion > > > > Encountered) code point by RED algorithm instead of being dropped > > > > > > That needs to say that the trap is for datagrams trapped by a qevent. > > > Otherwise "Traps ... instead of being dropped" is too much of a > > > thought-shortcut, marked packets are not dropped. > > > > > > (I'd also think that trap is better documented next to early_drop, > > > let's look at it from the reader's perspective) > > > > How about: > > > > "Traps a copy of ECN-capable packets that were marked with CE > > I think "Traps copies" or "Traps the copy of .. packet"? > I'm not a native speaker but there seems to be a grammatical mix here. > > > (Congestion Encountered) code point by RED algorithm instead of being > > dropped. The trap is enabled by attaching a filter with action 'trap' to > > ... instead of those copies being dropped. Will reword > > > the 'mark' qevent of the RED qdisc." > > > > In addition, this output: > > > > $ devlink trap show pci/0000:06:00.0 trap ecn_mark > > pci/0000:06:00.0: > > name ecn_mark type drop generic true action trap group buffer_drops > > > > Can be converted to: > > > > $ devlink trap show pci/0000:06:00.0 trap ecn_mark > > pci/0000:06:00.0: > > name ecn_mark type drop generic true action mirror group buffer_drops > > > > "mirror: The packet is forwarded by the underlying device and a copy is sent to > > the CPU." > > > > In this case the action is static and you cannot change it. > > Oh yes, that's nice, I thought mirror in traps means mirror to another > port. Are there already traps which implement the mirroring / trapping > a clone? Quick grep yields nothing of substance. Yes. That's why we have the 'offload_fwd_mark' and 'offload_l3_fwd_mark' bits in the skb. For example, we let the hardware flood ARP requests ('arp_request'), but also send a copy to the CPU in case it needs to update its neighbour table. The trapping happens at L2, so we only set the 'offload_fwd_mark' bit. It will tell the bridge driver to not flood the packet again. The 'offload_l3_fwd_mark' bit is mainly used to support one-armed router use cases where a packet is forwarded through the same interface through which it was received ('uc_loopback'). We do the forwarding in hardware, but also send a copy to the CPU to give the kernel the chance to generate an ICMP redirect if it was not disabled by the user. See more info in commit 55827458e058 ("Merge branch 'mlxsw-Add-one-armed-router-support'"). I also want to explain how the qevent stuff works in hardware to make sure it is all clear. We have the ability to bind different triggers to a mirroring (SPAN) agent. The agent can point to a physical port / virtual interface (e.g., gretap for ERSPAN) or to the CPU port. The first is programmed via the mirred action and the second using the trap action. The triggers can be simple such as Rx/Tx packet (matchall + mirred) or policy engine (flower + mirred). The more advanced triggers are various buffer events such as early drops ('early_drop' qevent) and ECN marking ('mark' qevent). Currently, it is only possible to bind these triggers to a mirroring agent which is why we only support (in mlxsw) attaching matchall filters to these qevents. In the future we might be able to bind ACLs to these triggers in which case we will allow attaching flower filters. devlink-trap is really only a read-only interface in this case, meant to tell you why you go the packet from the hardware datapath. The enablement / disablement is done by tc which gives us feature parity with the software datapath.
On Thu, 21 Jan 2021 12:23:18 +0200 Ido Schimmel wrote: > On Wed, Jan 20, 2021 at 04:45:08PM -0800, Jakub Kicinski wrote: > > On Wed, 20 Jan 2021 11:14:37 +0200 Ido Schimmel wrote: > > > On Tue, Jan 19, 2021 at 02:22:55PM -0800, Jakub Kicinski wrote: > > > > On Sun, 17 Jan 2021 10:02:18 +0200 Ido Schimmel wrote: > > > > > From: Ido Schimmel <idosch@nvidia.com> > > > > > > > > > > The RED qdisc currently supports two qevents: "early_drop" and "mark". The > > > > > filters added to the block bound to the "early_drop" qevent are executed on > > > > > packets for which the RED algorithm decides that they should be > > > > > early-dropped. The "mark" filters are similarly executed on ECT packets > > > > > that are marked as ECN-CE (Congestion Encountered). > > > > > > > > > > A previous patchset has offloaded "early_drop" filters on Spectrum-2 and > > > > > later, provided that the classifier used is "matchall", that the action > > > > > used is either "trap" or "mirred", and a handful or further limitations. > > > > > > > > For early_drop trap or mirred makes obvious sense, no explanation > > > > needed. > > > > > > > > But for marked as a user I'd like to see a _copy_ of the packet, > > > > while the original continues on its marry way to the destination. > > > > I'd venture to say that e.g. for a DCTCP deployment mark+trap is > > > > unusable, at least for tracing, because it distorts the operation > > > > by effectively dropping instead of marking. > > > > > > > > Am I reading this right? > > > > > > You get a copy of the packet as otherwise it will create a lot of > > > problems (like you wrote). > > > > Hm, so am I missing some background on semantics on TC_ACT_TRAP? > > Or perhaps you use a different action code? > > Well, to make it really clear, we can add TC_ACT_TRAP_MIRROR. > > TC_ACT_TRAP: Sole copy goes to the CPU > TC_ACT_TRAP_MIRROR: The packet is forwarded by the underlying device and > a copy is sent to the CPU > > And only allow (in mlxsw) attaching filters with TC_ACT_TRAP_MIRROR to > the "mark" qevent. > > > > > AFAICT the code in the kernel is: > > > > struct sk_buff *tcf_qevent_handle(... > > > > case TC_ACT_STOLEN: > > case TC_ACT_QUEUED: > > case TC_ACT_TRAP: > > __qdisc_drop(skb, to_free); > > *ret = __NET_XMIT_STOLEN; > > return NULL; > > > > Having TRAP mean DROP makes sense for filters, but in case of qevents > > shouldn't they be a no-op? > > > > Looking at sch_red looks like TRAP being a no-op would actually give us > > the expected behavior. > > I'm not sure it makes sense to try to interpret these actions in > software (I expect they will be used with "skip_sw" filters), but > TC_ACT_TRAP_MIRROR can be a no-op like you suggested. Well our paradigm is SW defines the behavior, we can't have HW forward and copy, while the SW drops the frame. Some engineer will try to implement this some day in their switch driver, look at the SW behavior and scratch their head. > > > the 'mark' qevent of the RED qdisc." > > > > > > In addition, this output: > > > > > > $ devlink trap show pci/0000:06:00.0 trap ecn_mark > > > pci/0000:06:00.0: > > > name ecn_mark type drop generic true action trap group buffer_drops > > > > > > Can be converted to: > > > > > > $ devlink trap show pci/0000:06:00.0 trap ecn_mark > > > pci/0000:06:00.0: > > > name ecn_mark type drop generic true action mirror group buffer_drops > > > > > > "mirror: The packet is forwarded by the underlying device and a copy is sent to > > > the CPU." > > > > > > In this case the action is static and you cannot change it. > > > > Oh yes, that's nice, I thought mirror in traps means mirror to another > > port. Are there already traps which implement the mirroring / trapping > > a clone? Quick grep yields nothing of substance. > > Yes. That's why we have the 'offload_fwd_mark' and 'offload_l3_fwd_mark' > bits in the skb. For example, we let the hardware flood ARP requests > ('arp_request'), but also send a copy to the CPU in case it needs to > update its neighbour table. The trapping happens at L2, so we only set > the 'offload_fwd_mark' bit. It will tell the bridge driver to not flood > the packet again. > > The 'offload_l3_fwd_mark' bit is mainly used to support one-armed router > use cases where a packet is forwarded through the same interface through > which it was received ('uc_loopback'). We do the forwarding in hardware, > but also send a copy to the CPU to give the kernel the chance to > generate an ICMP redirect if it was not disabled by the user. See more > info in commit 55827458e058 ("Merge branch > 'mlxsw-Add-one-armed-router-support'"). I see, thanks for the example, but just to be clear those are "internal traps", they don't have any impact on the devlink trap uAPI (in case we want to change the definition of MIRRED since nothing is using it). > I also want to explain how the qevent stuff works in hardware to make > sure it is all clear. We have the ability to bind different triggers to > a mirroring (SPAN) agent. The agent can point to a physical port / > virtual interface (e.g., gretap for ERSPAN) or to the CPU port. The > first is programmed via the mirred action and the second using the trap > action. > > The triggers can be simple such as Rx/Tx packet (matchall + mirred) or > policy engine (flower + mirred). The more advanced triggers are various > buffer events such as early drops ('early_drop' qevent) and ECN marking > ('mark' qevent). Currently, it is only possible to bind these triggers > to a mirroring agent which is why we only support (in mlxsw) attaching > matchall filters to these qevents. In the future we might be able to > bind ACLs to these triggers in which case we will allow attaching flower > filters. devlink-trap is really only a read-only interface in this case, > meant to tell you why you go the packet from the hardware datapath. The > enablement / disablement is done by tc which gives us feature parity > with the software datapath. Thanks for the explanation. I feel more and more convinced now that we should have TC_ACT_TRAP_MIRROR and the devlink trap should only be on/off :S Current model of "if ACT_TRAP consult devlink for trap configuration" is impossible to model in SW since it doesn't have a equivalent of devlink traps. Or we need that equivalent..
On Thu, Jan 21, 2021 at 09:19:40AM -0800, Jakub Kicinski wrote: > On Thu, 21 Jan 2021 12:23:18 +0200 Ido Schimmel wrote: > > On Wed, Jan 20, 2021 at 04:45:08PM -0800, Jakub Kicinski wrote: > > > On Wed, 20 Jan 2021 11:14:37 +0200 Ido Schimmel wrote: > > > > On Tue, Jan 19, 2021 at 02:22:55PM -0800, Jakub Kicinski wrote: > > > > > On Sun, 17 Jan 2021 10:02:18 +0200 Ido Schimmel wrote: > > > > > > From: Ido Schimmel <idosch@nvidia.com> > > > > > > > > > > > > The RED qdisc currently supports two qevents: "early_drop" and "mark". The > > > > > > filters added to the block bound to the "early_drop" qevent are executed on > > > > > > packets for which the RED algorithm decides that they should be > > > > > > early-dropped. The "mark" filters are similarly executed on ECT packets > > > > > > that are marked as ECN-CE (Congestion Encountered). > > > > > > > > > > > > A previous patchset has offloaded "early_drop" filters on Spectrum-2 and > > > > > > later, provided that the classifier used is "matchall", that the action > > > > > > used is either "trap" or "mirred", and a handful or further limitations. > > > > > > > > > > For early_drop trap or mirred makes obvious sense, no explanation > > > > > needed. > > > > > > > > > > But for marked as a user I'd like to see a _copy_ of the packet, > > > > > while the original continues on its marry way to the destination. > > > > > I'd venture to say that e.g. for a DCTCP deployment mark+trap is > > > > > unusable, at least for tracing, because it distorts the operation > > > > > by effectively dropping instead of marking. > > > > > > > > > > Am I reading this right? > > > > > > > > You get a copy of the packet as otherwise it will create a lot of > > > > problems (like you wrote). > > > > > > Hm, so am I missing some background on semantics on TC_ACT_TRAP? > > > Or perhaps you use a different action code? > > > > Well, to make it really clear, we can add TC_ACT_TRAP_MIRROR. > > > > TC_ACT_TRAP: Sole copy goes to the CPU > > TC_ACT_TRAP_MIRROR: The packet is forwarded by the underlying device and > > a copy is sent to the CPU > > > > And only allow (in mlxsw) attaching filters with TC_ACT_TRAP_MIRROR to > > the "mark" qevent. > > > > > > > > AFAICT the code in the kernel is: > > > > > > struct sk_buff *tcf_qevent_handle(... > > > > > > case TC_ACT_STOLEN: > > > case TC_ACT_QUEUED: > > > case TC_ACT_TRAP: > > > __qdisc_drop(skb, to_free); > > > *ret = __NET_XMIT_STOLEN; > > > return NULL; > > > > > > Having TRAP mean DROP makes sense for filters, but in case of qevents > > > shouldn't they be a no-op? > > > > > > Looking at sch_red looks like TRAP being a no-op would actually give us > > > the expected behavior. > > > > I'm not sure it makes sense to try to interpret these actions in > > software (I expect they will be used with "skip_sw" filters), but > > TC_ACT_TRAP_MIRROR can be a no-op like you suggested. > > Well our paradigm is SW defines the behavior, we can't have HW forward > and copy, while the SW drops the frame. Some engineer will try to > implement this some day in their switch driver, look at the SW behavior > and scratch their head. OK, TC_ACT_TRAP_MIRROR will be a no-op in software > > > > > the 'mark' qevent of the RED qdisc." > > > > > > > > In addition, this output: > > > > > > > > $ devlink trap show pci/0000:06:00.0 trap ecn_mark > > > > pci/0000:06:00.0: > > > > name ecn_mark type drop generic true action trap group buffer_drops > > > > > > > > Can be converted to: > > > > > > > > $ devlink trap show pci/0000:06:00.0 trap ecn_mark > > > > pci/0000:06:00.0: > > > > name ecn_mark type drop generic true action mirror group buffer_drops > > > > > > > > "mirror: The packet is forwarded by the underlying device and a copy is sent to > > > > the CPU." > > > > > > > > In this case the action is static and you cannot change it. > > > > > > Oh yes, that's nice, I thought mirror in traps means mirror to another > > > port. Are there already traps which implement the mirroring / trapping > > > a clone? Quick grep yields nothing of substance. > > > > Yes. That's why we have the 'offload_fwd_mark' and 'offload_l3_fwd_mark' > > bits in the skb. For example, we let the hardware flood ARP requests > > ('arp_request'), but also send a copy to the CPU in case it needs to > > update its neighbour table. The trapping happens at L2, so we only set > > the 'offload_fwd_mark' bit. It will tell the bridge driver to not flood > > the packet again. > > > > The 'offload_l3_fwd_mark' bit is mainly used to support one-armed router > > use cases where a packet is forwarded through the same interface through > > which it was received ('uc_loopback'). We do the forwarding in hardware, > > but also send a copy to the CPU to give the kernel the chance to > > generate an ICMP redirect if it was not disabled by the user. See more > > info in commit 55827458e058 ("Merge branch > > 'mlxsw-Add-one-armed-router-support'"). > > I see, thanks for the example, but just to be clear those are "internal > traps", they don't have any impact on the devlink trap uAPI (in case we > want to change the definition of MIRRED since nothing is using it). It's not MIRRED, but MIRROR. Anyway, these are not internal traps: $ devlink trap show pci/0000:01:00.0 trap arp_request pci/0000:01:00.0: name arp_request type control generic true action mirror group neigh_discovery > > > I also want to explain how the qevent stuff works in hardware to make > > sure it is all clear. We have the ability to bind different triggers to > > a mirroring (SPAN) agent. The agent can point to a physical port / > > virtual interface (e.g., gretap for ERSPAN) or to the CPU port. The > > first is programmed via the mirred action and the second using the trap > > action. > > > > The triggers can be simple such as Rx/Tx packet (matchall + mirred) or > > policy engine (flower + mirred). The more advanced triggers are various > > buffer events such as early drops ('early_drop' qevent) and ECN marking > > ('mark' qevent). Currently, it is only possible to bind these triggers > > to a mirroring agent which is why we only support (in mlxsw) attaching > > matchall filters to these qevents. In the future we might be able to > > bind ACLs to these triggers in which case we will allow attaching flower > > filters. devlink-trap is really only a read-only interface in this case, > > meant to tell you why you go the packet from the hardware datapath. The > > enablement / disablement is done by tc which gives us feature parity > > with the software datapath. > > Thanks for the explanation. I feel more and more convinced now that > we should have TC_ACT_TRAP_MIRROR and the devlink trap should only > be on/off :S Current model of "if ACT_TRAP consult devlink for trap > configuration" is impossible to model in SW since it doesn't have a > equivalent of devlink traps. Or we need that equivalent.. Wait, the current model is not "if ACT_TRAP consult devlink for trap configuration". 'ecn_mark' action is always 'trap' ('mirror' in v2) and can't be changed. Such packets can always be sent to the CPU, but the decision of whether to send them or not is based on the presence of tc filters attached to RED's 'mark' qevent with TC_ACT_TRAP (TC_ACT_TRAP_MIRROR in v2). I believe that with the proposed changes in v2 it should be perfectly clear that ECN marked packets are forwarded in hardware and a copy is sent to the CPU.
On Sat, 23 Jan 2021 17:28:02 +0200 Ido Schimmel wrote: > > Thanks for the explanation. I feel more and more convinced now that > > we should have TC_ACT_TRAP_MIRROR and the devlink trap should only > > be on/off :S Current model of "if ACT_TRAP consult devlink for trap > > configuration" is impossible to model in SW since it doesn't have a > > equivalent of devlink traps. Or we need that equivalent.. > > Wait, the current model is not "if ACT_TRAP consult devlink for trap > configuration". 'ecn_mark' action is always 'trap' ('mirror' in v2) and > can't be changed. Such packets can always be sent to the CPU, but the > decision of whether to send them or not is based on the presence of tc > filters attached to RED's 'mark' qevent with TC_ACT_TRAP > (TC_ACT_TRAP_MIRROR in v2). I see, missed that, but I think my point conceptually stands, right? Part of forwarding behavior was (in v1) only expressed in control plane (devlink) not dataplane (tc). > I believe that with the proposed changes in v2 it should be perfectly > clear that ECN marked packets are forwarded in hardware and a copy is > sent to the CPU. Yup, sounds good.
On Sat, Jan 23, 2021 at 11:55:27AM -0800, Jakub Kicinski wrote: > On Sat, 23 Jan 2021 17:28:02 +0200 Ido Schimmel wrote: > > > Thanks for the explanation. I feel more and more convinced now that > > > we should have TC_ACT_TRAP_MIRROR and the devlink trap should only > > > be on/off :S Current model of "if ACT_TRAP consult devlink for trap > > > configuration" is impossible to model in SW since it doesn't have a > > > equivalent of devlink traps. Or we need that equivalent.. > > > > Wait, the current model is not "if ACT_TRAP consult devlink for trap > > configuration". 'ecn_mark' action is always 'trap' ('mirror' in v2) and > > can't be changed. Such packets can always be sent to the CPU, but the > > decision of whether to send them or not is based on the presence of tc > > filters attached to RED's 'mark' qevent with TC_ACT_TRAP > > (TC_ACT_TRAP_MIRROR in v2). > > I see, missed that, but I think my point conceptually stands, right? > Part of forwarding behavior was (in v1) only expressed in control > plane (devlink) not dataplane (tc). I don't think so? The action was set to 'trap' in both devlink and tc. > > > I believe that with the proposed changes in v2 it should be perfectly > > clear that ECN marked packets are forwarded in hardware and a copy is > > sent to the CPU. > > Yup, sounds good. Thanks!
From: Ido Schimmel <idosch@nvidia.com> Petr says: The RED qdisc currently supports two qevents: "early_drop" and "mark". The filters added to the block bound to the "early_drop" qevent are executed on packets for which the RED algorithm decides that they should be early-dropped. The "mark" filters are similarly executed on ECT packets that are marked as ECN-CE (Congestion Encountered). A previous patchset has offloaded "early_drop" filters on Spectrum-2 and later, provided that the classifier used is "matchall", that the action used is either "trap" or "mirred", and a handful or further limitations. This patchset similarly offloads "mark" filters. Patch set overview: Patches #1 and #2 add the trap, under which packets will be reported to the CPU, if the qevent filter uses the action "trap". Patch #3 then recognizes FLOW_BLOCK_BINDER_TYPE_RED_MARK as a binder type, and offloads the attached filters similarly to _EARLY_DROP. Patch #4 cleans up some unused variables in a selftest, and patch #5 adds a new selftest for the RED "mark" qevent offload. Petr Machata (5): devlink: Add ecn_mark trap mlxsw: spectrum_trap: Add ecn_mark trap mlxsw: spectrum_qdisc: Offload RED qevent mark selftests: mlxsw: sch_red_core: Drop two unused variables selftests: mlxsw: RED: Add selftests for the mark qevent .../networking/devlink/devlink-trap.rst | 4 + .../net/ethernet/mellanox/mlxsw/spectrum.c | 2 + .../net/ethernet/mellanox/mlxsw/spectrum.h | 2 + .../ethernet/mellanox/mlxsw/spectrum_qdisc.c | 14 +++- .../ethernet/mellanox/mlxsw/spectrum_span.c | 16 ++++ .../ethernet/mellanox/mlxsw/spectrum_span.h | 1 + .../ethernet/mellanox/mlxsw/spectrum_trap.c | 9 ++ include/net/devlink.h | 3 + net/core/devlink.c | 1 + .../drivers/net/mlxsw/sch_red_core.sh | 84 ++++++++++++++++++- .../drivers/net/mlxsw/sch_red_ets.sh | 74 ++++++++++++++-- 11 files changed, 200 insertions(+), 10 deletions(-)