Message ID | 20230213181541.26114-1-paulb@nvidia.com (mailing list archive) |
---|---|
Series | net/sched: cls_api: Support hardware miss to tc action |
On Mon, Feb 13, 2023 at 08:15:34PM +0200, Paul Blakey wrote:
> Hi,
>
> This series adds support for hardware miss to instruct tc to continue execution
> in a specific tc action instance on a filter's action list. The mlx5 driver patch
> (besides the refactors) shows its usage instead of using just chain restore.
>
> Currently a filter's action list must be executed all together or
> not at all as drivers are only able to tell tc to continue executing from a
> specific tc chain, and not a specific filter/action.
>
> This is troublesome with regards to action CT, where new connections should
> be sent to software (via tc chain restore), and established connections can
> be handled in hardware.
>
> Checking for new connections is done when executing the ct action in hardware
> (by checking the packet's tuple against known established tuples).
> But if there is a packet modification (pedit) action before action CT and the
> checked tuple is a new connection, hardware will need to revert the previous
> packet modifications before sending it back to software so it can
> re-match the same tc filter in software and re-execute its CT action.
>
> The following is an example configuration of stateless nat
> on mlx5 driver that isn't supported before this patchset:
>
> #Setup corresponding mlx5 VFs in namespaces
> $ ip netns add ns0
> $ ip netns add ns1
> $ ip link set dev enp8s0f0v0 netns ns0
> $ ip netns exec ns0 ifconfig enp8s0f0v0 1.1.1.1/24 up
> $ ip link set dev enp8s0f0v1 netns ns1
> $ ip netns exec ns1 ifconfig enp8s0f0v1 1.1.1.2/24 up
>
> #Setup tc arp and ct rules on mlx5 VF representors
> $ tc qdisc add dev enp8s0f0_0 ingress
> $ tc qdisc add dev enp8s0f0_1 ingress
> $ ifconfig enp8s0f0_0 up
> $ ifconfig enp8s0f0_1 up
>
> #Original side
> $ tc filter add dev enp8s0f0_0 ingress chain 0 proto ip flower \
>     ct_state -trk ip_proto tcp dst_port 8888 \
>     action pedit ex munge tcp dport set 5001 pipe \
>     action csum ip tcp pipe \
>     action ct pipe \
>     action goto chain 1
> $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
>     ct_state +trk+est \
>     action mirred egress redirect dev enp8s0f0_1
> $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
>     ct_state +trk+new \
>     action ct commit pipe \
>     action mirred egress redirect dev enp8s0f0_1
> $ tc filter add dev enp8s0f0_0 ingress chain 0 proto arp flower \
>     action mirred egress redirect dev enp8s0f0_1
>
> #Reply side
> $ tc filter add dev enp8s0f0_1 ingress chain 0 proto arp flower \
>     action mirred egress redirect dev enp8s0f0_0
> $ tc filter add dev enp8s0f0_1 ingress chain 0 proto ip flower \
>     ct_state -trk ip_proto tcp \
>     action ct pipe \
>     action pedit ex munge tcp sport set 8888 pipe \
>     action csum ip tcp pipe \
>     action mirred egress redirect dev enp8s0f0_0
>
> #Run traffic
> $ ip netns exec ns1 iperf -s -p 5001&
> $ sleep 2 #wait for iperf to fully open
> $ ip netns exec ns0 iperf -c 1.1.1.2 -p 8888
>
> #dump tc filter stats on enp8s0f0_0 chain 0 rule and see hardware packets:
> $ tc -s filter show dev enp8s0f0_0 ingress chain 0 proto ip | grep "hardware.*pkt"
>     Sent hardware 9310116832 bytes 6149672 pkt
>     Sent hardware 9310116832 bytes 6149672 pkt
>     Sent hardware 9310116832 bytes 6149672 pkt
>
> A new connection executing the first filter in hardware will first rewrite
> the dst port to the new port, and then the ct action is executed.
> Because this is a new connection, hardware will need to send this back
> to software, on chain 0, to execute the first filter again in software.
> The dst port needs to be reverted otherwise it won't re-match the old
> dst port in the first filter. Because of that, currently mlx5 driver will
> reject offloading the above action ct rule.
>
> This series adds support for partial offload of a filter's action list,

As I said on v9, this sentence is very confusing and leads to the
interpretation that some actions can be in_hw and some not. Please change
it, so we don't have to keep fighting this misinterpretation later on.

> and letting tc software continue processing in the specific action instance
> where hardware left off (in the above case after the "action pedit ex munge tcp
> dport... of the first rule") allowing support for scenarios such as the above.
>
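
To make the distinction concrete, here is a toy shell model of the two
miss-handling behaviours described in the cover letter. It is only a sketch:
`ACTIONS` and `run_from` are hypothetical names that exist neither in tc nor
in the kernel, the action list is a simplification of the first "Original
side" filter, and the model assumes the miss happens at the ct action, since
that is where the cover letter says new connections are detected.

```shell
#!/bin/sh
# Toy model only: ACTIONS and run_from are hypothetical, not tc or kernel names.
# Simplified action list of the first "Original side" filter on chain 0.
ACTIONS="pedit csum ct goto"

# run_from IDX DPORT: execute the action list in "software", starting at
# the 0-based action index IDX, on a packet whose dst port is DPORT.
# Prints the actions that were executed and the resulting dst port.
run_from() {
    idx=$1
    dport=$2
    i=0
    trace=""
    for act in $ACTIONS; do
        if [ "$i" -ge "$idx" ]; then
            # pedit rewrites the dst port; the other actions leave it alone here.
            [ "$act" = "pedit" ] && dport=5001
            trace="$trace $act"
        fi
        i=$((i + 1))
    done
    echo "executed:$trace dport=$dport"
}

# Chain restore: hardware must revert the pedit (dst port back to 8888) so
# that software can re-match the filter and run the whole list again.
run_from 0 8888

# Miss to action: the packet keeps the hardware rewrite (dst port 5001) and
# software resumes at the ct action (index 2, an assumption of this model).
run_from 2 5001
```

In both cases the model treats the filter as offloaded as a whole. The only
difference is where software picks up after a miss, which seems to be the
point the disputed sentence is trying to make.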