Message ID: cover.1661260787.git.leonro@nvidia.com
Series: Extend XFRM core to allow full offload configuration
On Tue, 23 Aug 2022 16:31:57 +0300 Leon Romanovsky wrote:
> * I didn't hear any suggestion what term to use instead of
>   "full offload", so left it as is. It is used in commit messages
>   and documentation only and easy to rename.
> * Added performance data and background info to cover letter
> * Reused xfrm_output_resume() function to support multiple XFRM transformations
> * Add PMTU check in addition to driver .xdo_dev_offload_ok validation
> * Documentation is in progress, but not part of this series yet.

Since the use case is somewhat in question, perhaps switch to RFC
postings until the driver side, incl. tc forwarding, is implemented?

Also the perf traces, I don't see them here.
On Thu, Aug 25, 2022 at 02:36:10PM -0700, Jakub Kicinski wrote:
> On Tue, 23 Aug 2022 16:31:57 +0300 Leon Romanovsky wrote:
> > * I didn't hear any suggestion what term to use instead of
> >   "full offload", so left it as is. It is used in commit messages
> >   and documentation only and easy to rename.
> > * Added performance data and background info to cover letter
> > * Reused xfrm_output_resume() function to support multiple XFRM transformations
> > * Add PMTU check in addition to driver .xdo_dev_offload_ok validation
> > * Documentation is in progress, but not part of this series yet.
>
> Since the use case is somewhat in question, perhaps switch to RFC
> postings until the driver side, incl. tc forwarding, is implemented?

The proposed driver implementation works fine with eswitch representors.
All our flow steering magic is performed on the local table entry, and it
ensures that representors receive/send "clean" traffic.

We are using the following configuration snippet to achieve that.

---------------------------------------------------------------------
#!/bin/bash
P0_OUTER_REMOTE_IP=192.168.50.2
P0_OUTER_LOCAL_IP=192.168.50.1
PF0=enp8s0f0
VF0_REP=enp8s0f0_0

set -v
# Configure IP and turn VF_REP on
ifconfig $PF0 $P0_OUTER_LOCAL_IP/24 up
ifconfig $VF0_REP up

# Clean all TC rules, start fresh
tc qdisc del dev enp8s0f0 ingress >/dev/null 2>&1
tc qdisc del dev enp8s0f0_0 ingress >/dev/null 2>&1

# Make sure steering mode is dmfs(FW) and eswitch encap is none
devlink dev param set pci/0000:08:00.0 name flow_steering_mode value dmfs cmode runtime
devlink dev eswitch set pci/0000:08:00.0 mode legacy
devlink dev eswitch set pci/0000:08:00.0 encap none
devlink dev eswitch set pci/0000:08:00.0 mode switchdev

sleep 2

tc qdisc add dev enp8s0f0 ingress
tc qdisc add dev enp8s0f0_0 ingress

# Add TC rules
tc filter add dev $PF0 parent ffff: protocol 802.1q chain 0 flower vlan_id 10 vlan_ethtype 802.1q cvlan_id 5 action vlan pop action vlan pop action mirred egress redirect dev $VF0_REP
tc filter add dev $VF0_REP parent ffff: protocol all chain 0 flower action vlan push protocol 802.1q id 5 action vlan push protocol 802.1q id 10 action mirred egress redirect dev $PF0
tc filter show dev $PF0 ingress
----------------------------------------------------------------------------------------------------

We also don't offload anything related to routing, as we can't
differentiate between local traffic.

> Also the perf traces, I don't see them here.

It is worth separating this into a standalone discussion with the title:
"why crypto is not fast enough?". I don't think that mixing the discussion
about full offload, which Steffen said he is interested in, with research
into crypto bottlenecks will be productive. These discussions are orthogonal.

Thanks
On Fri, 26 Aug 2022 09:26:57 +0300 Leon Romanovsky wrote:
> On Thu, Aug 25, 2022 at 02:36:10PM -0700, Jakub Kicinski wrote:
> > On Tue, 23 Aug 2022 16:31:57 +0300 Leon Romanovsky wrote:
> > > * I didn't hear any suggestion what term to use instead of
> > >   "full offload", so left it as is. It is used in commit messages
> > >   and documentation only and easy to rename.
> > > * Added performance data and background info to cover letter
> > > * Reused xfrm_output_resume() function to support multiple XFRM transformations
> > > * Add PMTU check in addition to driver .xdo_dev_offload_ok validation
> > > * Documentation is in progress, but not part of this series yet.
> >
> > Since the use case is somewhat in question, perhaps switch to RFC
> > postings until the driver side, incl. tc forwarding, is implemented?
>
> The proposed driver implementation works fine with eswitch representors.
> All our flow steering magic is performed on the local table entry, and it
> ensures that representors receive/send "clean" traffic.
>
> We are using the following configuration snippet to achieve that.
>
> ---------------------------------------------------------------------
> #!/bin/bash
> P0_OUTER_REMOTE_IP=192.168.50.2
> P0_OUTER_LOCAL_IP=192.168.50.1
> PF0=enp8s0f0
> VF0_REP=enp8s0f0_0
>
> set -v
> # Configure IP and turn VF_REP on
> ifconfig $PF0 $P0_OUTER_LOCAL_IP/24 up
> ifconfig $VF0_REP up
>
> # Clean all TC rules, start fresh
> tc qdisc del dev enp8s0f0 ingress >/dev/null 2>&1
> tc qdisc del dev enp8s0f0_0 ingress >/dev/null 2>&1
>
> # Make sure steering mode is dmfs(FW) and eswitch encap is none
> devlink dev param set pci/0000:08:00.0 name flow_steering_mode value dmfs cmode runtime
> devlink dev eswitch set pci/0000:08:00.0 mode legacy
> devlink dev eswitch set pci/0000:08:00.0 encap none
> devlink dev eswitch set pci/0000:08:00.0 mode switchdev
>
> sleep 2
>
> tc qdisc add dev enp8s0f0 ingress
> tc qdisc add dev enp8s0f0_0 ingress
>
> # Add TC rules
> tc filter add dev $PF0 parent ffff: protocol 802.1q chain 0 flower vlan_id 10 vlan_ethtype 802.1q cvlan_id 5 action vlan pop action vlan pop action mirred egress redirect dev $VF0_REP
> tc filter add dev $VF0_REP parent ffff: protocol all chain 0 flower action vlan push protocol 802.1q id 5 action vlan push protocol 802.1q id 10 action mirred egress redirect dev $PF0
> tc filter show dev $PF0 ingress
> ----------------------------------------------------------------------------------------------------
>
> We also don't offload anything related to routing, as we can't
> differentiate between local traffic.

Yeah, nah, that's not what I'm asking for.
I said forwarding, not sending traffic thru a different virtual
interface. The TC rules must forward from or to the IPsec ifc.

That was the use case Jason mentioned.

> > Also the perf traces, I don't see them here.
>
> It is worth separating this into a standalone discussion with the title:
> "why crypto is not fast enough?". I don't think that mixing the discussion
> about full offload, which Steffen said he is interested in, with research
> into crypto bottlenecks will be productive. These discussions are orthogonal.

What do you mean by crypto bottlenecks?

Please use more precise language. crypto here may mean "crypto only
offload" or "crypto as done by the CPU". I have no idea which one you mean.

We are very much interested in the former; the latter is indeed out of
scope here.
On Fri, Aug 26, 2022 at 04:45:22PM -0700, Jakub Kicinski wrote:
> On Fri, 26 Aug 2022 09:26:57 +0300 Leon Romanovsky wrote:
> > On Thu, Aug 25, 2022 at 02:36:10PM -0700, Jakub Kicinski wrote:
> > > On Tue, 23 Aug 2022 16:31:57 +0300 Leon Romanovsky wrote:
> > > > * I didn't hear any suggestion what term to use instead of
> > > >   "full offload", so left it as is. It is used in commit messages
> > > >   and documentation only and easy to rename.
> > > > * Added performance data and background info to cover letter
> > > > * Reused xfrm_output_resume() function to support multiple XFRM transformations
> > > > * Add PMTU check in addition to driver .xdo_dev_offload_ok validation
> > > > * Documentation is in progress, but not part of this series yet.
> > >
> > > Since the use case is somewhat in question, perhaps switch to RFC
> > > postings until the driver side, incl. tc forwarding, is implemented?

<...>

> > We also don't offload anything related to routing, as we can't
> > differentiate between local traffic.
>
> Yeah, nah, that's not what I'm asking for.
> I said forwarding, not sending traffic thru a different virtual
> interface. The TC rules must forward from or to the IPsec ifc.
>
> That was the use case Jason mentioned.

I see, the word "TC" confused me, sorry about that.

My next mlx5-related task, after this IPsec full offload series is accepted,
is to extend mlx5 with extra eswitch logic. There is no change in API, xfrm
code or behavior, just an internal change in where the IPsec flow steering
tables are placed and how they are created/destroyed.

Unfortunately, this "just ..." change is a complicated task due to the mlx5
core internal implementation and will take time, but as I said, I will do it.

> > > Also the perf traces, I don't see them here.
> >
> > It is worth separating this into a standalone discussion with the title:
> > "why crypto is not fast enough?". I don't think that mixing the discussion
> > about full offload, which Steffen said he is interested in, with research
> > into crypto bottlenecks will be productive. These discussions are orthogonal.
>
> What do you mean by crypto bottlenecks?

I think I used the same language all along:

* IPsec SW - software path
* IPsec crypto - HW offload of the crypto part
* IPsec full offload - state and policy offloads to the HW.

I will try to be more clear next time.

> Please use more precise language. crypto here may mean "crypto only
> offload" or "crypto as done by the CPU". I have no idea which one you mean.
>
> We are very much interested in the former; the latter is indeed out of
> scope here.
On Tue, Aug 23, 2022 at 04:31:57PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> Changelog:
> v3:
> * I didn't hear any suggestion what term to use instead of
>   "full offload", so left it as is. It is used in commit messages
>   and documentation only and easy to rename.
> * Added performance data and background info to cover letter
> * Reused xfrm_output_resume() function to support multiple XFRM transformations
> * Add PMTU check in addition to driver .xdo_dev_offload_ok validation
> * Documentation is in progress, but not part of this series yet.
> v2: https://lore.kernel.org/all/cover.1660639789.git.leonro@nvidia.com
> * Rebased to latest 6.0-rc1
> * Add an extra check in TX datapath patch to validate packets before
>   forwarding to HW.
> * Added policy cleanup logic in case of netdev down event
> v1: https://lore.kernel.org/all/cover.1652851393.git.leonro@nvidia.com
> * Moved comment to be before if (...) in third patch.
> v0: https://lore.kernel.org/all/cover.1652176932.git.leonro@nvidia.com
> -----------------------------------------------------------------------
>
> The following series extends XFRM core code to handle a new type of IPsec
> offload - full offload.
>
> In this mode, the HW is going to be responsible for the whole data path,
> so both policy and state should be offloaded.
>
> IPsec full offload is an improved version of IPsec crypto mode. In full
> mode, the HW is responsible for trimming/adding headers in addition to
> decryption/encryption. In this mode, the packet arrives at the stack
> already decrypted, and vice versa for TX (it leaves the stack unencrypted).
>
> Devices that implement IPsec full offload mode offload policies too.
> In the RX path, this creates a situation where the HW can't effectively
> handle mixed SW and HW priorities unless users make sure that HW-offloaded
> policies have higher priorities.
>
> To make sure that users have a coherent picture, we require that
> HW-offloaded policies always have higher priorities (both RX and TX)
> than SW ones.
>
> To avoid over-engineering the code, HW policies are treated as SW ones and
> don't take the netdev into account, which allows reuse of the same
> priorities for different devices.
>
> There are several deliberate limitations:
> * No software fallback
> * Fragments are dropped, both in RX and TX
> * No socket policies
> * Only IPsec transport mode is implemented

... and you still have not answered the fundamental questions:

As implemented, the software does not hold any state.
I.e. there is no sync between hardware and software
regarding stats, lifetime, lifebyte, packet counts
and replay window. IKE rekeying and auditing are based
on these; how should this be done?

How can tunnel mode work with this offload?

I want to see the full picture before I consider
applying this.
On Fri, Aug 26, 2022 at 09:26:57AM +0300, Leon Romanovsky wrote:
> On Thu, Aug 25, 2022 at 02:36:10PM -0700, Jakub Kicinski wrote:
> > On Tue, 23 Aug 2022 16:31:57 +0300 Leon Romanovsky wrote:
> > > * I didn't hear any suggestion what term to use instead of
> > >   "full offload", so left it as is. It is used in commit messages
> > >   and documentation only and easy to rename.
> > > * Added performance data and background info to cover letter
> > > * Reused xfrm_output_resume() function to support multiple XFRM transformations
> > > * Add PMTU check in addition to driver .xdo_dev_offload_ok validation
> > > * Documentation is in progress, but not part of this series yet.
> >
> > Since the use case is somewhat in question, perhaps switch to RFC
> > postings until the driver side, incl. tc forwarding, is implemented?

+1

Please mark it as RFC as long as the full picture is not yet clear.
This is still far away from being ready for merging.

> > Also the perf traces, I don't see them here.
>
> It is worth separating this into a standalone discussion with the title:
> "why crypto is not fast enough?".

This is not the question. I want to know whether performance comes
from encapsulation/decapsulation offload, or lookup offload, or
something else. Please provide the traces.

> I don't think that mixing the discussion about full offload, which Steffen
> said he is interested in, with research into crypto bottlenecks will be
> productive. These discussions are orthogonal.

I'm interested in a 'full offload', but we need to make sure this
'full offload' is able to support most of the common usecases. It is
still not clear whether this particular offload you are proposing is
the one that can make it.
On Mon, Aug 29, 2022 at 09:54:03AM +0200, Steffen Klassert wrote:
> On Tue, Aug 23, 2022 at 04:31:57PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > Changelog:
> > v3:
> > * I didn't hear any suggestion what term to use instead of
> >   "full offload", so left it as is. It is used in commit messages
> >   and documentation only and easy to rename.
> > * Added performance data and background info to cover letter
> > * Reused xfrm_output_resume() function to support multiple XFRM transformations
> > * Add PMTU check in addition to driver .xdo_dev_offload_ok validation
> > * Documentation is in progress, but not part of this series yet.
> > v2: https://lore.kernel.org/all/cover.1660639789.git.leonro@nvidia.com
> > * Rebased to latest 6.0-rc1
> > * Add an extra check in TX datapath patch to validate packets before
> >   forwarding to HW.
> > * Added policy cleanup logic in case of netdev down event
> > v1: https://lore.kernel.org/all/cover.1652851393.git.leonro@nvidia.com
> > * Moved comment to be before if (...) in third patch.
> > v0: https://lore.kernel.org/all/cover.1652176932.git.leonro@nvidia.com
> > -----------------------------------------------------------------------
> >
> > The following series extends XFRM core code to handle a new type of IPsec
> > offload - full offload.
> >
> > In this mode, the HW is going to be responsible for the whole data path,
> > so both policy and state should be offloaded.
> >
> > IPsec full offload is an improved version of IPsec crypto mode. In full
> > mode, the HW is responsible for trimming/adding headers in addition to
> > decryption/encryption. In this mode, the packet arrives at the stack
> > already decrypted, and vice versa for TX (it leaves the stack unencrypted).
> >
> > Devices that implement IPsec full offload mode offload policies too.
> > In the RX path, this creates a situation where the HW can't effectively
> > handle mixed SW and HW priorities unless users make sure that HW-offloaded
> > policies have higher priorities.
> >
> > To make sure that users have a coherent picture, we require that
> > HW-offloaded policies always have higher priorities (both RX and TX)
> > than SW ones.
> >
> > To avoid over-engineering the code, HW policies are treated as SW ones and
> > don't take the netdev into account, which allows reuse of the same
> > priorities for different devices.
> >
> > There are several deliberate limitations:
> > * No software fallback
> > * Fragments are dropped, both in RX and TX
> > * No socket policies
> > * Only IPsec transport mode is implemented
>
> ... and you still have not answered the fundamental questions:
>
> As implemented, the software does not hold any state.
> I.e. there is no sync between hardware and software
> regarding stats, lifetime, lifebyte, packet counts
> and replay window. IKE rekeying and auditing are based
> on these; how should this be done?
>
> How can tunnel mode work with this offload?
>
> I want to see the full picture before I consider
> applying this.

Hi,

Unfortunately, due to vacation time, it takes more time to get internal
answers than I anticipated. I will continue to collect the needed data
and add everything to the cover letter/documentation when I respin.

Thanks
On Fri, Aug 26, 2022 at 04:45:22PM -0700, Jakub Kicinski wrote:
> > # Add TC rules
> > tc filter add dev $PF0 parent ffff: protocol 802.1q chain 0 flower vlan_id 10 vlan_ethtype 802.1q cvlan_id 5 action vlan pop action vlan pop action mirred egress redirect dev $VF0_REP
> > tc filter add dev $VF0_REP parent ffff: protocol all chain 0 flower action vlan push protocol 802.1q id 5 action vlan push protocol 802.1q id 10 action mirred egress redirect dev $PF0
> > tc filter show dev $PF0 ingress
> > ----------------------------------------------------------------------------------------------------
> >
> > We also don't offload anything related to routing, as we can't
> > differentiate between local traffic.
>
> Yeah, nah, that's not what I'm asking for.
> I said forwarding, not sending traffic thru a different virtual
> interface. The TC rules must forward from or to the IPsec ifc.
>
> That was the use case Jason mentioned.

I was meaning rather generically handling the packets on the
hypervisor side without involving the CPU.

We have customers deploying many different models for this in their
hypervisor, including a significant deployment using a model like the
above.

It achieves a kind of connectivity to a VM with 0 hypervisor CPU
involvement, with the vlan push/pop done in HW.

We have other use-models, like the offloaded OVS switching model you
are alluding to; that is something Leon has as a follow-up.

Jason
On Tue, 30 Aug 2022 14:36:52 -0300 Jason Gunthorpe wrote:
> I was meaning rather generically handling the packets on the
> hypervisor side without involving the CPU.
>
> We have customers deploying many different models for this in their
> hypervisor, including a significant deployment using a model like the
> above.
>
> It achieves a kind of connectivity to a VM with 0 hypervisor CPU
> involvement, with the vlan push/pop done in HW.

I don't see how that necessitates the full IPsec offload.

Sorry, perhaps I assumed we were on the same page too hastily.

I was referring to a model where the TC rules redirect to an IPsec
tunnel which is then routed to the uplink. All offloaded. Like
we do today for VxLAN encap/decap.

That necessitates full offload, and therefore can be used as a strong
argument for having a full offload in netdev.

> We have other use-models, like the offloaded OVS switching model you
> are alluding to; that is something Leon has as a follow-up.
On Tue, Aug 30, 2022 at 11:56:27AM -0700, Jakub Kicinski wrote:
> On Tue, 30 Aug 2022 14:36:52 -0300 Jason Gunthorpe wrote:
> > I was meaning rather generically handling the packets on the
> > hypervisor side without involving the CPU.
> >
> > We have customers deploying many different models for this in their
> > hypervisor, including a significant deployment using a model like the
> > above.
> >
> > It achieves a kind of connectivity to a VM with 0 hypervisor CPU
> > involvement, with the vlan push/pop done in HW.
>
> I don't see how that necessitates the full IPsec offload.

I think it would help if Leon showed the xfrm commands in his
configuration snippet and explained the traffic flow that is achieved
by it. He has a configuration that is as I described: 0 hypervisor CPU
involvement, IPsec, and VMs.

> Sorry, perhaps I assumed we were on the same page too hastily.

Well, I think we all agree that if a hypervisor environment is
controlling the IPsec, then we'd like a path similar to others where
the hypervisor doesn't process the packets in the CPU - it just flows
in the HW right out of the VM.

It seems obvious, but this means that the hypervisor controls HW that
does the crypto, the anti-replay, the counting and limiting
"autonomously" from the hypervisor CPU (the so-called "full offload").

> I was referring to a model where the TC rules redirect to an IPsec
> tunnel which is then routed to the uplink. All offloaded. Like
> we do today for VxLAN encap/decap.

I think there are several models with TC that can do this. It is
probably worth talking about exactly which models should and need to
have it.

E.g. what exactly do you mean when you say "TC rules redirect to an
IPsec tunnel"? How does a TC rule redirect to an xfrm?

Jason
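For context only: the VxLAN encap/decap model referenced above is typically
expressed today with tc-flower and act_tunnel_key rules, roughly as in the
sketch below. The device names, addresses and VNI are hypothetical, and the
open question in the thread - what the equivalent redirection into an xfrm
state/policy would look like - is not answered by it.

---------------------------------------------------------------------
#!/bin/bash
# Hypothetical devices: a VF representor and a VxLAN tunnel device
# (assumes $VXLAN was created with "ip link add vxlan100 type vxlan dstport 4789 external")
VF_REP=enp8s0f0_0
VXLAN=vxlan100

# Encap: traffic from the representor is pushed into VNI 100 and
# redirected to the tunnel device (offloadable via tc-flower HW offload)
tc filter add dev $VF_REP ingress protocol ip flower \
    action tunnel_key set src_ip 10.0.0.1 dst_ip 10.0.0.2 id 100 dst_port 4789 \
    action mirred egress redirect dev $VXLAN

# Decap: matching tunnel traffic is decapsulated and sent back to the representor
tc filter add dev $VXLAN ingress protocol ip flower enc_key_id 100 \
    enc_src_ip 10.0.0.2 enc_dst_ip 10.0.0.1 enc_dst_port 4789 \
    action tunnel_key unset \
    action mirred egress redirect dev $VF_REP
---------------------------------------------------------------------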
On Mon, Aug 29, 2022 at 09:54:03AM +0200, Steffen Klassert wrote:
> On Tue, Aug 23, 2022 at 04:31:57PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > Changelog:
> > v3:
> > * I didn't hear any suggestion what term to use instead of
> >   "full offload", so left it as is. It is used in commit messages
> >   and documentation only and easy to rename.
> > * Added performance data and background info to cover letter
> > * Reused xfrm_output_resume() function to support multiple XFRM transformations
> > * Add PMTU check in addition to driver .xdo_dev_offload_ok validation
> > * Documentation is in progress, but not part of this series yet.
> > v2: https://lore.kernel.org/all/cover.1660639789.git.leonro@nvidia.com
> > * Rebased to latest 6.0-rc1
> > * Add an extra check in TX datapath patch to validate packets before
> >   forwarding to HW.
> > * Added policy cleanup logic in case of netdev down event
> > v1: https://lore.kernel.org/all/cover.1652851393.git.leonro@nvidia.com
> > * Moved comment to be before if (...) in third patch.
> > v0: https://lore.kernel.org/all/cover.1652176932.git.leonro@nvidia.com
> > -----------------------------------------------------------------------
> >
> > The following series extends XFRM core code to handle a new type of IPsec
> > offload - full offload.
> >
> > In this mode, the HW is going to be responsible for the whole data path,
> > so both policy and state should be offloaded.
> >
> > IPsec full offload is an improved version of IPsec crypto mode. In full
> > mode, the HW is responsible for trimming/adding headers in addition to
> > decryption/encryption. In this mode, the packet arrives at the stack
> > already decrypted, and vice versa for TX (it leaves the stack unencrypted).
> >
> > Devices that implement IPsec full offload mode offload policies too.
> > In the RX path, this creates a situation where the HW can't effectively
> > handle mixed SW and HW priorities unless users make sure that HW-offloaded
> > policies have higher priorities.
> >
> > To make sure that users have a coherent picture, we require that
> > HW-offloaded policies always have higher priorities (both RX and TX)
> > than SW ones.
> >
> > To avoid over-engineering the code, HW policies are treated as SW ones and
> > don't take the netdev into account, which allows reuse of the same
> > priorities for different devices.
> >
> > There are several deliberate limitations:
> > * No software fallback
> > * Fragments are dropped, both in RX and TX
> > * No socket policies
> > * Only IPsec transport mode is implemented
>
> ... and you still have not answered the fundamental questions:
>
> As implemented, the software does not hold any state.
> I.e. there is no sync between hardware and software
> regarding stats, lifetime, lifebyte, packet counts
> and replay window. IKE rekeying and auditing are based
> on these; how should this be done?

I hope that the patch added in v4 clarifies it. There is a sync between
HW and core regarding packet counts. The HW generates an event and
calls xfrm_state_check_expire(), so the already existing logic will do
the rekeying. The replay window will be handled in a similar way: the
HW will generate an event.

> How can tunnel mode work with this offload?

I don't see any issues here. The same rules will apply.

> I want to see the full picture before I consider
> applying this.
From: Leon Romanovsky <leonro@nvidia.com>

Changelog:
v3:
* I didn't hear any suggestion what term to use instead of
  "full offload", so left it as is. It is used in commit messages
  and documentation only and easy to rename.
* Added performance data and background info to cover letter
* Reused xfrm_output_resume() function to support multiple XFRM transformations
* Add PMTU check in addition to driver .xdo_dev_offload_ok validation
* Documentation is in progress, but not part of this series yet.
v2: https://lore.kernel.org/all/cover.1660639789.git.leonro@nvidia.com
* Rebased to latest 6.0-rc1
* Add an extra check in TX datapath patch to validate packets before
  forwarding to HW.
* Added policy cleanup logic in case of netdev down event
v1: https://lore.kernel.org/all/cover.1652851393.git.leonro@nvidia.com
* Moved comment to be before if (...) in third patch.
v0: https://lore.kernel.org/all/cover.1652176932.git.leonro@nvidia.com
-----------------------------------------------------------------------

The following series extends XFRM core code to handle a new type of IPsec
offload - full offload.

In this mode, the HW is going to be responsible for the whole data path,
so both policy and state should be offloaded.

IPsec full offload is an improved version of IPsec crypto mode. In full
mode, the HW is responsible for trimming/adding headers in addition to
decryption/encryption. In this mode, the packet arrives at the stack
already decrypted, and vice versa for TX (it leaves the stack unencrypted).

Devices that implement IPsec full offload mode offload policies too.
In the RX path, this creates a situation where the HW can't effectively
handle mixed SW and HW priorities unless users make sure that HW-offloaded
policies have higher priorities.

To make sure that users have a coherent picture, we require that
HW-offloaded policies always have higher priorities (both RX and TX)
than SW ones.

To avoid over-engineering the code, HW policies are treated as SW ones and
don't take the netdev into account, which allows reuse of the same
priorities for different devices.

There are several deliberate limitations:
* No software fallback
* Fragments are dropped, both in RX and TX
* No socket policies
* Only IPsec transport mode is implemented

================================================================================
Configuration:

iproute2: https://lore.kernel.org/netdev/cover.1652179360.git.leonro@nvidia.com/

Full offload mode:
  ip xfrm state offload full dev <if-name> dir <in|out>
  ip xfrm policy .... offload full dev <if-name>
Crypto offload mode:
  ip xfrm state offload crypto dev <if-name> dir <in|out>
or (backward compatibility)
  ip xfrm state offload dev <if-name> dir <in|out>
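To make the new keyword concrete, a minimal transport-mode sketch using the
syntax above could look as follows; the addresses, SPI, key and interface
name are hypothetical and only meant as an illustration, not taken from the
series itself.

---------------------------------------------------------------------
#!/bin/bash
# Hypothetical endpoints, interface and key material, for illustration only
LOCAL=192.168.50.1
REMOTE=192.168.50.2
IF=eth2
KEY=0x3132333435363738393031323334353637383930   # 20-byte rfc4106 key+salt

# Outbound state and policy, fully offloaded to $IF
ip xfrm state add src $LOCAL dst $REMOTE proto esp spi 0x1000 reqid 0x1000 \
    mode transport aead 'rfc4106(gcm(aes))' $KEY 128 \
    offload full dev $IF dir out
ip xfrm policy add src $LOCAL dst $REMOTE dir out \
    tmpl src $LOCAL dst $REMOTE proto esp reqid 0x1000 mode transport \
    offload full dev $IF

# Inbound direction mirrors the above with dir in and swapped addresses
ip xfrm state add src $REMOTE dst $LOCAL proto esp spi 0x1001 reqid 0x1001 \
    mode transport aead 'rfc4106(gcm(aes))' $KEY 128 \
    offload full dev $IF dir in
ip xfrm policy add src $REMOTE dst $LOCAL dir in \
    tmpl src $REMOTE dst $LOCAL proto esp reqid 0x1001 mode transport \
    offload full dev $IF
---------------------------------------------------------------------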
================================================================================
Performance results:

TCP multi-stream, using one iperf3 instance per CPU.

+----------------------+--------+--------+--------+--------+---------+---------+
|                      | 1 CPU  | 2 CPUs | 4 CPUs | 8 CPUs | 16 CPUs | 32 CPUs |
|                      +--------+--------+--------+--------+---------+---------+
|                      |                       BW (Gbps)                       |
+----------------------+--------+--------+--------+--------+---------+---------+
| Baseline             | 27.9   | 59     | 93.1   | 92.8   | 93.7    | 94.4    |
+----------------------+--------+--------+--------+--------+---------+---------+
| Software IPsec       | 6      | 11.9   | 23.3   | 45.9   | 83.8    | 91.8    |
+----------------------+--------+--------+--------+--------+---------+---------+
| IPsec crypto offload | 15     | 29.7   | 58.5   | 89.6   | 90.4    | 90.8    |
+----------------------+--------+--------+--------+--------+---------+---------+
| IPsec full offload   | 28     | 57     | 90.7   | 91     | 91.3    | 91.9    |
+----------------------+--------+--------+--------+--------+---------+---------+

IPsec full offload mode behaves like the baseline and reaches line rate with
the same number of CPUs.

Setup details (similar for both sides):
* NIC: ConnectX6-DX dual port, 100 Gbps each. Single port used in the tests.
* CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz

================================================================================
Series together with mlx5 part:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=xfrm-next

Thanks

Leon Romanovsky (6):
  xfrm: add new full offload flag
  xfrm: allow state full offload mode
  xfrm: add an interface to offload policy
  xfrm: add TX datapath support for IPsec full offload mode
  xfrm: add RX datapath protection for IPsec full offload mode
  xfrm: enforce separation between priorities of HW/SW policies

 .../inline_crypto/ch_ipsec/chcr_ipsec.c      |   4 +
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c   |   5 +
 drivers/net/ethernet/intel/ixgbevf/ipsec.c   |   5 +
 .../mellanox/mlx5/core/en_accel/ipsec.c      |   4 +
 drivers/net/netdevsim/ipsec.c                |   5 +
 include/linux/netdevice.h                    |   3 +
 include/net/netns/xfrm.h                     |   8 +-
 include/net/xfrm.h                           | 104 +++++++---
 include/uapi/linux/xfrm.h                    |   6 +
 net/xfrm/xfrm_device.c                       | 102 +++++++++-
 net/xfrm/xfrm_output.c                       |  12 +-
 net/xfrm/xfrm_policy.c                       | 181 ++++++++++++++++++
 net/xfrm/xfrm_user.c                         |  19 ++
 13 files changed, 425 insertions(+), 33 deletions(-)