diff mbox series

xfrm: add SA information to the offloaded packet

Message ID 20240822200252.472298-1-wangfe@google.com (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Headers show
Series xfrm: add SA information to the offloaded packet | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 4 maintainers not CCed: pabeni@redhat.com kuba@kernel.org edumazet@google.com herbert@gondor.apana.org.au
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 17 this patch: 17
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 33 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-08-25--21-00 (tests: 714)

Commit Message

Feng Wang Aug. 22, 2024, 8:02 p.m. UTC
From: wangfe <wangfe@google.com>

In packet offload mode, append Security Association (SA) information
to each packet, replicating the crypto offload implementation.
The XFRM_XMIT flag is set to enable packet to be returned immediately
from the validate_xmit_xfrm function, thus aligning with the existing
code path for packet offload mode.

This SA info helps HW offload match packets to their correct security
policies. The XFRM interface ID is included, which is crucial in setups
with multiple XFRM interfaces where source/destination addresses alone
can't pinpoint the right policy.

Signed-off-by: wangfe <wangfe@google.com>
---
v2:
  - Add why HW offload requires the SA info to the commit message
v1: https://lore.kernel.org/all/20240812182317.1962756-1-wangfe@google.com/
---
 net/xfrm/xfrm_output.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

Comments

Steffen Klassert Aug. 28, 2024, 5:32 a.m. UTC | #1
On Thu, Aug 22, 2024 at 01:02:52PM -0700, Feng Wang wrote:
> From: wangfe <wangfe@google.com>
> 
> In packet offload mode, append Security Association (SA) information
> to each packet, replicating the crypto offload implementation.
> The XFRM_XMIT flag is set to enable packet to be returned immediately
> from the validate_xmit_xfrm function, thus aligning with the existing
> code path for packet offload mode.
> 
> This SA info helps HW offload match packets to their correct security
> policies. The XFRM interface ID is included, which is crucial in setups
> with multiple XFRM interfaces where source/destination addresses alone
> can't pinpoint the right policy.
> 
> Signed-off-by: wangfe <wangfe@google.com>

Applied to ipsec-next, thanks Feng!
Leon Romanovsky Aug. 28, 2024, 11:26 a.m. UTC | #2
On Wed, Aug 28, 2024 at 07:32:47AM +0200, Steffen Klassert wrote:
> On Thu, Aug 22, 2024 at 01:02:52PM -0700, Feng Wang wrote:
> > From: wangfe <wangfe@google.com>
> > 
> > In packet offload mode, append Security Association (SA) information
> > to each packet, replicating the crypto offload implementation.
> > The XFRM_XMIT flag is set to enable packet to be returned immediately
> > from the validate_xmit_xfrm function, thus aligning with the existing
> > code path for packet offload mode.
> > 
> > This SA info helps HW offload match packets to their correct security
> > policies. The XFRM interface ID is included, which is crucial in setups
> > with multiple XFRM interfaces where source/destination addresses alone
> > can't pinpoint the right policy.
> > 
> > Signed-off-by: wangfe <wangfe@google.com>
> 
> Applied to ipsec-next, thanks Feng!

Stephen, can you please explain why do you think that this is correct
thing to do?

There are no in-tree any drivers which is using this information, and it
is unclear to me how state is released and it has controversial code
around validity of xfrm_offload() too.

For example:
+		sp->olen++;
+		sp->xvec[sp->len++] = x;
+		xfrm_state_hold(x);
+
+		xo = xfrm_offload(skb);
+		if (!xo) { <--- previous code handled this case perfectly in validate_xmit_xfrm
+			secpath_reset(skb);
+			XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
+			kfree_skb(skb);
+			return -EINVAL; <--- xfrm state leak
+		}


Can you please revert/drop this patch for now?

Thanks
Feng Wang Aug. 28, 2024, 9:25 p.m. UTC | #3
Hi Leon,

Thank you for your insightful questions and comments.

Just like in crypto offload mode, now pass SA (Security Association)
information to the driver in packet offload mode. This helps the
driver quickly identify the packet's source and destination, rather
than having to parse the packet itself. The xfrm interface ID is
especially useful here. As you can see in the kernel code
(https://elixir.bootlin.com/linux/v6.10/source/net/xfrm/xfrm_policy.c#L1993),
it's used to differentiate between various policies in complex network
setups.

During my testing of packet offload mode without the patch, I observed
that the sec_path was NULL. Consequently, xo was also NULL when
validate_xmit_xfrm was called, leading to an immediate return at line
125 (https://elixir.bootlin.com/linux/v6.10/source/net/xfrm/xfrm_device.c#L125).

Regarding the potential xfrm_state leak, upon further investigation,
it may not be the case. It seems that both secpath_reset and kfree_skb
invoke the xfrm_state_put function, which should be responsible for
releasing the state. The call flow appears to be as follows: kfree_skb
-> skb_release_all -> skb_release_head_state -> skb_ext_put ->
skb_ext_put_sp -> xfrm_state_put.

Please let me know if you have any further questions or concerns. I
appreciate your valuable feedback!

Feng

On Wed, Aug 28, 2024 at 4:26 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Wed, Aug 28, 2024 at 07:32:47AM +0200, Steffen Klassert wrote:
> > On Thu, Aug 22, 2024 at 01:02:52PM -0700, Feng Wang wrote:
> > > From: wangfe <wangfe@google.com>
> > >
> > > In packet offload mode, append Security Association (SA) information
> > > to each packet, replicating the crypto offload implementation.
> > > The XFRM_XMIT flag is set to enable packet to be returned immediately
> > > from the validate_xmit_xfrm function, thus aligning with the existing
> > > code path for packet offload mode.
> > >
> > > This SA info helps HW offload match packets to their correct security
> > > policies. The XFRM interface ID is included, which is crucial in setups
> > > with multiple XFRM interfaces where source/destination addresses alone
> > > can't pinpoint the right policy.
> > >
> > > Signed-off-by: wangfe <wangfe@google.com>
> >
> > Applied to ipsec-next, thanks Feng!
>
> Stephen, can you please explain why do you think that this is correct
> thing to do?
>
> There are no in-tree any drivers which is using this information, and it
> is unclear to me how state is released and it has controversial code
> around validity of xfrm_offload() too.
>
> For example:
> +               sp->olen++;
> +               sp->xvec[sp->len++] = x;
> +               xfrm_state_hold(x);
> +
> +               xo = xfrm_offload(skb);
> +               if (!xo) { <--- previous code handled this case perfectly in validate_xmit_xfrm
> +                       secpath_reset(skb);
> +                       XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
> +                       kfree_skb(skb);
> +                       return -EINVAL; <--- xfrm state leak
> +               }
>
>
> Can you please revert/drop this patch for now?
>
> Thanks
Leon Romanovsky Aug. 29, 2024, 10:38 a.m. UTC | #4
On Wed, Aug 28, 2024 at 02:25:22PM -0700, Feng Wang wrote:
> Hi Leon,

Please don't top-post your replies when you are replying to a mailing
list. It makes it hard to follow the conversation.

> 
> Thank you for your insightful questions and comments.
> 
> Just like in crypto offload mode, now pass SA (Security Association)
> information to the driver in packet offload mode. This helps the
> driver quickly identify the packet's source and destination, rather
> than having to parse the packet itself. The xfrm interface ID is
> especially useful here. As you can see in the kernel code
> (https://elixir.bootlin.com/linux/v6.10/source/net/xfrm/xfrm_policy.c#L1993),
> it's used to differentiate between various policies in complex network
> setups.

Which in-kernel driver use this information?

> 
> During my testing of packet offload mode without the patch, I observed
> that the sec_path was NULL. Consequently, xo was also NULL when
> validate_xmit_xfrm was called, leading to an immediate return at line
> 125 (https://elixir.bootlin.com/linux/v6.10/source/net/xfrm/xfrm_device.c#L125).

It is intentional, because the packet is forwarded to HW as plain text
and it is not offloaded (doesn't have xfrm_offload()).

> 
> Regarding the potential xfrm_state leak, upon further investigation,
> it may not be the case. It seems that both secpath_reset and kfree_skb
> invoke the xfrm_state_put function, which should be responsible for
> releasing the state. The call flow appears to be as follows: kfree_skb
> -> skb_release_all -> skb_release_head_state -> skb_ext_put ->
> skb_ext_put_sp -> xfrm_state_put.

You are trying to make same flow as for crypto, but it is not the same,
in crypto case secpath_reset() was called to release SKB extensions and
perform cleanup, first and only after that new xfrm_state_hold() was
called, but in new code SKB is not reset.

Thanks

> 
> Please let me know if you have any further questions or concerns. I
> appreciate your valuable feedback!
> 
> Feng
> 
> On Wed, Aug 28, 2024 at 4:26 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Wed, Aug 28, 2024 at 07:32:47AM +0200, Steffen Klassert wrote:
> > > On Thu, Aug 22, 2024 at 01:02:52PM -0700, Feng Wang wrote:
> > > > From: wangfe <wangfe@google.com>
> > > >
> > > > In packet offload mode, append Security Association (SA) information
> > > > to each packet, replicating the crypto offload implementation.
> > > > The XFRM_XMIT flag is set to enable packet to be returned immediately
> > > > from the validate_xmit_xfrm function, thus aligning with the existing
> > > > code path for packet offload mode.
> > > >
> > > > This SA info helps HW offload match packets to their correct security
> > > > policies. The XFRM interface ID is included, which is crucial in setups
> > > > with multiple XFRM interfaces where source/destination addresses alone
> > > > can't pinpoint the right policy.
> > > >
> > > > Signed-off-by: wangfe <wangfe@google.com>
> > >
> > > Applied to ipsec-next, thanks Feng!
> >
> > Stephen, can you please explain why do you think that this is correct
> > thing to do?
> >
> > There are no in-tree any drivers which is using this information, and it
> > is unclear to me how state is released and it has controversial code
> > around validity of xfrm_offload() too.
> >
> > For example:
> > +               sp->olen++;
> > +               sp->xvec[sp->len++] = x;
> > +               xfrm_state_hold(x);
> > +
> > +               xo = xfrm_offload(skb);
> > +               if (!xo) { <--- previous code handled this case perfectly in validate_xmit_xfrm
> > +                       secpath_reset(skb);
> > +                       XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
> > +                       kfree_skb(skb);
> > +                       return -EINVAL; <--- xfrm state leak
> > +               }
> >
> >
> > Can you please revert/drop this patch for now?
> >
> > Thanks
Feng Wang Aug. 29, 2024, 9:19 p.m. UTC | #5
Hi Leon,

Thank you again for your thoughtful questions and comments. I'd like
to provide further clarification and address your points:

SA Information Usage:

There are several instances in the kernel code where it's used, such
as in esp4(6)_offload.c and xfrm.c. This clearly demonstrates how SA
information is used. Moreover, passing this information to the driver
shouldn't negatively impact those drivers that don't require it.
Regarding a driver example, the function mlx5e_ipsec_feature_check
caught my attention.
https://elixir.bootlin.com/linux/v6.10/source/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h#L89)
As you're more familiar with this codebase, I defer to your expertise
on whether it's an appropriate sample. However, the crucial point is
that including this information empowers certain drivers to leverage
it without affecting those that don't need it.

validate_xmit_xfrm Function:
My primary goal in discussing the validate_xmit_xfrm function is to
assure you that my patch maintains the existing packet offload code
flow, avoiding any unintended disruption.

State Release:
I've noticed that secpath_reset() is called before xfrm_output(). The
sequence seems to be: xfrmi_xmit2 -> xfrmi_scrub_packet ->
secpath_reset(), followed by xfrmi_xmit2 calling dst_output, which is
essentially xfrm_output().
I'm also open to moving the xfrm_state_hold(x) after the if (!xo)
check block. This would ensure the state is held only when everything
is ok. I'll gladly make this adjustment if you believe it's the better
option.

Thank you once again for your valuable insights and collaboration.
Your feedback is greatly appreciated!

Feng
Leon Romanovsky Aug. 30, 2024, 2:30 p.m. UTC | #6
On Thu, Aug 29, 2024 at 02:19:25PM -0700, Feng Wang wrote:
> Hi Leon,
> 
> Thank you again for your thoughtful questions and comments. I'd like
> to provide further clarification and address your points:
> 
> SA Information Usage:
> 
> There are several instances in the kernel code where it's used, such
> as in esp4(6)_offload.c and xfrm.c. This clearly demonstrates how SA
> information is used. Moreover, passing this information to the driver
> shouldn't negatively impact those drivers that don't require it.
> Regarding a driver example, the function mlx5e_ipsec_feature_check
> caught my attention.
> https://elixir.bootlin.com/linux/v6.10/source/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.h#L89)
> As you're more familiar with this codebase, I defer to your expertise
> on whether it's an appropriate sample. 

This function is not involved when packet is going to be offloaded for "IPsec packet offload".

> However, the crucial point is that including this information empowers certain drivers to leverage
> it without affecting those that don't need it.

Can you please provide a list of drivers that will benefit from this change?
Can you give a complete flow (including driver) which didn't work before
and will work after this change?

> 
> validate_xmit_xfrm Function:
> My primary goal in discussing the validate_xmit_xfrm function is to
> assure you that my patch maintains the existing packet offload code
> flow, avoiding any unintended disruption.

The whole idea of packet offload is to skip everything in XFRM stack and
present packet as plain text.

> 
> State Release:
> I've noticed that secpath_reset() is called before xfrm_output(). The
> sequence seems to be: xfrmi_xmit2 -> xfrmi_scrub_packet ->
> secpath_reset(), followed by xfrmi_xmit2 calling dst_output, which is
> essentially xfrm_output().
> I'm also open to moving the xfrm_state_hold(x) after the if (!xo)
> check block. This would ensure the state is held only when everything
> is ok. I'll gladly make this adjustment if you believe it's the better
> option.
> 
> Thank you once again for your valuable insights and collaboration.
> Your feedback is greatly appreciated!
> 
> Feng
>
Feng Wang Aug. 31, 2024, 12:27 a.m. UTC | #7
Hi Leon,

I believe you are right about the mlx5e_ipsec_feature_check function.
And it shows that the driver can indeed make use of the SA
information. Similarly, in packet offload mode, drivers can
potentially leverage this information for their own purposes. The
patch is designed to be non-intrusive, so drivers that don't utilize
this information won't be affected in any way.

I'm also curious about why the mlx driver doesn't seem to use the XFRM
interface ID in the same way that xfrm_policy_match() does.
https://elixir.bootlin.com/linux/v6.10.7/source/net/xfrm/xfrm_policy.c#L1993
This ID is critical in scenarios with multiple IPsec tunnels, where
source and destination addresses alone might not be sufficient to
identify the correct security policy. Perhaps there's a specific
reason or design choice behind this in the mlx driver?

Thank you once again for your valuable insights and collaboration.

Feng
Leon Romanovsky Aug. 31, 2024, 5:36 p.m. UTC | #8
On Fri, Aug 30, 2024 at 05:27:29PM -0700, Feng Wang wrote:
> Hi Leon,
> 
> I believe you are right about the mlx5e_ipsec_feature_check function.
> And it shows that the driver can indeed make use of the SA
> information. Similarly, in packet offload mode, drivers can
> potentially leverage this information for their own purposes. The
> patch is designed to be non-intrusive, so drivers that don't utilize
> this information won't be affected in any way.

I asked about examples of such drivers. Can you please provide them?

> 
> I'm also curious about why the mlx driver doesn't seem to use the XFRM
> interface ID in the same way that xfrm_policy_match() does.
> https://elixir.bootlin.com/linux/v6.10.7/source/net/xfrm/xfrm_policy.c#L1993a

HW offload is always last in packet TX traversal and it means that if HW
catches that packet and it meets the HW offload requirements, it will be
encrypted. The main idea is that routing (sending to right if_id) is handled
by the upper layers and HW offload is just a final step.

> This ID is critical in scenarios with multiple IPsec tunnels, where
> source and destination addresses alone might not be sufficient to
> identify the correct security policy. Perhaps there's a specific
> reason or design choice behind this in the mlx driver?

It is not specific to mlx5, but to all HW offload drivers. They should
implement both policy and SA offloading. It is violation of current mailing
list deign to do not offload policy. If you offload both policy and SA, you
won't need if_id at all.

> 
> Thank you once again for your valuable insights and collaboration.
> 
> Feng
Leon Romanovsky Aug. 31, 2024, 5:39 p.m. UTC | #9
On Wed, Aug 28, 2024 at 07:32:47AM +0200, Steffen Klassert wrote:
> On Thu, Aug 22, 2024 at 01:02:52PM -0700, Feng Wang wrote:
> > From: wangfe <wangfe@google.com>
> > 
> > In packet offload mode, append Security Association (SA) information
> > to each packet, replicating the crypto offload implementation.
> > The XFRM_XMIT flag is set to enable packet to be returned immediately
> > from the validate_xmit_xfrm function, thus aligning with the existing
> > code path for packet offload mode.
> > 
> > This SA info helps HW offload match packets to their correct security
> > policies. The XFRM interface ID is included, which is crucial in setups
> > with multiple XFRM interfaces where source/destination addresses alone
> > can't pinpoint the right policy.
> > 
> > Signed-off-by: wangfe <wangfe@google.com>
> 
> Applied to ipsec-next, thanks Feng!

Steffen,

What is your position on this patch?
It is the same patch (logically) as the one that was rejected before?
https://lore.kernel.org/all/ZfpnCIv+8eYd7CpO@gauss3.secunet.de/

Thanks
Steffen Klassert Sept. 2, 2024, 7:44 a.m. UTC | #10
Sorry for the delay. I'm on vacation, so responses will take a bit
longer during the next two weeks.

On Sat, Aug 31, 2024 at 08:39:34PM +0300, Leon Romanovsky wrote:
> On Wed, Aug 28, 2024 at 07:32:47AM +0200, Steffen Klassert wrote:
> > On Thu, Aug 22, 2024 at 01:02:52PM -0700, Feng Wang wrote:
> > > From: wangfe <wangfe@google.com>
> > > 
> > > In packet offload mode, append Security Association (SA) information
> > > to each packet, replicating the crypto offload implementation.
> > > The XFRM_XMIT flag is set to enable packet to be returned immediately
> > > from the validate_xmit_xfrm function, thus aligning with the existing
> > > code path for packet offload mode.
> > > 
> > > This SA info helps HW offload match packets to their correct security
> > > policies. The XFRM interface ID is included, which is crucial in setups
> > > with multiple XFRM interfaces where source/destination addresses alone
> > > can't pinpoint the right policy.
> > > 
> > > Signed-off-by: wangfe <wangfe@google.com>
> > 
> > Applied to ipsec-next, thanks Feng!
> 
> Steffen,
> 
> What is your position on this patch?
> It is the same patch (logically) as the one that was rejected before?
> https://lore.kernel.org/all/ZfpnCIv+8eYd7CpO@gauss3.secunet.de/

This is an infrastructure patch to support routing based IPsec
with xfrm interfaces. I just did not notice it because it was not
mentioned in the commit message of the first patchset. This should have
been included into the packet offload API patchset, but I overlooked
that xfrm interfaces can't work with packet offload mode. The stack
infrastructure should be complete, so that drivers can implement
that without the need to fix the stack before.

In case the patch has issues, we should fix it.
Steffen Klassert Sept. 2, 2024, 7:47 a.m. UTC | #11
On Thu, Aug 22, 2024 at 01:02:52PM -0700, Feng Wang wrote:
> From: wangfe <wangfe@google.com>
> 
> In packet offload mode, append Security Association (SA) information
> to each packet, replicating the crypto offload implementation.
> The XFRM_XMIT flag is set to enable packet to be returned immediately
> from the validate_xmit_xfrm function, thus aligning with the existing
> code path for packet offload mode.
> 
> This SA info helps HW offload match packets to their correct security
> policies. The XFRM interface ID is included, which is crucial in setups
> with multiple XFRM interfaces where source/destination addresses alone
> can't pinpoint the right policy.
> 
> Signed-off-by: wangfe <wangfe@google.com>
> ---
> v2:
>   - Add why HW offload requires the SA info to the commit message
> v1: https://lore.kernel.org/all/20240812182317.1962756-1-wangfe@google.com/
> ---
>  net/xfrm/xfrm_output.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
> index e5722c95b8bb..a12588e7b060 100644
> --- a/net/xfrm/xfrm_output.c
> +++ b/net/xfrm/xfrm_output.c
> @@ -706,6 +706,8 @@ int xfrm_output(struct sock *sk, struct sk_buff *skb)
>  	struct xfrm_state *x = skb_dst(skb)->xfrm;
>  	int family;
>  	int err;
> +	struct xfrm_offload *xo;
> +	struct sec_path *sp;
>  
>  	family = (x->xso.type != XFRM_DEV_OFFLOAD_PACKET) ? x->outer_mode.family
>  		: skb_dst(skb)->ops->family;
> @@ -728,6 +730,25 @@ int xfrm_output(struct sock *sk, struct sk_buff *skb)
>  			kfree_skb(skb);
>  			return -EHOSTUNREACH;
>  		}
> +		sp = secpath_set(skb);

Maybe we should do that only if if we really use xfrm interfaces,
i.e. if xfrm if_id is not zero.
Leon Romanovsky Sept. 2, 2024, 9:44 a.m. UTC | #12
On Mon, Sep 02, 2024 at 09:44:24AM +0200, Steffen Klassert wrote:
> Sorry for the delay. I'm on vacation, so responses will take a bit
> longer during the next two weeks.
> 
> On Sat, Aug 31, 2024 at 08:39:34PM +0300, Leon Romanovsky wrote:
> > On Wed, Aug 28, 2024 at 07:32:47AM +0200, Steffen Klassert wrote:
> > > On Thu, Aug 22, 2024 at 01:02:52PM -0700, Feng Wang wrote:
> > > > From: wangfe <wangfe@google.com>
> > > > 
> > > > In packet offload mode, append Security Association (SA) information
> > > > to each packet, replicating the crypto offload implementation.
> > > > The XFRM_XMIT flag is set to enable packet to be returned immediately
> > > > from the validate_xmit_xfrm function, thus aligning with the existing
> > > > code path for packet offload mode.
> > > > 
> > > > This SA info helps HW offload match packets to their correct security
> > > > policies. The XFRM interface ID is included, which is crucial in setups
> > > > with multiple XFRM interfaces where source/destination addresses alone
> > > > can't pinpoint the right policy.
> > > > 
> > > > Signed-off-by: wangfe <wangfe@google.com>
> > > 
> > > Applied to ipsec-next, thanks Feng!
> > 
> > Steffen,
> > 
> > What is your position on this patch?
> > It is the same patch (logically) as the one that was rejected before?
> > https://lore.kernel.org/all/ZfpnCIv+8eYd7CpO@gauss3.secunet.de/
> 
> This is an infrastructure patch to support routing based IPsec
> with xfrm interfaces. I just did not notice it because it was not
> mentioned in the commit message of the first patchset. This should have
> been included into the packet offload API patchset, but I overlooked
> that xfrm interfaces can't work with packet offload mode. The stack
> infrastructure should be complete, so that drivers can implement
> that without the need to fix the stack before.

Core implementation that is not used by any upstream code is rarely
right thing to do. It is not tested, complicates the code and mostly
overlooked when patches are reviewed. The better way will be to extend
the stack when this feature will be actually used and needed.

IMHO, attempt to enrich core code without showing users of this new flow
is comparable to premature optimization.

And Feng more than once said that this code is for some out-of-tree
driver.

> 
> In case the patch has issues, we should fix it.

Yes, this patch doesn't have in-kernel users.

Thanks
Feng Wang Sept. 3, 2024, 6:19 p.m. UTC | #13
This patch simply assigns a value to a field, replicating existing
crypto offload behavior - it's working/tested in that mode. Many
instances within the kernel code utilize this information in different
cases, making the implementation pretty simple and safe.

Hi Leon,

"It is not specific to mlx5, but to all HW offload drivers. They should
implement both policy and SA offloading. It is violation of current mailing
list deign to do not offload policy. If you offload both policy and SA, you
won't need if_id at all."

Could you please clarify why the if_id is unnecessary in scenarios
with hardware offload?

For instance, imagine I have two tunnel sessions sharing the same
source and destination addresses. One tunnel utilizes xfrm ID 1, while
the other uses xfrm ID 2. If a packet is sent out via xfrm ID 1 and
lacks any specific markings, how does the hardware offload determine
that this packet belongs to xfrm ID 1 and not xfrm ID 2? This
distinction is crucial for the hardware to locate the correct
encryption information and encrypt the packet accordingly.

Thanks for your help.

Feng
Leon Romanovsky Sept. 3, 2024, 7:04 p.m. UTC | #14
On Tue, Sep 03, 2024 at 11:19:41AM -0700, Feng Wang wrote:
> This patch simply assigns a value to a field, replicating existing
> crypto offload behavior - it's working/tested in that mode. Many
> instances within the kernel code utilize this information in different
> cases, making the implementation pretty simple and safe.

Not really, the thing that you are adding secpath in the place which
wasn't designed to have it, is very problematic. Crypto and packet
offloads are two different things as they treat packets differently.
First one sees them as ESP packets, while the second one sees them as
plain text packets.

> 
> Hi Leon,
> 
> "It is not specific to mlx5, but to all HW offload drivers. They should
> implement both policy and SA offloading. It is violation of current mailing
> list deign to do not offload policy. If you offload both policy and SA, you
> won't need if_id at all."
> 
> Could you please clarify why the if_id is unnecessary in scenarios
> with hardware offload?

I have no objections to support xfrm IDs in the packet offload, my
request is to have upstream driver which uses this feature.

> 
> For instance, imagine I have two tunnel sessions sharing the same
> source and destination addresses. One tunnel utilizes xfrm ID 1, while
> the other uses xfrm ID 2. If a packet is sent out via xfrm ID 1 and
> lacks any specific markings, how does the hardware offload determine
> that this packet belongs to xfrm ID 1 and not xfrm ID 2? This
> distinction is crucial for the hardware to locate the correct
> encryption information and encrypt the packet accordingly.

HW doesn't know about xfrm ID, as this information is kernel specific.
In order HW will use this information someone needs to pass it to HW,
while configuring SA/policy and nothing in the kernel does that.

Thanks

> 
> Thanks for your help.
> 
> Feng
Feng Wang Sept. 4, 2024, 5:41 p.m. UTC | #15
Hi Leon,

I'm looking at the MLX5 driver to understand how the SA information is
used. In mlx5e_ipsec_handle_tx_skb(), it appears we might leverage the
current MLX5 implementation to verify the xfrm id.
https://elixir.bootlin.com/linux/v6.10/source/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c#L271

During the mlx5e_xfrm_add_state() function, the xfrm ID (x->if_id) is
passed to the driver along with the associated xfrm_state pointer.
Therefore, by checking the if_id within the skb tx function like
mlx5e_ipsec_handle_tx_skb(), we should be able to demonstrate the use
case effectively.

What’s your opinion?

Thanks for your help.

Feng
Leon Romanovsky Sept. 5, 2024, 7:49 a.m. UTC | #16
On Wed, Sep 04, 2024 at 10:41:38AM -0700, Feng Wang wrote:
> Hi Leon,
> 
> I'm looking at the MLX5 driver to understand how the SA information is
> used. In mlx5e_ipsec_handle_tx_skb(), it appears we might leverage the
> current MLX5 implementation to verify the xfrm id.
> https://elixir.bootlin.com/linux/v6.10/source/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c#L271
> 
> During the mlx5e_xfrm_add_state() function, the xfrm ID (x->if_id) is
> passed to the driver along with the associated xfrm_state pointer.
> Therefore, by checking the if_id within the skb tx function like
> mlx5e_ipsec_handle_tx_skb(), we should be able to demonstrate the use
> case effectively.
> 
> What’s your opinion?

Packet offloaded packets don't pass mlx5e_ipsec_handle_tx_skb() because SKB is
treated as plain text and not encrypted.

In order to support this feature in mlx5, you will need to do two things:
1. Create rule which matches x->if_id in mlx5 flow steering, while
creating SAs (see tx_add_rule()->setup_fte_reg_a()).

This register is used in the transmit steering tables, and is loaded with
the value of flow_table_metadata field in the Ethernet Segment of the WQE.

2. Set x->if_id from SKB in flow_table_metadata to allow HW to catch
these packets. It means change mlx5e datapath to set this value from
SKB.

The first item is easy, just move setup_fte_reg_a() to the right place,
but the second one is more complex as whole packet offload assumption
that we are working with plain text packets.

I'm not even talking about eswitch mode, which will bring more
complexity.

Thanks

> 
> Thanks for your help.
> 
> Feng
Feng Wang Sept. 5, 2024, 6:18 p.m. UTC | #17
Hi Leon,

I'm currently reading the mlx5e_xmit() function to understand the MLX5
datapath. I'm particularly interested in understanding how packet
offload mode functions.

Could you please clarify if CONFIG_MLX5_EN_IPSEC needs to be defined
when MLX5 employs packet offload mode? Additionally, are there any
specific flags associated with packet offload that I should be mindful
of?

Any guidance would be greatly appreciated.

Thanks, Feng
Steffen Klassert Sept. 9, 2024, 9:09 a.m. UTC | #18
On Mon, Sep 02, 2024 at 12:44:52PM +0300, Leon Romanovsky wrote:
> On Mon, Sep 02, 2024 at 09:44:24AM +0200, Steffen Klassert wrote:
> > > 
> > > Steffen,
> > > 
> > > What is your position on this patch?
> > > It is the same patch (logically) as the one that was rejected before?
> > > https://lore.kernel.org/all/ZfpnCIv+8eYd7CpO@gauss3.secunet.de/
> > 
> > This is an infrastructure patch to support routing based IPsec
> > with xfrm interfaces. I just did not notice it because it was not
> > mentioned in the commit message of the first patchset. This should have
> > been included into the packet offload API patchset, but I overlooked
> > that xfrm interfaces can't work with packet offload mode. The stack
> > infrastructure should be complete, so that drivers can implement
> > that without the need to fix the stack before.
> 
> Core implementation that is not used by any upstream code is rarely
> right thing to do. It is not tested, complicates the code and mostly
> overlooked when patches are reviewed. The better way will be to extend
> the stack when this feature will be actually used and needed.

This is our tradeoff, an API should be fully designed from the
beginning, everything else is bad design and will likely result
in band aids (as it happens here). The API can be connected to
netdevsim to test it.

Currently the combination of xfrm interfaces and packet offload
is just broken. Unfortunalely this patch does not fix it.

I think we need to do three things:

- Fix xfrm interfaces + packet offload combination

- Extend netdevsim to support packet offload

- Extend the API for xfrm interfaces (and everything
  else we forgot).

> IMHO, attempt to enrich core code without showing users of this new flow
> is comparable to premature optimization.
> 
> And Feng more than once said that this code is for some out-of-tree
> driver.

It is an API, so everyone can use it.

Anyway, I'm thinking about reverting it for now and do it right
in the next development cycle.
Steffen Klassert Sept. 9, 2024, 10:02 a.m. UTC | #19
On Mon, Sep 09, 2024 at 11:09:06AM +0200, Steffen Klassert wrote:
> On Mon, Sep 02, 2024 at 12:44:52PM +0300, Leon Romanovsky wrote:
> > On Mon, Sep 02, 2024 at 09:44:24AM +0200, Steffen Klassert wrote:
> 
> Anyway, I'm thinking about reverting it for now and do it right
> in the next development cycle.

I finally decided to revert it. Let's work on it after the next
merge window.
Leon Romanovsky Sept. 11, 2024, 10:40 a.m. UTC | #20
On Mon, Sep 09, 2024 at 11:09:05AM +0200, Steffen Klassert wrote:
> On Mon, Sep 02, 2024 at 12:44:52PM +0300, Leon Romanovsky wrote:
> > On Mon, Sep 02, 2024 at 09:44:24AM +0200, Steffen Klassert wrote:
> > > > 
> > > > Steffen,
> > > > 
> > > > What is your position on this patch?
> > > > It is the same patch (logically) as the one that was rejected before?
> > > > https://lore.kernel.org/all/ZfpnCIv+8eYd7CpO@gauss3.secunet.de/
> > > 
> > > This is an infrastructure patch to support routing based IPsec
> > > with xfrm interfaces. I just did not notice it because it was not
> > > mentioned in the commit message of the first patchset. This should have
> > > been included into the packet offload API patchset, but I overlooked
> > > that xfrm interfaces can't work with packet offload mode. The stack
> > > infrastructure should be complete, so that drivers can implement
> > > that without the need to fix the stack before.
> > 
> > Core implementation that is not used by any upstream code is rarely
> > right thing to do. It is not tested, complicates the code and mostly
> > overlooked when patches are reviewed. The better way will be to extend
> > the stack when this feature will be actually used and needed.
> 
> This is our tradeoff, an API should be fully designed from the
> beginning, everything else is bad design and will likely result
> in band aids (as it happens here). The API can be connected to
> netdevsim to test it.
> 
> Currently the combination of xfrm interfaces and packet offload
> is just broken. 

I don't think that it is broken. It is just not implemented. XFRM
interfaces are optional field, which is not really popular in the
field.

> Unfortunalely this patch does not fix it.
> 
> I think we need to do three things:
> 
> - Fix xfrm interfaces + packet offload combination
> 
> - Extend netdevsim to support packet offload
> 
> - Extend the API for xfrm interfaces (and everything
>   else we forgot).

This is the most challenging part. It is not clear what should
we extend if customers are not asking for it and they are extremely
happy with the current IPsec packet offload state.

BTW, I'm aware of one gap, which is not clear how to handle, and
it is combination of policy sockets and offload.

> 
> > IMHO, attempt to enrich core code without showing users of this new flow
> > is comparable to premature optimization.
> > 
> > And Feng more than once said that this code is for some out-of-tree
> > driver.
> 
> It is an API, so everyone can use it.

Of course, as long as in-kernel user exists.

Thanks
Feng Wang Sept. 11, 2024, 11:43 p.m. UTC | #21
Hi Steffen,

Can you reconsider the revert of the CL? Based on our discussion, I
believe we've reached a consensus that the xfrm id is necessary. If
some customers prefer not to utilize it, we can extend the offload
flag (something like XFRM_DEV_OFFLOAD_FLAG_ACQ) to manage this
behavior. The default setting would remain consistent with the current
implementation.

Thank you!

Feng

On Wed, Sep 11, 2024 at 3:40 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Mon, Sep 09, 2024 at 11:09:05AM +0200, Steffen Klassert wrote:
> > On Mon, Sep 02, 2024 at 12:44:52PM +0300, Leon Romanovsky wrote:
> > > On Mon, Sep 02, 2024 at 09:44:24AM +0200, Steffen Klassert wrote:
> > > > >
> > > > > Steffen,
> > > > >
> > > > > What is your position on this patch?
> > > > > It is the same patch (logically) as the one that was rejected before?
> > > > > https://lore.kernel.org/all/ZfpnCIv+8eYd7CpO@gauss3.secunet.de/
> > > >
> > > > This is an infrastructure patch to support routing based IPsec
> > > > with xfrm interfaces. I just did not notice it because it was not
> > > > mentioned in the commit message of the first patchset. This should have
> > > > been included into the packet offload API patchset, but I overlooked
> > > > that xfrm interfaces can't work with packet offload mode. The stack
> > > > infrastructure should be complete, so that drivers can implement
> > > > that without the need to fix the stack before.
> > >
> > > Core implementation that is not used by any upstream code is rarely
> > > right thing to do. It is not tested, complicates the code and mostly
> > > overlooked when patches are reviewed. The better way will be to extend
> > > the stack when this feature will be actually used and needed.
> >
> > This is our tradeoff, an API should be fully designed from the
> > beginning, everything else is bad design and will likely result
> > in band aids (as it happens here). The API can be connected to
> > netdevsim to test it.
> >
> > Currently the combination of xfrm interfaces and packet offload
> > is just broken.
>
> I don't think that it is broken. It is just not implemented. XFRM
> interfaces are optional field, which is not really popular in the
> field.
>
> > Unfortunalely this patch does not fix it.
> >
> > I think we need to do three things:
> >
> > - Fix xfrm interfaces + packet offload combination
> >
> > - Extend netdevsim to support packet offload
> >
> > - Extend the API for xfrm interfaces (and everything
> >   else we forgot).
>
> This is the most challenging part. It is not clear what should
> we extend if customers are not asking for it and they are extremely
> happy with the current IPsec packet offload state.
>
> BTW, I'm aware of one gap, which is not clear how to handle, and
> it is combination of policy sockets and offload.
>
> >
> > > IMHO, attempt to enrich core code without showing users of this new flow
> > > is comparable to premature optimization.
> > >
> > > And Feng more than once said that this code is for some out-of-tree
> > > driver.
> >
> > It is an API, so everyone can use it.
>
> Of course, as long as in-kernel user exists.
>
> Thanks
diff mbox series

Patch

diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index e5722c95b8bb..a12588e7b060 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -706,6 +706,8 @@  int xfrm_output(struct sock *sk, struct sk_buff *skb)
 	struct xfrm_state *x = skb_dst(skb)->xfrm;
 	int family;
 	int err;
+	struct xfrm_offload *xo;
+	struct sec_path *sp;
 
 	family = (x->xso.type != XFRM_DEV_OFFLOAD_PACKET) ? x->outer_mode.family
 		: skb_dst(skb)->ops->family;
@@ -728,6 +730,25 @@  int xfrm_output(struct sock *sk, struct sk_buff *skb)
 			kfree_skb(skb);
 			return -EHOSTUNREACH;
 		}
+		sp = secpath_set(skb);
+		if (!sp) {
+			XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
+			kfree_skb(skb);
+			return -ENOMEM;
+		}
+
+		sp->olen++;
+		sp->xvec[sp->len++] = x;
+		xfrm_state_hold(x);
+
+		xo = xfrm_offload(skb);
+		if (!xo) {
+			secpath_reset(skb);
+			XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
+			kfree_skb(skb);
+			return -EINVAL;
+		}
+		xo->flags |= XFRM_XMIT;
 
 		return xfrm_output_resume(sk, skb, 0);
 	}