diff mbox series

[ipsec-next,v5,06/17] xfrm: add mode_cbs module functionality

Message ID 20240714202246.1573817-7-chopps@chopps.org (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Headers show
Series Add IP-TFS mode to xfrm | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Christian Hopps July 14, 2024, 8:22 p.m. UTC
From: Christian Hopps <chopps@labn.net>

Add a set of callbacks xfrm_mode_cbs to xfrm_state. These callbacks
enable the addition of new xfrm modes, such as IP-TFS to be defined
in modules.

Signed-off-by: Christian Hopps <chopps@labn.net>
---
 include/net/xfrm.h     | 39 ++++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_device.c |  3 ++-
 net/xfrm/xfrm_input.c  | 14 ++++++++++--
 net/xfrm/xfrm_output.c |  2 ++
 net/xfrm/xfrm_policy.c | 18 ++++++++++------
 net/xfrm/xfrm_state.c  | 48 ++++++++++++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_user.c   | 13 ++++++++++++
 7 files changed, 127 insertions(+), 10 deletions(-)

Comments

Florian Westphal July 19, 2024, 6:14 a.m. UTC | #1
Christian Hopps via Devel <devel@linux-ipsec.org> wrote:
> +static struct xfrm_mode_cbs xfrm_mode_cbs_map[XFRM_MODE_MAX];

Why not
static struct xfrm_mode_cbs *xfrm_mode_cbs_map[XFRM_MODE_MAX];
i.e., why does

> +int xfrm_register_mode_cbs(u8 mode, const struct xfrm_mode_cbs *mode_cbs)
> +{
> +	if (mode >= XFRM_MODE_MAX)
> +		return -EINVAL;
> +
> +	xfrm_mode_cbs_map[mode] = *mode_cbs;

do a deep copy here rather than
	xfrm_mode_cbs_map[mode] = mode_cbs;

?

> +	if (mode == XFRM_MODE_IPTFS && !xfrm_mode_cbs_map[mode].create_state)
> +		request_module("xfrm-iptfs");
> +	return &xfrm_mode_cbs_map[mode];

Sabrina noticed that this looks weird, and I agree.

This can, afaics, return a partially initialised structure or an
all-zero structure.

request_module() might load the module, it can fail to load the
module, and, most importantly, there is no guarantee that the
module is removed right after.

Is there any locking scheme in place that prevents this from
happening? If so, a comment should be added that explains this.
Christian Hopps July 20, 2024, 12:06 a.m. UTC | #2
Florian Westphal <fw@strlen.de> writes:

> Christian Hopps via Devel <devel@linux-ipsec.org> wrote:
>> +static struct xfrm_mode_cbs xfrm_mode_cbs_map[XFRM_MODE_MAX];
>
> Why not
> static struct xfrm_mode_cbs *xfrm_mode_cbs_map[XFRM_MODE_MAX];
> i.e., why does
>
>> +int xfrm_register_mode_cbs(u8 mode, const struct xfrm_mode_cbs *mode_cbs)
>> +{
>> +	if (mode >= XFRM_MODE_MAX)
>> +		return -EINVAL;
>> +
>> +	xfrm_mode_cbs_map[mode] = *mode_cbs;
>
> do a deep copy here rather than
> 	xfrm_mode_cbs_map[mode] = mode_cbs;
>
> ?
>
>> +	if (mode == XFRM_MODE_IPTFS && !xfrm_mode_cbs_map[mode].create_state)
>> +		request_module("xfrm-iptfs");
>> +	return &xfrm_mode_cbs_map[mode];
>
> Sabrina noticed that this looks weird, and I agree.
>
> This can, afaics, return a partially initialised structure or an
> all-zero structure.
>
> request_module() might load the module, it can fail to load the
> module, and, most importantly, there is no guarantee that the
> module is removed right after.
>
> Is there any locking scheme in place that prevents this from
> happening? If so, a comment should be added that explains this.

I've switched it to the more obvious array of pointers and storing the module provided pointer.

Thanks,
Chris.
Sabrina Dubroca July 25, 2024, 1:32 p.m. UTC | #3
2024-07-14, 16:22:34 -0400, Christian Hopps wrote:
> +struct xfrm_mode_cbs {

It would be nice to add kdoc for the whole thing.


> diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
> index 7cee9c0a2cdc..6ff05604f973 100644
> --- a/net/xfrm/xfrm_input.c
> +++ b/net/xfrm/xfrm_input.c
> @@ -494,6 +497,10 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
>  
>  		family = x->props.family;
>  
> +		/* An encap_type of -3 indicates reconstructed inner packet */

And I think it's time to document all the encap_types above the
function (and in particular, how xfrm_inner_mode_input/encap_type=-3
pair together), and/or define some constants. Also, is -2 used
anywhere (I only see -1 and -3)? If not, then why -3?
Christian Hopps July 30, 2024, 9:29 p.m. UTC | #4
Sabrina Dubroca <sd@queasysnail.net> writes:

> 2024-07-14, 16:22:34 -0400, Christian Hopps wrote:
>> +struct xfrm_mode_cbs {
>
> It would be nice to add kdoc for the whole thing.

Ok, I'll move the inline comments to a kdoc. FWIW, all the other structs in this header, including the main `xfrm_state` struct use the same inline comment documentation style I copied.

>> diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
>> index 7cee9c0a2cdc..6ff05604f973 100644
>> --- a/net/xfrm/xfrm_input.c
>> +++ b/net/xfrm/xfrm_input.c
>> @@ -494,6 +497,10 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
>>
>>  		family = x->props.family;
>>
>> +		/* An encap_type of -3 indicates reconstructed inner packet */
>
> And I think it's time to document all the encap_types above the
> function (and in particular, how xfrm_inner_mode_input/encap_type=-3
> pair together), and/or define some constants. Also, is -2 used
> anywhere (I only see -1 and -3)? If not, then why -3?

At the time this was added ISTR that there was some belief that -2 was used perhaps in an upcoming patch, so I picked -3. I can't find a -2 use case though so I will switch to -2 instead.

Re documentation: I think the inline comments where encap_type is used is sufficient documentation for the 2 negative values. There's a lot going on in this function and someone wishing to change (or understand) something is going to have to walk the code and use cases regardless of a bit of extra verbiage on the encap_value beyond what's already there. Fully documenting how xfrm_input works (in all it's use cases) seems beyond the scope of this patch to me.

Thanks,
Chris.
Sabrina Dubroca July 31, 2024, 5:10 p.m. UTC | #5
2024-07-30, 17:29:06 -0400, Christian Hopps wrote:
> 
> Sabrina Dubroca <sd@queasysnail.net> writes:
> 
> > 2024-07-14, 16:22:34 -0400, Christian Hopps wrote:
> > > +struct xfrm_mode_cbs {
> > 
> > It would be nice to add kdoc for the whole thing.
> 
> Ok, I'll move the inline comments to a kdoc. FWIW, all the other structs in this header, including the main `xfrm_state` struct use the same inline comment documentation style I copied.

Sure, but I don't think we should model new code on old habits.

> > > diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
> > > index 7cee9c0a2cdc..6ff05604f973 100644
> > > --- a/net/xfrm/xfrm_input.c
> > > +++ b/net/xfrm/xfrm_input.c
> > > @@ -494,6 +497,10 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
> > > 
> > >  		family = x->props.family;
> > > 
> > > +		/* An encap_type of -3 indicates reconstructed inner packet */
> > 
> > And I think it's time to document all the encap_types above the
> > function (and in particular, how xfrm_inner_mode_input/encap_type=-3
> > pair together), and/or define some constants. Also, is -2 used
> > anywhere (I only see -1 and -3)? If not, then why -3?
> 
> At the time this was added ISTR that there was some belief that -2
> was used perhaps in an upcoming patch, so I picked -3. I can't find
> a -2 use case though so I will switch to -2 instead.
> 
> Re documentation: I think the inline comments where encap_type is
> used is sufficient documentation for the 2 negative values.

I don't think it is. Inline comments are good to explain the internal
behavior, but that's more external behavior.

> There's
> a lot going on in this function and someone wishing to change (or
> understand) something is going to have to walk the code and use
> cases regardless of a bit of extra verbiage on the encap_value
> beyond what's already there. Fully documenting how xfrm_input works
> (in all it's use cases) seems beyond the scope of this patch to me.

Sure, and that's really not what I'm asking for here. Something like
"encap_type=-3 makes xfrm_input jump right back to where it stopped
when xfrm_inner_mode_input returned -EINPROGRESS" is useful without
having to dive into the mess that is xfrm_input.
Christian Hopps July 31, 2024, 6:32 p.m. UTC | #6
Sabrina Dubroca <sd@queasysnail.net> writes:

> 2024-07-30, 17:29:06 -0400, Christian Hopps wrote:
>>
>> Sabrina Dubroca <sd@queasysnail.net> writes:
>>
>> > 2024-07-14, 16:22:34 -0400, Christian Hopps wrote:
>> > > +struct xfrm_mode_cbs {
>> >
>> > It would be nice to add kdoc for the whole thing.
>>
>> Ok, I'll move the inline comments to a kdoc. FWIW, all the other structs in
>> this header, including the main `xfrm_state` struct use the same inline
>> comment documentation style I copied.
>
> Sure, but I don't think we should model new code on old habits.
>
>> > > diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
>> > > index 7cee9c0a2cdc..6ff05604f973 100644
>> > > --- a/net/xfrm/xfrm_input.c
>> > > +++ b/net/xfrm/xfrm_input.c
>> > > @@ -494,6 +497,10 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
>> > >
>> > >  		family = x->props.family;
>> > >
>> > > +		/* An encap_type of -3 indicates reconstructed inner packet */
>> >
>> > And I think it's time to document all the encap_types above the
>> > function (and in particular, how xfrm_inner_mode_input/encap_type=-3
>> > pair together), and/or define some constants. Also, is -2 used
>> > anywhere (I only see -1 and -3)? If not, then why -3?
>>
>> At the time this was added ISTR that there was some belief that -2
>> was used perhaps in an upcoming patch, so I picked -3. I can't find
>> a -2 use case though so I will switch to -2 instead.
>>
>> Re documentation: I think the inline comments where encap_type is
>> used is sufficient documentation for the 2 negative values.
>
> I don't think it is. Inline comments are good to explain the internal
> behavior, but that's more external behavior.

>> There's
>> a lot going on in this function and someone wishing to change (or
>> understand) something is going to have to walk the code and use
>> cases regardless of a bit of extra verbiage on the encap_value
>> beyond what's already there. Fully documenting how xfrm_input works
>> (in all it's use cases) seems beyond the scope of this patch to me.
>
> Sure, and that's really not what I'm asking for here. Something like
> "encap_type=-3 makes xfrm_input jump right back to where it stopped
> when xfrm_inner_mode_input returned -EINPROGRESS" is useful without
> having to dive into the mess that is xfrm_input.

If I'm not adding your suggested text into an inline comment where am I doing this?

Bear in mind that encap_type can also have non-negative values, am I documenting all these cases too? It just seems like going down this path is asking for the entire function to be documented, perhaps I'm missing something though.

Are other people going to be OK with a top of function comment that only documents the single (now) `-2` value for encap_type?

Thanks,
Chris.
Christian Hopps July 31, 2024, 6:41 p.m. UTC | #7
Christian Hopps <chopps@chopps.org> writes:

> Sabrina Dubroca <sd@queasysnail.net> writes:
>
>> 2024-07-30, 17:29:06 -0400, Christian Hopps wrote:
>>>
>>> Sabrina Dubroca <sd@queasysnail.net> writes:
>>>
>>> > 2024-07-14, 16:22:34 -0400, Christian Hopps wrote:
>>> > > +struct xfrm_mode_cbs {
>>> >
>>> > It would be nice to add kdoc for the whole thing.
>>>
>>> Ok, I'll move the inline comments to a kdoc. FWIW, all the other structs in
>>> this header, including the main `xfrm_state` struct use the same inline
>>> comment documentation style I copied.
>>
>> Sure, but I don't think we should model new code on old habits.
>>
>>> > > diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
>>> > > index 7cee9c0a2cdc..6ff05604f973 100644
>>> > > --- a/net/xfrm/xfrm_input.c
>>> > > +++ b/net/xfrm/xfrm_input.c
>>> > > @@ -494,6 +497,10 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
>>> > >
>>> > >  		family = x->props.family;
>>> > >
>>> > > +		/* An encap_type of -3 indicates reconstructed inner packet */
>>> >
>>> > And I think it's time to document all the encap_types above the
>>> > function (and in particular, how xfrm_inner_mode_input/encap_type=-3
>>> > pair together), and/or define some constants. Also, is -2 used
>>> > anywhere (I only see -1 and -3)? If not, then why -3?
>>>
>>> At the time this was added ISTR that there was some belief that -2
>>> was used perhaps in an upcoming patch, so I picked -3. I can't find
>>> a -2 use case though so I will switch to -2 instead.
>>>
>>> Re documentation: I think the inline comments where encap_type is
>>> used is sufficient documentation for the 2 negative values.
>>
>> I don't think it is. Inline comments are good to explain the internal
>> behavior, but that's more external behavior.
>
>>> There's
>>> a lot going on in this function and someone wishing to change (or
>>> understand) something is going to have to walk the code and use
>>> cases regardless of a bit of extra verbiage on the encap_value
>>> beyond what's already there. Fully documenting how xfrm_input works
>>> (in all it's use cases) seems beyond the scope of this patch to me.
>>
>> Sure, and that's really not what I'm asking for here. Something like
>> "encap_type=-3 makes xfrm_input jump right back to where it stopped
>> when xfrm_inner_mode_input returned -EINPROGRESS" is useful without
>> having to dive into the mess that is xfrm_input.
>
> If I'm not adding your suggested text into an inline comment where am I doing this?
>
> Bear in mind that encap_type can also have non-negative values, am I documenting
> all these cases too? It just seems like going down this path is asking for the
> entire function to be documented, perhaps I'm missing something though.
>
> Are other people going to be OK with a top of function comment that only documents the single (now) `-2` value for encap_type?

Actually, I will document the "Resume" negative value special cases in a top of function comment. If we want more generalized documentation for this function I think it should be in a different patch/project.

Thanks,
Chris.

> Thanks,
> Chris.
diff mbox series

Patch

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 46a214a76081..218659a454a1 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -206,6 +206,7 @@  struct xfrm_state {
 		u16		family;
 		xfrm_address_t	saddr;
 		int		header_len;
+		int		enc_hdr_len;
 		int		trailer_len;
 		u32		extra_flags;
 		struct xfrm_mark	smark;
@@ -296,6 +297,9 @@  struct xfrm_state {
 	 * interpreted by xfrm_type methods. */
 	void			*data;
 	u8			dir;
+
+	const struct xfrm_mode_cbs	*mode_cbs;
+	void				*mode_data;
 };
 
 static inline struct net *xs_net(struct xfrm_state *x)
@@ -448,6 +452,41 @@  struct xfrm_type_offload {
 int xfrm_register_type_offload(const struct xfrm_type_offload *type, unsigned short family);
 void xfrm_unregister_type_offload(const struct xfrm_type_offload *type, unsigned short family);
 
+struct xfrm_mode_cbs {
+	struct module	*owner;
+	/* Add/delete state in the new xfrm_state in `x`. */
+	int	(*create_state)(struct xfrm_state *x);
+	void	(*delete_state)(struct xfrm_state *x);
+
+	/* Called while handling the user netlink options. */
+	int	(*user_init)(struct net *net, struct xfrm_state *x,
+			     struct nlattr **attrs,
+			     struct netlink_ext_ack *extack);
+	int	(*copy_to_user)(struct xfrm_state *x, struct sk_buff *skb);
+	int     (*clone)(struct xfrm_state *x, struct xfrm_state *orig);
+	unsigned int (*sa_len)(const struct xfrm_state *x);
+
+	u32	(*get_inner_mtu)(struct xfrm_state *x, int outer_mtu);
+
+	/* Called to handle received xfrm (egress) packets. */
+	int	(*input)(struct xfrm_state *x, struct sk_buff *skb);
+
+	/* Placed in dst_output of the dst when an xfrm_state is bound. */
+	int	(*output)(struct net *net, struct sock *sk, struct sk_buff *skb);
+
+	/**
+	 * Prepare the skb for output for the given mode. Returns:
+	 *    Error value, if 0 then skb values should be as follows:
+	 *    transport_header should point at ESP header
+	 *    network_header should point at Outer IP header
+	 *    mac_header should point at protocol/nexthdr of the outer IP
+	 */
+	int	(*prepare_output)(struct xfrm_state *x, struct sk_buff *skb);
+};
+
+int xfrm_register_mode_cbs(u8 mode, const struct xfrm_mode_cbs *mode_cbs);
+void xfrm_unregister_mode_cbs(u8 mode);
+
 static inline int xfrm_af2proto(unsigned int family)
 {
 	switch(family) {
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 9a44d363ba62..e412e4afb169 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -42,7 +42,8 @@  static void __xfrm_mode_tunnel_prep(struct xfrm_state *x, struct sk_buff *skb,
 		skb->transport_header = skb->network_header + hsize;
 
 	skb_reset_mac_len(skb);
-	pskb_pull(skb, skb->mac_len + x->props.header_len);
+	pskb_pull(skb,
+		  skb->mac_len + x->props.header_len - x->props.enc_hdr_len);
 }
 
 static void __xfrm_mode_beet_prep(struct xfrm_state *x, struct sk_buff *skb,
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 7cee9c0a2cdc..6ff05604f973 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -446,6 +446,9 @@  static int xfrm_inner_mode_input(struct xfrm_state *x,
 		WARN_ON_ONCE(1);
 		break;
 	default:
+		if (x->mode_cbs && x->mode_cbs->input)
+			return x->mode_cbs->input(x, skb);
+
 		WARN_ON_ONCE(1);
 		break;
 	}
@@ -494,6 +497,10 @@  int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 
 		family = x->props.family;
 
+		/* An encap_type of -3 indicates reconstructed inner packet */
+		if (encap_type == -3)
+			goto resume_decapped;
+
 		/* An encap_type of -1 indicates async resumption. */
 		if (encap_type == -1) {
 			async = 1;
@@ -681,11 +688,14 @@  int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 
 		XFRM_MODE_SKB_CB(skb)->protocol = nexthdr;
 
-		if (xfrm_inner_mode_input(x, skb)) {
+		err = xfrm_inner_mode_input(x, skb);
+		if (err == -EINPROGRESS)
+			return 0;
+		else if (err) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEMODEERROR);
 			goto drop;
 		}
-
+resume_decapped:
 		if (x->outer_mode.flags & XFRM_MODE_FLAG_TUNNEL) {
 			decaps = 1;
 			break;
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index e5722c95b8bb..ef81359e4038 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -472,6 +472,8 @@  static int xfrm_outer_mode_output(struct xfrm_state *x, struct sk_buff *skb)
 		WARN_ON_ONCE(1);
 		break;
 	default:
+		if (x->mode_cbs && x->mode_cbs->prepare_output)
+			return x->mode_cbs->prepare_output(x, skb);
 		WARN_ON_ONCE(1);
 		break;
 	}
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 2a9a31f2a9c1..a2ed27fb0941 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2720,13 +2720,17 @@  static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 
 		dst1->input = dst_discard;
 
-		rcu_read_lock();
-		afinfo = xfrm_state_afinfo_get_rcu(inner_mode->family);
-		if (likely(afinfo))
-			dst1->output = afinfo->output;
-		else
-			dst1->output = dst_discard_out;
-		rcu_read_unlock();
+		if (xfrm[i]->mode_cbs && xfrm[i]->mode_cbs->output) {
+			dst1->output = xfrm[i]->mode_cbs->output;
+		} else {
+			rcu_read_lock();
+			afinfo = xfrm_state_afinfo_get_rcu(inner_mode->family);
+			if (likely(afinfo))
+				dst1->output = afinfo->output;
+			else
+				dst1->output = dst_discard_out;
+			rcu_read_unlock();
+		}
 
 		xdst_prev = xdst;
 
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index abadc857cd45..c6db3e2465ca 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -513,6 +513,36 @@  static const struct xfrm_mode *xfrm_get_mode(unsigned int encap, int family)
 	return NULL;
 }
 
+static struct xfrm_mode_cbs xfrm_mode_cbs_map[XFRM_MODE_MAX];
+
+int xfrm_register_mode_cbs(u8 mode, const struct xfrm_mode_cbs *mode_cbs)
+{
+	if (mode >= XFRM_MODE_MAX)
+		return -EINVAL;
+
+	xfrm_mode_cbs_map[mode] = *mode_cbs;
+	return 0;
+}
+EXPORT_SYMBOL(xfrm_register_mode_cbs);
+
+void xfrm_unregister_mode_cbs(u8 mode)
+{
+	if (mode >= XFRM_MODE_MAX)
+		return;
+
+	memset(&xfrm_mode_cbs_map[mode], 0, sizeof(xfrm_mode_cbs_map[mode]));
+}
+EXPORT_SYMBOL(xfrm_unregister_mode_cbs);
+
+static const struct xfrm_mode_cbs *xfrm_get_mode_cbs(u8 mode)
+{
+	if (mode >= XFRM_MODE_MAX)
+		return NULL;
+	if (mode == XFRM_MODE_IPTFS && !xfrm_mode_cbs_map[mode].create_state)
+		request_module("xfrm-iptfs");
+	return &xfrm_mode_cbs_map[mode];
+}
+
 void xfrm_state_free(struct xfrm_state *x)
 {
 	kmem_cache_free(xfrm_state_cache, x);
@@ -521,6 +551,8 @@  EXPORT_SYMBOL(xfrm_state_free);
 
 static void ___xfrm_state_destroy(struct xfrm_state *x)
 {
+	if (x->mode_cbs && x->mode_cbs->delete_state)
+		x->mode_cbs->delete_state(x);
 	hrtimer_cancel(&x->mtimer);
 	del_timer_sync(&x->rtimer);
 	kfree(x->aead);
@@ -678,6 +710,7 @@  struct xfrm_state *xfrm_state_alloc(struct net *net)
 		x->replay_maxage = 0;
 		x->replay_maxdiff = 0;
 		spin_lock_init(&x->lock);
+		x->mode_data = NULL;
 	}
 	return x;
 }
@@ -1749,6 +1782,12 @@  static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig,
 	x->new_mapping_sport = 0;
 	x->dir = orig->dir;
 
+	x->mode_cbs = orig->mode_cbs;
+	if (x->mode_cbs && x->mode_cbs->clone) {
+		if (x->mode_cbs->clone(x, orig))
+			goto error;
+	}
+
 	return x;
 
  error:
@@ -2788,6 +2827,9 @@  u32 xfrm_state_mtu(struct xfrm_state *x, int mtu)
 	case XFRM_MODE_TUNNEL:
 		break;
 	default:
+		if (x->mode_cbs && x->mode_cbs->get_inner_mtu)
+			return x->mode_cbs->get_inner_mtu(x, mtu);
+
 		WARN_ON_ONCE(1);
 		break;
 	}
@@ -2888,6 +2930,12 @@  int __xfrm_init_state(struct xfrm_state *x, bool init_replay, bool offload,
 		}
 	}
 
+	x->mode_cbs = xfrm_get_mode_cbs(x->props.mode);
+	if (x->mode_cbs && x->mode_cbs->create_state) {
+		err = x->mode_cbs->create_state(x);
+		if (err)
+			goto error;
+	}
 error:
 	return err;
 }
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index d42805314a2a..4f9fb6a0b7dc 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -918,6 +918,12 @@  static struct xfrm_state *xfrm_state_construct(struct net *net,
 			goto error;
 	}
 
+	if (x->mode_cbs && x->mode_cbs->user_init) {
+		err = x->mode_cbs->user_init(net, x, attrs, extack);
+		if (err)
+			goto error;
+	}
+
 	return x;
 
 error:
@@ -1331,6 +1337,10 @@  static int copy_to_user_state_extra(struct xfrm_state *x,
 		if (ret)
 			goto out;
 	}
+	if (x->mode_cbs && x->mode_cbs->copy_to_user)
+		ret = x->mode_cbs->copy_to_user(x, skb);
+	if (ret)
+		goto out;
 	if (x->mapping_maxage) {
 		ret = nla_put_u32(skb, XFRMA_MTIMER_THRESH, x->mapping_maxage);
 		if (ret)
@@ -3541,6 +3551,9 @@  static inline unsigned int xfrm_sa_len(struct xfrm_state *x)
 	if (x->nat_keepalive_interval)
 		l += nla_total_size(sizeof(x->nat_keepalive_interval));
 
+	if (x->mode_cbs && x->mode_cbs->sa_len)
+		l += x->mode_cbs->sa_len(x);
+
 	return l;
 }