mbox series

[RFC,ipsec-next,v2,0/8] Add IP-TFS mode to xfrm

Message ID 20231113035219.920136-1-chopps@chopps.org (mailing list archive)
Headers show
Series Add IP-TFS mode to xfrm | expand

Message

Christian Hopps Nov. 13, 2023, 3:52 a.m. UTC
From: Christian Hopps <chopps@labn.net>

This patchset adds a new xfrm mode implementing on-demand IP-TFS. IP-TFS
(AggFrag encapsulation) has been standardized in RFC9347.

Link: https://www.rfc-editor.org/rfc/rfc9347.txt

This feature supports demand driven (i.e., non-constant send rate) IP-TFS to
take advantage of the AGGFRAG ESP payload encapsulation. This payload type
supports aggregation and fragmentation of the inner IP packet stream which in
turn yields higher small-packet bandwidth as well as reducing MTU/PMTU issues.
Congestion control is unimplementated as the send rate is demand driven rather
than constant.

In order to allow loading this fucntionality as a module a set of callbacks
xfrm_mode_cbs has been added to xfrm as well.

Comments

Steffen Klassert Nov. 13, 2023, 7:15 a.m. UTC | #1
On Sun, Nov 12, 2023 at 10:52:11PM -0500, Christian Hopps wrote:
> From: Christian Hopps <chopps@labn.net>
> 
> This patchset adds a new xfrm mode implementing on-demand IP-TFS. IP-TFS
> (AggFrag encapsulation) has been standardized in RFC9347.
> 
> Link: https://www.rfc-editor.org/rfc/rfc9347.txt
> 
> This feature supports demand driven (i.e., non-constant send rate) IP-TFS to
> take advantage of the AGGFRAG ESP payload encapsulation. This payload type
> supports aggregation and fragmentation of the inner IP packet stream which in
> turn yields higher small-packet bandwidth as well as reducing MTU/PMTU issues.
> Congestion control is unimplementated as the send rate is demand driven rather
> than constant.
> 
> In order to allow loading this fucntionality as a module a set of callbacks
> xfrm_mode_cbs has been added to xfrm as well.

I did a multiple days peer review with Chris on this pachset. So my
concerns are already addressed.

Further reviews are welcome! This is a bigger change and it would
be nice if more people could look at it.

Thanks!
Andrew Cagney Nov. 16, 2023, 3:14 p.m. UTC | #2
> I did a multiple days peer review with Chris on this pachset. So my
> concerns are already addressed.
>
> Further reviews are welcome! This is a bigger change and it would
> be nice if more people could look at it.

I have a usability question.  What name should appear when a user
interacts with and sees log messages from this feature?
    ip-tfs, IP-TFS, IP_TFS
or:
   iptfs, IPTFS, ...
Antony Antony Nov. 16, 2023, 4:02 p.m. UTC | #3
On Sun, Nov 12, 2023 at 10:52:11PM -0500, Christian Hopps via Devel wrote:
> From: Christian Hopps <chopps@labn.net>
> 
> This patchset adds a new xfrm mode implementing on-demand IP-TFS. IP-TFS
> (AggFrag encapsulation) has been standardized in RFC9347.
> 
> Link: https://www.rfc-editor.org/rfc/rfc9347.txt
> 
> This feature supports demand driven (i.e., non-constant send rate) IP-TFS to
> take advantage of the AGGFRAG ESP payload encapsulation. This payload type
> supports aggregation and fragmentation of the inner IP packet stream which in
> turn yields higher small-packet bandwidth as well as reducing MTU/PMTU issues.
> Congestion control is unimplementated as the send rate is demand driven rather
> than constant.
> 
> In order to allow loading this fucntionality as a module a set of callbacks
> xfrm_mode_cbs has been added to xfrm as well.

Hi Chris,

I have further reviewed the code and have a few minor questions, mainly 
related to handling of XFRM_MODE_IPTFS. It appears to me be either some case  
missing support or/and in a few places it should be prohibited. I have three 
types of questions:

1. missing XFRM_MODE_IPTFS support?
2. Will XFRM_MODE_IPTFS be supported this combination?
3. Should be prohibited combination with XFRM_MODE_IPTFS 

1.  Missing:  

a.  wouldn't xfrm_sa_len() need extending? 

I could not find support for XFRM_MODE_IPTFS explicitly.

However, I'm not sure how the following code is working when xfrm_sa_len is 
missing IP-TFS xfrm_sa_len:

copy_to_user_state_extra() {
    if (x->mode_cbs && x->mode_cbs->copy_to_user)
        ret = x->mode_cbs->copy_to_user(x, skb);
}

I have attached what I imagine is a fix. Check with Steffen or others if 
this is necessary.

b. esp6_init_state() and esp_init_state():
These functions do not seem to handle XFRM_MODE_IPTFS. Would they default to work with it?

2. Would xfrm4_outer_mode_gso_segment() xfrm6_outer_mode_gso_segment(): 
support XFRM_MODE_IPTFS?
These functions appear to be missing. Is it possible that you don't support GSO and GRO?

3: Shouldn't these combinations return error?

a. ipcomp and  XFRM_MODE_IPTFS
I'm guessing that ipcomp would generate an error when userspace tries to add an SA with XFRM_MODE_IPTFS.
ipcomp6_init_state()
ipcomp4_init_state()

b: In xfrm_state_construct():

if (attrs[XFRMA_TFCPAD])
    x->tfcpad = nla_get_u32(attrs[XFRMA_TFCPAD]);

-antony
>From 3d9ad9fab35f4efd6c2fb17853aaca986e7daaf3 Mon Sep 17 00:00:00 2001
From: Antony Antony <antony.antony@secunet.com>
Date: Thu, 16 Nov 2023 16:46:54 +0100
Subject: [PATCH] xfrm: iptfs extend xfrm_sa_len

think this extension is necessary. I wonder how it works now:)
May it is alreday handled:)

Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
 include/net/xfrm.h    |  1 +
 net/xfrm/xfrm_iptfs.c | 18 ++++++++++++++++++
 net/xfrm/xfrm_user.c  |  3 +++
 3 files changed, 22 insertions(+)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 176ab5ac436e..7ca508b0c46c 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -474,6 +474,7 @@ struct xfrm_mode_cbs {
 	 *    mac_header should point at protocol/nexthdr of the outer IP
 	 */
 	int	(*prepare_output)(struct xfrm_state *x, struct sk_buff *skb);
+	unsigned int (*xfrm_sa_len)(const struct xfrm_state *x);
 };
 
 int xfrm_register_mode_cbs(u8 mode, const struct xfrm_mode_cbs *mode_cbs);
diff --git a/net/xfrm/xfrm_iptfs.c b/net/xfrm/xfrm_iptfs.c
index 910c5e060931..d8b02cc3073b 100644
--- a/net/xfrm/xfrm_iptfs.c
+++ b/net/xfrm/xfrm_iptfs.c
@@ -2613,6 +2613,23 @@ static int iptfs_copy_to_user(struct xfrm_state *x, struct sk_buff *skb)
 	return ret;
 }
 
+static unsigned int iptfs_xfrm_sa_len(const struct xfrm_state *x)
+{
+	struct xfrm_iptfs_data *xtfs = x->mode_data;
+        struct xfrm_iptfs_config *xc = &xtfs->cfg;
+	unsigned int l = 0;
+
+	if (xc->dont_frag)
+		l += nla_total_size(sizeof(xc->reorder_win_size));
+	l += nla_total_size(sizeof(xc->reorder_win_size));
+	l += nla_total_size(sizeof(xc->pkt_size));
+	l += nla_total_size(sizeof(xc->max_queue_size));
+	l += nla_total_size(sizeof(xc->drop_time_us));
+	l += nla_total_size(sizeof(xc->init_delay_us));
+
+	return l;
+}
+
 static int iptfs_create_state(struct xfrm_state *x)
 {
 	struct xfrm_iptfs_data *xtfs;
@@ -2701,6 +2718,7 @@ static const struct xfrm_mode_cbs iptfs_mode_cbs = {
 	.input = iptfs_input,
 	.output = iptfs_output_collect,
 	.prepare_output = iptfs_prepare_output,
+	.xfrm_sa_len = iptfs_xfrm_sa_len,
 };
 
 static int __init xfrm_iptfs_init(void)
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 4b0e462d5e31..737636b2c267 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -3350,6 +3350,9 @@ static inline unsigned int xfrm_sa_len(struct xfrm_state *x)
 	if (x->mapping_maxage)
 		l += nla_total_size(sizeof(x->mapping_maxage));
 
+	if (x->mode_cbs && x->mode_cbs->copy_to_user)
+                l += x->mode_cbs->xfrm_sa_len(x);
+
 	return l;
 }
Christian Hopps Nov. 20, 2023, 6:39 p.m. UTC | #4
Andrew Cagney <andrew.cagney@gmail.com> writes:

>> I did a multiple days peer review with Chris on this pachset. So my
>> concerns are already addressed.
>>
>> Further reviews are welcome! This is a bigger change and it would
>> be nice if more people could look at it.
>
> I have a usability question.  What name should appear when a user
> interacts with and sees log messages from this feature?
>     ip-tfs, IP-TFS, IP_TFS
> or:
>    iptfs, IPTFS, ...

I think no `-` or `_` in the code/api. For documentation it is probably better to hew closer to the RFC and use `IP-TFS`.

Thanks,
Chris.
Antony Antony Nov. 20, 2023, 8 p.m. UTC | #5
On Mon, Nov 20, 2023 at 01:39:50PM -0500, Christian Hopps via Devel wrote:
> 
> Andrew Cagney <andrew.cagney@gmail.com> writes:
> 
> > > I did a multiple days peer review with Chris on this pachset. So my
> > > concerns are already addressed.
> > > 
> > > Further reviews are welcome! This is a bigger change and it would
> > > be nice if more people could look at it.
> > 
> > I have a usability question.  What name should appear when a user
> > interacts with and sees log messages from this feature?
> >     ip-tfs, IP-TFS, IP_TFS
> > or:
> >    iptfs, IPTFS, ...
> 
> I think no `-` or `_` in the code/api. For documentation it is probably better to hew closer to the RFC and use `IP-TFS`.

That sounds good. However,
iproute2 output, ip xfrm state, or "ip xfrm policy" is that documentation or code?

current unsubmitted patch shows: "iptfs"

src 192.1.2.23 dst 192.1.2.45
	proto esp spi 0x76ee6b87(1995336583) reqid 16389(0x00004005) mode iptfs

root@west:/testing/pluto/ikev2-74-iptfs-01 (iptfs-aa-20231120)# ip  x p
src 192.0.1.0/24 dst 192.0.2.0/24
	dir out priority 1757393 ptype main
	tmpl src 192.1.2.45 dst 192.1.2.23
		proto esp reqid 16389 mode iptfs

-antony
Antony Antony Nov. 20, 2023, 8:02 p.m. UTC | #6
On Mon, Nov 20, 2023 at 09:00:45PM +0100, Antony Antony wrote:
> On Mon, Nov 20, 2023 at 01:39:50PM -0500, Christian Hopps via Devel wrote:
> > 
> > Andrew Cagney <andrew.cagney@gmail.com> writes:
> > 
> > > > I did a multiple days peer review with Chris on this pachset. So my
> > > > concerns are already addressed.
> > > > 
> > > > Further reviews are welcome! This is a bigger change and it would
> > > > be nice if more people could look at it.
> > > 
> > > I have a usability question.  What name should appear when a user
> > > interacts with and sees log messages from this feature?
> > >     ip-tfs, IP-TFS, IP_TFS
> > > or:
> > >    iptfs, IPTFS, ...
> > 
> > I think no `-` or `_` in the code/api. For documentation it is probably better to hew closer to the RFC and use `IP-TFS`.
> 
> That sounds good. However,
> iproute2 output, ip xfrm state, or "ip xfrm policy" is that documentation or code?
> 
> current unsubmitted patch shows: "iptfs"
> 
> src 192.1.2.23 dst 192.1.2.45
> 	proto esp spi 0x76ee6b87(1995336583) reqid 16389(0x00004005) mode iptfs

there also the following line further down in ip x s

iptfs-opts pkt-size 0 max-queue-size 1048576 drop-time 1000000 reorder-window 3 init-delay 0

> 
> root@west:/testing/pluto/ikev2-74-iptfs-01 (iptfs-aa-20231120)# ip  x p
> src 192.0.1.0/24 dst 192.0.2.0/24
> 	dir out priority 1757393 ptype main
> 	tmpl src 192.1.2.45 dst 192.1.2.23
> 		proto esp reqid 16389 mode iptfs
> 
> -antony
Christian Hopps Nov. 20, 2023, 8:14 p.m. UTC | #7
Antony Antony <antony@phenome.org> writes:

> On Mon, Nov 20, 2023 at 01:39:50PM -0500, Christian Hopps via Devel wrote:
>>
>> Andrew Cagney <andrew.cagney@gmail.com> writes:
>>
>> > > I did a multiple days peer review with Chris on this pachset. So my
>> > > concerns are already addressed.
>> > >
>> > > Further reviews are welcome! This is a bigger change and it would
>> > > be nice if more people could look at it.
>> >
>> > I have a usability question.  What name should appear when a user
>> > interacts with and sees log messages from this feature?
>> >     ip-tfs, IP-TFS, IP_TFS
>> > or:
>> >    iptfs, IPTFS, ...
>>
>> I think no `-` or `_` in the code/api. For documentation it is probably better to hew closer to the RFC and use `IP-TFS`.
>
> That sounds good. However,
> iproute2 output, ip xfrm state, or "ip xfrm policy" is that documentation or code?
>
> current unsubmitted patch shows: "iptfs"

That's code/api. I was thinking ui/api when I said api.

Documentation would be something ending in a .rst or within a comment as that might be referring to the RFC too.

I will go through and make sure things are consistent in the next version.

Thanks,
Chris.

>
> src 192.1.2.23 dst 192.1.2.45
> 	proto esp spi 0x76ee6b87(1995336583) reqid 16389(0x00004005) mode iptfs
>
> root@west:/testing/pluto/ikev2-74-iptfs-01 (iptfs-aa-20231120)# ip  x p
> src 192.0.1.0/24 dst 192.0.2.0/24
> 	dir out priority 1757393 ptype main
> 	tmpl src 192.1.2.45 dst 192.1.2.23
> 		proto esp reqid 16389 mode iptfs
>
> -antony
Antony Antony Nov. 21, 2023, 7:13 p.m. UTC | #8
On Mon, Nov 13, 2023 at 08:15:47AM +0100, Steffen Klassert via Devel wrote:
> On Sun, Nov 12, 2023 at 10:52:11PM -0500, Christian Hopps wrote:
> > From: Christian Hopps <chopps@labn.net>
> > 
> > This patchset adds a new xfrm mode implementing on-demand IP-TFS. IP-TFS
> > (AggFrag encapsulation) has been standardized in RFC9347.
> > 
> > Link: https://www.rfc-editor.org/rfc/rfc9347.txt
> > 
> > This feature supports demand driven (i.e., non-constant send rate) IP-TFS to
> > take advantage of the AGGFRAG ESP payload encapsulation. This payload type
> > supports aggregation and fragmentation of the inner IP packet stream which in
> > turn yields higher small-packet bandwidth as well as reducing MTU/PMTU issues.
> > Congestion control is unimplementated as the send rate is demand driven rather
> > than constant.
> > 
> > In order to allow loading this fucntionality as a module a set of callbacks
> > xfrm_mode_cbs has been added to xfrm as well.
> 
> I did a multiple days peer review with Chris on this pachset. So my
> concerns are already addressed.
> 
> Further reviews are welcome! This is a bigger change and it would
> be nice if more people could look at it.

I'd like to pose a basic question to understand the new IP-TFS config 
options for an SA.

When configuring IP-TFS parameters on an SA, are all parameters actively 
used by that SA? or does their usage vary based on the direction of the 
traffic?
Currently, each IP-TFS SA includes both init-delay and drop-time parameters. 
I would like to understand whether both parameters are necessary for every SA.

ip xfrm state

src 2001:db8:1:2::45 dst 2001:db8:1:2::23
proto esp spi 0x32af1f6d reqid 16389 mode iptfs
replay-window 0 flag af-unspec esn
aead rfc4106(gcm(aes)) 0x0... 128
anti-replay esn context:
seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0
replay_window 128, bitmap-length 4
00000000 00000000 00000000 00000000
iptfs-opts pkt-size 0 max-queue-size 1048576 drop-time 1000000 reorder-window 3 init-delay 0

Chris, would like to you share which of iptfs-opts are for input and which 
are for output?

My current understanding suggests that, depending on the traffic direction, 
onlysome of these parameters might be in active use for each SA. If this is 
indeed the case, I propose we discuss the possibility of refining our 
configuration approach. Could we consider making these parameters mutually 
exclusive and dependent on the traffic direction? Furthermore, given that 
the xfrm community has previously discussed the idea of adding direction to 
the SA — a concept also used in hardware offload scenarios — why not explore 
adding direction to the SA?

We currently have DIR with offload.

ip xfrm state { add | update } ... offload [ crypto|packet ] dev DEV dir DIR

Implementing direction could imply that an IP-TFS SA would require only fewer parameters
– init-delay or drop-time, depending on the specified direction. This would 
make ip x s output simple and comprehensible, thereby reducing potential 
confusion. Also avoid confusing when creating state say input, why add 
output parameters to an input SA?"
Antony Antony Nov. 28, 2023, 10:12 p.m. UTC | #9
Hi Chris,

I got a crash when running inside a  namespace using veth. I got twice. It 
is not 100% reproducable. 

Note: I have KASAN enabled.

[  226.323824] BUG: KASAN: null-ptr-deref in 
iptfs_output_collect+0x1f5/0x61a
[  226.325155] Read of size 8 at addr 0000000000000478 by task ping/5276
[  226.325980]
[  226.326220] CPU: 2 PID: 5276 Comm: ping Not tainted 6.6.0-rc1-00528-gadd9146753b7 #95
[  226.327219] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[  226.328396] Call Trace:
[  226.328791]  <TASK>
[  226.329099]  dump_stack_lvl+0x33/0x42
[  226.329597]  kasan_report+0xa0/0xc3
[  226.330074]  ? iptfs_output_collect+0x1f5/0x61a
[  226.330678]  iptfs_output_collect+0x1f5/0x61a
[  226.331259]  ip_send_skb+0x25/0x58
[  226.331724]  raw_sendmsg+0xec4/0xfee
[  226.332223]  ? raw_hash_sk+0x224/0x224
[  226.332741]  ? kasan_unpoison+0x44/0x52
[  226.333258]  ? kernel_init_pages+0x42/0x51
[  226.333810]  ? post_alloc_hook+0xba/0xe0
[  226.334339]  ? prep_new_page+0x44/0x51
[  226.334849]  ? ma_data_end+0x6e/0x88
[  226.335338]  ? preempt_count_sub+0x14/0xb5
[  226.335894]  ? zone_watermark_fast.isra.0+0x12b/0x12b
[  226.336577]  ? __might_resched+0x8d/0x248
[  226.337116]  ? __might_sleep+0x27/0xa2
[  226.337626]  ? first_zones_zonelist+0x2c/0x43
[  226.338211]  ? do_raw_spin_lock+0x72/0xbb
[  226.338752]  ? __might_resched+0x8d/0x248
[  226.339291]  ? __might_sleep+0x27/0xa2
[  226.339801]  ? inet_send_prepare+0x59/0x59
[  226.340356]  ? sock_sendmsg_nosec+0x42/0x6c
[  226.340924]  sock_sendmsg_nosec+0x42/0x6c
[  226.341465]  __sys_sendto+0x172/0x1e1
[  226.341964]  ? __x64_sys_getpeername+0x44/0x44
[  226.342558]  ? folio_batch_add_and_move+0x6a/0x7d
[  226.343189]  ? find_vma+0x6b/0x8b
[  226.343646]  ? find_vma_intersection+0x8a/0x8a
[  226.344246]  ? handle_mm_fault+0xfa/0x163
[  226.344788]  ? preempt_latency_start+0x41/0x4c
[  226.345385]  ? preempt_count_sub+0x14/0xb5
[  226.345937]  __x64_sys_sendto+0x76/0x82
[  226.346456]  do_syscall_64+0x58/0x78
[  226.346946]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  226.347611] RIP: 0033:0x7f378b00ba73
[  226.348095] Code: 8b 15 a9 83 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 80 3d 71 0b 0d 00 00 41 89 ca 74 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 55 48 83 ec 30 44 89 4c 24
[  226.350433] RSP: 002b:00007fff58f9c148 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[  226.351400] RAX: ffffffffffffffda RBX: 00005591b05fa340 RCX: 00007f378b00ba73
[  226.352318] RDX: 0000000000000040 RSI: 00005591b06003c0 RDI: 0000000000000003
[  226.353233] RBP: 00005591b06003c0 R08: 00005591b05fc5c0 R09: 0000000000000010
[  226.354147] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000040
[  226.355061] R13: 00007fff58f9d830 R14: 0000001d00000001 R15: 00005591b05fd680
[  226.355978]  </TASK>
[  226.356301] ==================================================================
[  226.357274] Disabling lock debugging due to kernel taint
[  226.357973] BUG: kernel NULL pointer dereference, address: 0000000000000478
[  226.358865] #PF: supervisor read access in kernel mode
[  226.359540] #PF: error_code(0x0000) - not-present page
[  226.360220] PGD 0 P4D 0
[  226.360594] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[  226.361346] CPU: 2 PID: 5276 Comm: ping Tainted: G    B              6.6.0-rc1-00528-gadd9146753b7 #95
[  226.362520] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[  226.363693] RIP: 0010:iptfs_output_collect+0x1f5/0x61a
[  226.364376] Code: c1 ff ff 65 ff 0d 12 29 35 7e 75 05 e8 c3 f0 31 ff 48 8d 7b 10 e8 4b 16 6b ff 4c 8b 73 10 49 8d be 78 04 00 00 e8 3b 16 6b ff <4d> 8b b6 78 04 00 00 49 8d be c0 01 00 00 e8 28 16 6b ff 49 8b 86
[  226.366704] RSP: 0018:ffffc9000a357998 EFLAGS: 00010296
[  226.367389] RAX: 0000000000000001 RBX: ffff8881123a7a00 RCX: fffffbfff075e9f5
[  226.368308] RDX: fffffbfff075e9f5 RSI: fffffbfff075e9f5 RDI: ffffffff83af4fa0
[  226.369237] RBP: ffff88810f45ac00 R08: 0000000000000008 R09: 0000000000000001
[  226.370154] R10: ffffffff83af4fa7 R11: fffffbfff075e9f4 R12: 0000000000000000
[  226.371070] R13: 0000000000000000 R14: 0000000000000000 R15: ffff888114cbeb01
[  226.371984] FS:  00007f378ad4fc40(0000) GS:ffff88815ad00000(0000) knlGS:0000000000000000
[  226.373029] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  226.373777] CR2: 0000000000000478 CR3: 00000001083e9000 CR4: 0000000000350ee0
[  226.374693] Call Trace:
[  226.375042]  <TASK>
[  226.375352]  ? __die_body+0x1a/0x58
[  226.375828]  ? page_fault_oops+0x459/0x4c7
[  226.376387]  ? irq_work_queue+0x2c/0x41
[  226.376915]  ? dump_pagetable+0x1db/0x1db
[  226.377453]  ? vprintk_emit+0x187/0x195
[  226.377972]  ? iptfs_output_collect+0x1f5/0x61a
[  226.378576]  ? _printk+0xb2/0xe1
[  226.379021]  ? syslog_print+0x340/0x340
[  226.379541]  ? do_user_addr_fault+0x156/0x59f
[  226.380713]  ? exc_page_fault+0xaa/0xc3
[  226.381234]  ? asm_exc_page_fault+0x22/0x30
[  226.381795]  ? iptfs_output_collect+0x1f5/0x61a
[  226.382399]  ? iptfs_output_collect+0x1f5/0x61a
[  226.383002]  ip_send_skb+0x25/0x58
[  226.383469]  raw_sendmsg+0xec4/0xfee
[  226.383958]  ? raw_hash_sk+0x224/0x224
[  226.385083]  ? kasan_unpoison+0x44/0x52
[  226.385866]  ? kernel_init_pages+0x42/0x51
[  226.386687]  ? post_alloc_hook+0xba/0xe0
[  226.387559]  ? prep_new_page+0x44/0x51
[  226.388522]  ? ma_data_end+0x6e/0x88
[  226.389365]  ? preempt_count_sub+0x14/0xb5
[  226.390575]  ? zone_watermark_fast.isra.0+0x12b/0x12b
[  226.391555]  ? __might_resched+0x8d/0x248
[  226.392127]  ? __might_sleep+0x27/0xa2
[  226.392794]  ? first_zones_zonelist+0x2c/0x43
[  226.393684]  ? do_raw_spin_lock+0x72/0xbb
[  226.394501]  ? __might_resched+0x8d/0x248
[  226.395324]  ? __might_sleep+0x27/0xa2
[  226.396204]  ? inet_send_prepare+0x59/0x59
[  226.397042]  ? sock_sendmsg_nosec+0x42/0x6c
[  226.397879]  sock_sendmsg_nosec+0x42/0x6c
[  226.398704]  __sys_sendto+0x172/0x1e1
[  226.399465]  ? __x64_sys_getpeername+0x44/0x44
[  226.400367]  ? folio_batch_add_and_move+0x6a/0x7d
[  226.401327]  ? find_vma+0x6b/0x8b
[  226.401812]  ? find_vma_intersection+0x8a/0x8a
[  226.402442]  ? handle_mm_fault+0xfa/0x163
[  226.403014]  ? preempt_latency_start+0x41/0x4c
[  226.403641]  ? preempt_count_sub+0x14/0xb5
[  226.404224]  __x64_sys_sendto+0x76/0x82
[  226.404790]  do_syscall_64+0x58/0x78
[  226.405306]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  226.406010] RIP: 0033:0x7f378b00ba73
[  226.406524] Code: 8b 15 a9 83 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 80 3d 71 0b 0d 00 00 41 89 ca 74 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 55 48 83 ec 30 44 89 4c 24
[  226.408990] RSP: 002b:00007fff58f9c148 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[  226.410020] RAX: ffffffffffffffda RBX: 00005591b05fa340 RCX: 00007f378b00ba73
[  226.410992] RDX: 0000000000000040 RSI: 00005591b06003c0 RDI: 0000000000000003
[  226.411968] RBP: 00005591b06003c0 R08: 00005591b05fc5c0 R09: 0000000000000010
[  226.412948] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000040
[  226.413923] R13: 00007fff58f9d830 R14: 0000001d00000001 R15: 00005591b05fd680
[  226.414904]  </TASK>
[  226.415243] Modules linked in:
[  226.415697] CR2: 0000000000000478
[  226.416181] ---[ end trace 0000000000000000 ]---
[  226.416848] RIP: 0010:iptfs_output_collect+0x1f5/0x61a
[  226.417569] Code: c1 ff ff 65 ff 0d 12 29 35 7e 75 05 e8 c3 f0 31 ff 48 8d 7b 10 e8 4b 16 6b ff 4c 8b 73 10 49 8d be 78 04 00 00 e8 3b 16 6b ff <4d> 8b b6 78 04 00 00 49 8d be c0 01 00 00 e8 28 16 6b ff 49 8b 86
[  226.420031] RSP: 0018:ffffc9000a357998 EFLAGS: 00010296
[  226.420764] RAX: 0000000000000001 RBX: ffff8881123a7a00 RCX: fffffbfff075e9f5
[  226.421733] RDX: fffffbfff075e9f5 RSI: fffffbfff075e9f5 RDI: ffffffff83af4fa0
[  226.422704] RBP: ffff88810f45ac00 R08: 0000000000000008 R09: 0000000000000001
[  226.423671] R10: ffffffff83af4fa7 R11: fffffbfff075e9f4 R12: 0000000000000000
[  226.424646] R13: 0000000000000000 R14: 0000000000000000 R15: ffff888114cbeb01
[  226.425616] FS:  00007f378ad4fc40(0000) GS:ffff88815ad00000(0000) knlGS:0000000000000000
[  226.426709] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  226.427501] CR2: 0000000000000478 CR3: 00000001083e9000 CR4: 0000000000350ee0
[  226.428492] Kernel panic - not syncing: Fatal exception in interrupt
[  226.429587] Kernel Offset: disabled
[  226.430096] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

(gdb) list *iptfs_output_collect+0x1f5
0xffffffff81ce40d6 is in iptfs_output_collect (./include/net/net_namespace.h:385).
380	}
381
382	static inline struct net *read_pnet(const possible_net_t *pnet)
383	{
384	#ifdef CONFIG_NET_NS
385		return pnet->net;
386	#else
387		return &init_net;
388	#endif
389	}


On Sun, Nov 12, 2023 at 10:52:11PM -0500, Christian Hopps via Devel wrote:
> From: Christian Hopps <chopps@labn.net>
> 
> This patchset adds a new xfrm mode implementing on-demand IP-TFS. IP-TFS
> (AggFrag encapsulation) has been standardized in RFC9347.
> 
> Link: https://www.rfc-editor.org/rfc/rfc9347.txt
> 
> This feature supports demand driven (i.e., non-constant send rate) IP-TFS to
> take advantage of the AGGFRAG ESP payload encapsulation. This payload type
> supports aggregation and fragmentation of the inner IP packet stream which in
> turn yields higher small-packet bandwidth as well as reducing MTU/PMTU issues.
> Congestion control is unimplementated as the send rate is demand driven rather
> than constant.
> 
> In order to allow loading this fucntionality as a module a set of callbacks
> xfrm_mode_cbs has been added to xfrm as well.
> -- 
> Devel mailing list
> Devel@linux-ipsec.org
> https://linux-ipsec.org/mailman/listinfo/devel
Christian Hopps March 7, 2024, 11:05 a.m. UTC | #10
Antony Antony <antony@phenome.org> writes:

> On Sun, Nov 12, 2023 at 10:52:11PM -0500, Christian Hopps via Devel wrote:
>> From: Christian Hopps <chopps@labn.net>
>>
>> This patchset adds a new xfrm mode implementing on-demand IP-TFS. IP-TFS
>> (AggFrag encapsulation) has been standardized in RFC9347.
>>
>> Link: https://www.rfc-editor.org/rfc/rfc9347.txt
>>
>> This feature supports demand driven (i.e., non-constant send rate) IP-TFS to
>> take advantage of the AGGFRAG ESP payload encapsulation. This payload type
>> supports aggregation and fragmentation of the inner IP packet stream which in
>> turn yields higher small-packet bandwidth as well as reducing MTU/PMTU issues.
>> Congestion control is unimplementated as the send rate is demand driven rather
>> than constant.
>>
>> In order to allow loading this fucntionality as a module a set of callbacks
>> xfrm_mode_cbs has been added to xfrm as well.
>
> Hi Chris,
>
> I have further reviewed the code and have a few minor questions, mainly
> related to handling of XFRM_MODE_IPTFS. It appears to me be either some case
> missing support or/and in a few places it should be prohibited. I have three
> types of questions:
>
> 1. missing XFRM_MODE_IPTFS support?
> 2. Will XFRM_MODE_IPTFS be supported this combination?
> 3. Should be prohibited combination with XFRM_MODE_IPTFS
>
> 1.  Missing:
>
> a.  wouldn't xfrm_sa_len() need extending?
>
> I could not find support for XFRM_MODE_IPTFS explicitly.
>
> However, I'm not sure how the following code is working when xfrm_sa_len is
> missing IP-TFS xfrm_sa_len:
>
> copy_to_user_state_extra() {
>     if (x->mode_cbs && x->mode_cbs->copy_to_user)
>         ret = x->mode_cbs->copy_to_user(x, skb);
> }
>
> I have attached what I imagine is a fix. Check with Steffen or others if
> this is necessary.

I have adopted this change with a couple small changes, thanks!

> b. esp6_init_state() and esp_init_state():
> These functions do not seem to handle XFRM_MODE_IPTFS. Would they default to work with it?

That's b/c IPTFS uses ESP w/o modification. IP-TFS makes its own mode based changes (similar to what `esp_init_state()` does) in it's `create_state` callback which like `esp_init_state()` is called from `__xfrm_init_state()`.

> 2. Would xfrm4_outer_mode_gso_segment() xfrm6_outer_mode_gso_segment():
> support XFRM_MODE_IPTFS?
> These functions appear to be missing. Is it possible that you don't support GSO and GRO?

Correct.

> 3: Shouldn't these combinations return error?
>
> a. ipcomp and  XFRM_MODE_IPTFS
> I'm guessing that ipcomp would generate an error when userspace tries to add an SA with XFRM_MODE_IPTFS.
> ipcomp6_init_state()
> ipcomp4_init_state()

Correct.

> b: In xfrm_state_construct():
>
> if (attrs[XFRMA_TFCPAD])
>     x->tfcpad = nla_get_u32(attrs[XFRMA_TFCPAD]);

Can you explain this more?

Thanks,
Chris.

>
> -antony
>
> [2. text/plain; 0001-xfrm-iptfs-extend-xfrm_sa_len.patch]...
Antony Antony March 10, 2024, 8:34 p.m. UTC | #11
On Thu, Mar 07, 2024 at 06:05:37AM -0500, Christian Hopps wrote:
> 
> Antony Antony <antony@phenome.org> writes:
> 
> > On Sun, Nov 12, 2023 at 10:52:11PM -0500, Christian Hopps via Devel wrote:
> > > From: Christian Hopps <chopps@labn.net>
> > > 

> > b: In xfrm_state_construct():
> > 
> > if (attrs[XFRMA_TFCPAD])
> >     x->tfcpad = nla_get_u32(attrs[XFRMA_TFCPAD]);
> 
> Can you explain this more?

I haven't used it. I just know it from *swan code. And I wondered how it 
would work with IP-TFS. Looking in the kernel code, I noticed, it is not 
allowed in xfrm_user.c

 270                 if (attrs[XFRMA_TFCPAD] &&
 271                     p->mode != XFRM_MODE_TUNNEL) {
 272                         NL_SET_ERR_MSG(extack, "TFC padding can only be used in tunnel mode");
 273                         goto out;
 274                 }