Message ID | e721c615e22fc4d3d53bfa230d5d71462ae9c9a8.1697779681.git.yan@cloudflare.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | ipv6: avoid atomic fragment on GSO output | expand |
On Fri, Oct 20, 2023 at 7:32 AM Yan Zhai <yan@cloudflare.com> wrote: > > dst_allfrag was added before the first git commit: > > https://www.mail-archive.com/bk-commits-head@vger.kernel.org/msg03399.html > > The feature would send packets to the fragmentation path if a box > receives a PMTU value with less than 1280 byte. However, since commit > 9d289715eb5c ("ipv6: stop sending PTB packets for MTU < 1280"), such > message would be simply discarded. The feature flag is neither supported > in iproute2 utility. In theory one can still manipulate it with direct > netlink message, but it is not ideal because it was based on obsoleted > guidance of RFC-2460 (replaced by RFC-8200). > > The feature test would always return false at the moment, so remove it > from the output path. What about other callers of dst_allfrag() ? This feature seems broken atm.
On Fri, Oct 20, 2023 at 1:06 AM Eric Dumazet <edumazet@google.com> wrote: > > On Fri, Oct 20, 2023 at 7:32 AM Yan Zhai <yan@cloudflare.com> wrote: > > > > dst_allfrag was added before the first git commit: > > > > https://www.mail-archive.com/bk-commits-head@vger.kernel.org/msg03399.html > > > > The feature would send packets to the fragmentation path if a box > > receives a PMTU value with less than 1280 byte. However, since commit > > 9d289715eb5c ("ipv6: stop sending PTB packets for MTU < 1280"), such > > message would be simply discarded. The feature flag is neither supported > > in iproute2 utility. In theory one can still manipulate it with direct > > netlink message, but it is not ideal because it was based on obsoleted > > guidance of RFC-2460 (replaced by RFC-8200). > > > > The feature test would always return false at the moment, so remove it > > from the output path. > > What about other callers of dst_allfrag() ? > > This feature seems broken atm. It is broken as far as I can tell. The reason I removed just one caller here is to keep the code simpler and consistent. If I don't do so, I ought to test it for both GSO fast path and slow path to be logically consistent. Seems an overkill to me. For the removal of the rest, I'd hope it could come in as a standalone patch(set) because it is not just callers but also those unnecessary flags and tests on IP corks and sockets, not quite aligned with this patch's intention. I noted you have drafted something like this in the past: https://lkml.kernel.org/netdev/1335348157.3274.30.camel@edumazet-glaptop/ I guess it might be a good base point to work on as a new patch(set)? What's your call on this? Yan
On Fri, Oct 20, 2023 at 8:39 AM Yan Zhai <yan@cloudflare.com> wrote: > > On Fri, Oct 20, 2023 at 1:06 AM Eric Dumazet <edumazet@google.com> wrote: > > > > On Fri, Oct 20, 2023 at 7:32 AM Yan Zhai <yan@cloudflare.com> wrote: > > > > > > dst_allfrag was added before the first git commit: > > > > > > https://www.mail-archive.com/bk-commits-head@vger.kernel.org/msg03399.html > > > > > > The feature would send packets to the fragmentation path if a box > > > receives a PMTU value with less than 1280 byte. However, since commit > > > 9d289715eb5c ("ipv6: stop sending PTB packets for MTU < 1280"), such > > > message would be simply discarded. The feature flag is neither supported > > > in iproute2 utility. In theory one can still manipulate it with direct > > > netlink message, but it is not ideal because it was based on obsoleted > > > guidance of RFC-2460 (replaced by RFC-8200). > > > > > > The feature test would always return false at the moment, so remove it > > > from the output path. > > > > What about other callers of dst_allfrag() ? > > > > This feature seems broken atm. > > It is broken as far as I can tell. The reason I removed just one > caller here is to keep the code simpler and consistent. If I don't do > so, I ought to test it for both GSO fast path and slow path to be > logically consistent. Seems an overkill to me. For the removal of the > rest, I'd hope it could come in as a standalone patch(set) because it > is not just callers but also those unnecessary flags and tests on IP > corks and sockets, not quite aligned with this patch's intention. I > noted you have drafted something like this in the past: > > https://lkml.kernel.org/netdev/1335348157.3274.30.camel@edumazet-glaptop/ > > I guess it might be a good base point to work on as a new patch(set)? > What's your call on this? > I am about to send a TCP patch series to enable usec TSval (instead of ms), and was planning to use a new RTAX_FEATURE_TCP_USEC_TS. I also noticed that iproute2 was not supporting RTAX_FEATURE_ALLFRAG, so we might kill it completely ? commit b258c87639f77d772c077a4e45dad602c62c9f1c Author: Eric Dumazet <edumazet@google.com> Date: Wed Oct 18 09:33:38 2023 +0000 tcp: add RTAX_FEATURE_TCP_USEC_TS This new dst feature flag will be used to allow TCP to use usec based timestamps instead of msec ones. ip route .... feature tcp_usec_ts Also document that RTAX_FEATURE_SACK and RTAX_FEATURE_TIMESTAMP are unused. Note that iproute2 does yet support RTAX_FEATURE_ALLFRAG ? Signed-off-by: Eric Dumazet <edumazet@google.com> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 51c13cf9c5aee4a2d1ab33c1a89043383d67b9cf..aa2482a0614aa685590fcc73819cbe1baac63d66 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -502,13 +502,17 @@ enum { #define RTAX_MAX (__RTAX_MAX - 1) -#define RTAX_FEATURE_ECN (1 << 0) -#define RTAX_FEATURE_SACK (1 << 1) -#define RTAX_FEATURE_TIMESTAMP (1 << 2) -#define RTAX_FEATURE_ALLFRAG (1 << 3) - -#define RTAX_FEATURE_MASK (RTAX_FEATURE_ECN | RTAX_FEATURE_SACK | \ - RTAX_FEATURE_TIMESTAMP | RTAX_FEATURE_ALLFRAG) +#define RTAX_FEATURE_ECN (1 << 0) +#define RTAX_FEATURE_SACK (1 << 1) /* unused */ +#define RTAX_FEATURE_TIMESTAMP (1 << 2) /* unused */ +#define RTAX_FEATURE_ALLFRAG (1 << 3) +#define RTAX_FEATURE_TCP_USEC_TS (1 << 4) + +#define RTAX_FEATURE_MASK (RTAX_FEATURE_ECN | \ + RTAX_FEATURE_SACK | \ + RTAX_FEATURE_TIMESTAMP | \ + RTAX_FEATURE_ALLFRAG | \ + RTAX_FEATURE_TCP_USEC_TS) struct rta_session { __u8 proto;
Eric Dumazet <edumazet@google.com> wrote: > I also noticed that iproute2 was not supporting RTAX_FEATURE_ALLFRAG, > so we might kill it completely ? Yes, I intentionally did not expose it in iproute2. Lets remove ALLFRAG.
On Fri, Oct 20, 2023 at 5:01 AM Florian Westphal <fw@strlen.de> wrote: > > Eric Dumazet <edumazet@google.com> wrote: > > I also noticed that iproute2 was not supporting RTAX_FEATURE_ALLFRAG, > > so we might kill it completely ? > > Yes, I intentionally did not expose it in iproute2. > > Lets remove ALLFRAG. I will do that in V4 later today to completely clean it up then. Always cheerful to delete code! Yan
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index a471c7e91761..ae87a3817d4a 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -189,7 +189,6 @@ static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu); if ((skb->len > mtu && !skb_is_gso(skb)) || - dst_allfrag(skb_dst(skb)) || (IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size)) return ip6_fragment(net, sk, skb, ip6_finish_output2); else
dst_allfrag was added before the first git commit: https://www.mail-archive.com/bk-commits-head@vger.kernel.org/msg03399.html The feature would send packets to the fragmentation path if a box receives a PMTU value with less than 1280 byte. However, since commit 9d289715eb5c ("ipv6: stop sending PTB packets for MTU < 1280"), such message would be simply discarded. The feature flag is neither supported in iproute2 utility. In theory one can still manipulate it with direct netlink message, but it is not ideal because it was based on obsoleted guidance of RFC-2460 (replaced by RFC-8200). The feature test would always return false at the moment, so remove it from the output path. Signed-off-by: Yan Zhai <yan@cloudflare.com> --- net/ipv6/ip6_output.c | 1 - 1 file changed, 1 deletion(-)