netpoll: support sending over raw IP interfaces

Message ID	20240313124613.51399-1-mark@yotsuba.nl (mailing list archive)
State	Superseded
Delegated to:	Netdev Maintainers
Headers	show Received: from wfhigh1-smtp.messagingengine.com (wfhigh1-smtp.messagingengine.com [64.147.123.152]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5D0641392; Wed, 13 Mar 2024 12:46:27 +0000 (UTC) Feedback-ID: i85e1472c:Fastmail From: Mark Cilissen <mark@yotsuba.nl> To: netdev@vger.kernel.org Cc: Mark Cilissen <mark@yotsuba.nl>, Hans de Goede <hdegoede@redhat.com>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Breno Leitao <leitao@debian.org>, Ingo Molnar <mingo@redhat.com>, "David S. Miller" <davem@davemloft.net>, Paolo Abeni <pabeni@redhat.com>, linux-kernel@vger.kernel.org Subject: [PATCH] netpoll: support sending over raw IP interfaces Date: Wed, 13 Mar 2024 13:46:13 +0100 Message-ID: <20240313124613.51399-1-mark@yotsuba.nl> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	netpoll: support sending over raw IP interfaces \| expand netpoll: support sending over raw IP interfaces

Context	Check	Description
netdev/series_format	warning	Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection	success	Guessed tree name to be net-next
netdev/ynl	success	Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 942 this patch: 942
netdev/build_tools	success	No tools touched, skip
netdev/cc_maintainers	warning	2 maintainers not CCed: linux-doc@vger.kernel.org corbet@lwn.net
netdev/build_clang	success	Errors and warnings before: 957 this patch: 957
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 959 this patch: 959
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 62 lines checked
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0
netdev/contest	success	net-next-2024-03-14--12-00 (tests: 908)

Mark March 13, 2024, 12:46 p.m. UTC

Currently, netpoll only supports interfaces with an ethernet-compatible
link layer. Certain interfaces like SLIP do not have a link layer
on the network interface level at all and expect raw IP packets,
and could benefit from being supported by netpoll.

This commit adds support for such interfaces by using the network device's
`hard_header_len` field as an indication that no link layer is present.
If that is the case we simply skip adding the ethernet header, causing
a raw IP packet to be sent over the interface. This has been confirmed
to add netconsole support to at least SLIP and WireGuard interfaces.

Signed-off-by: Mark Cilissen <mark@yotsuba.nl>
---
 Documentation/networking/netconsole.rst |  3 ++-
 net/core/netpoll.c                      | 30 ++++++++++++++++---------
 2 files changed, 22 insertions(+), 11 deletions(-)

Ratheesh Kannoth March 13, 2024, 1:36 p.m. UTC | #1

On 2024-03-13 at 18:16:13, Mark Cilissen (mark@yotsuba.nl) wrote:
> Currently, netpoll only supports interfaces with an ethernet-compatible
> link layer. Certain interfaces like SLIP do not have a link layer
> on the network interface level at all and expect raw IP packets,
> and could benefit from being supported by netpoll.
>
> This commit adds support for such interfaces by using the network device's
> `hard_header_len` field as an indication that no link layer is present.
> If that is the case we simply skip adding the ethernet header, causing
> a raw IP packet to be sent over the interface. This has been confirmed
> to add netconsole support to at least SLIP and WireGuard interfaces.
>
> Signed-off-by: Mark Cilissen <mark@yotsuba.nl>
> ---
>  Documentation/networking/netconsole.rst |  3 ++-
>  net/core/netpoll.c                      | 30 ++++++++++++++++---------
>  2 files changed, 22 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/networking/netconsole.rst b/Documentation/networking/netconsole.rst
> index d55c2a22ec7a..434ce0366027 100644
> --- a/Documentation/networking/netconsole.rst
> +++ b/Documentation/networking/netconsole.rst
> @@ -327,4 +327,5 @@ enable the logging of even the most critical kernel bugs. It works
>  from IRQ contexts as well, and does not enable interrupts while
>  sending packets. Due to these unique needs, configuration cannot
>  be more automatic, and some fundamental limitations will remain:
> -only IP networks, UDP packets and ethernet devices are supported.
> +only UDP packets over IP networks, over ethernet links if a
> +hardware layer is required, are supported.
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 543007f159f9..0299fb71b456 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -399,7 +399,7 @@ EXPORT_SYMBOL(netpoll_send_skb);
>
>  void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
>  {
> -	int total_len, ip_len, udp_len;
> +	int total_len, ip_len, udp_len, link_len;
>  	struct sk_buff *skb;
>  	struct udphdr *udph;
>  	struct iphdr *iph;
> @@ -416,7 +416,10 @@ void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
>  	else
>  		ip_len = udp_len + sizeof(*iph);
>
> -	total_len = ip_len + LL_RESERVED_SPACE(np->dev);
> +	/* if there's a hardware header assume ethernet, else raw IP */
Taking an assumption based on dev's lower layer does not look to be good.
why not transmit packet from skb_network_header() in your driver (by making
changes in your driver)

> +	eth = NULL;
> +	link_len = np->dev->hard_header_len ? LL_RESERVED_SPACE(np->dev) : 0;
> +	total_len = ip_len + link_len;
>
>  	skb = find_skb(np, total_len + np->dev->needed_tailroom,
>  		       total_len - len);
> @@ -458,9 +461,11 @@ void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
>  		ip6h->saddr = np->local_ip.in6;
>  		ip6h->daddr = np->remote_ip.in6;
>
> -		eth = skb_push(skb, ETH_HLEN);
> -		skb_reset_mac_header(skb);
> -		skb->protocol = eth->h_proto = htons(ETH_P_IPV6);
> +		skb->protocol = htons(ETH_P_IPV6);
> +		if (link_len) {
> +			eth = skb_push(skb, ETH_HLEN);
> +			skb_reset_mac_header(skb);
> +		}
>  	} else {
>  		udph->check = 0;
>  		udph->check = csum_tcpudp_magic(np->local_ip.ip,
> @@ -487,13 +492,18 @@ void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
>  		put_unaligned(np->remote_ip.ip, &(iph->daddr));
>  		iph->check    = ip_fast_csum((unsigned char *)iph, iph->ihl);
>
> -		eth = skb_push(skb, ETH_HLEN);
> -		skb_reset_mac_header(skb);
> -		skb->protocol = eth->h_proto = htons(ETH_P_IP);
> +		skb->protocol = htons(ETH_P_IP);
> +		if (link_len) {
> +			eth = skb_push(skb, ETH_HLEN);
> +			skb_reset_mac_header(skb);
> +		}
>  	}
>
> -	ether_addr_copy(eth->h_source, np->dev->dev_addr);
> -	ether_addr_copy(eth->h_dest, np->remote_mac);
> +	if (eth) {
> +		eth->h_proto = skb->protocol;
> +		ether_addr_copy(eth->h_source, np->dev->dev_addr);
> +		ether_addr_copy(eth->h_dest, np->remote_mac);
> +	}
>
>  	skb->dev = np->dev;
>
> --
> 2.43.0
>

Mark March 13, 2024, 1:53 p.m. UTC | #2

Hi Ratheesh,

> Op 13 mrt 6 Reiwa, om 14:36 heeft Ratheesh Kannoth <rkannoth@marvell.com> het volgende geschreven:
> 
> On 2024-03-13 at 18:16:13, Mark Cilissen (mark@yotsuba.nl) wrote:
>> […]
> Taking an assumption based on dev’s lower layer does not look to be good.
> why not transmit packet from skb_network_header() in your driver (by making
> changes in your driver)

There’s two assumptions at play here:
- The lower layer is ethernet: this has always been present in netpoll, and is even
  documented in netconsole.rst. This comment just mentions it because we add a way
  to bypass the assumption; it is not an assumption this patch adds to the code.
- hard_header_len==0 means that there is no exposed link layer: this is a rather
  conservative assumption in my opinion, and is also mentioned in the definition
  of LL_RESERVED_SPACE:

> * Alternative is:
> *   dev->hard_header_len ? (dev->hard_header_len +
> *                            (HH_DATA_MOD - 1)) & ~(HH_DATA_MOD - 1) : 0

  The same assumption is also made in more places in the core network code, like af_packet:

>   - If the device has no dev->header_ops->create, there is no LL header
>     visible above the device. In this case, its hard_header_len should be 0.
>     The device may prepend its own header internally. In this case, its
>     needed_headroom should be set to the space needed for it to add its
>     internal header.

I could change it to, like af_packet, check `dev->header_ops` instead if that is preferred,
but I don’t think that patching every single raw IP driver to deal with skbs that managed
to somehow have link layer data would be preferred here, especially since netpoll is kind
of a special case to begin with. I am open to suggestions and ideas, though.

> […]

Thanks and regards,
Mark

Ratheesh Kannoth March 14, 2024, 2:46 a.m. UTC | #3

> From: Mark <mark@yotsuba.nl>
> Sent: Wednesday, March 13, 2024 7:23 PM
> To: Ratheesh Kannoth <rkannoth@marvell.com>
> > On 2024-03-13 at 18:16:13, Mark Cilissen (mark@yotsuba.nl) wrote:
> >> […]
> > Taking an assumption based on dev’s lower layer does not look to be good.
> > why not transmit packet from skb_network_header() in your driver (by
> > making changes in your driver)
> 
> There’s two assumptions at play here:
> - The lower layer is ethernet: this has always been present in netpoll, and is
> even
>   documented in netconsole.rst. This comment just mentions it because we
> add a way
>   to bypass the assumption; it is not an assumption this patch adds to the
> code.
> - hard_header_len==0 means that there is no exposed link layer: this is a rather
>   conservative assumption in my opinion, and is also mentioned in the
> definition

Hmm.  That is not my question.   Let me explain it in detail. Netconsole is using netpoll_send_udp() to encapsulate the msg over 
UDP/IP/ MAC headers. Job well done. Now it calls netdev->ops->ndo_start_xmit(skb, dev).  If your driver is well aware that you can
Transmit only from network header, why don’t you dma map from network header ?  

>   of LL_RESERVED_SPACE:
> 
> > * Alternative is:
> > *   dev->hard_header_len ? (dev->hard_header_len +
> > *                            (HH_DATA_MOD - 1)) & ~(HH_DATA_MOD - 1) : 0
> 
>   The same assumption is also made in more places in the core network code,
> like af_packet:
> 
> >   - If the device has no dev->header_ops->create, there is no LL header
> >     visible above the device. In this case, its hard_header_len should be 0.
> >     The device may prepend its own header internally. In this case, its
> >     needed_headroom should be set to the space needed for it to add its
> >     internal header.
> 
ACK.

Paolo Abeni March 14, 2024, 12:48 p.m. UTC | #4

On Wed, 2024-03-13 at 13:46 +0100, Mark Cilissen wrote:
> Currently, netpoll only supports interfaces with an ethernet-compatible
> link layer. Certain interfaces like SLIP do not have a link layer
> on the network interface level at all and expect raw IP packets,
> and could benefit from being supported by netpoll.
> 
> This commit adds support for such interfaces by using the network device's
> `hard_header_len` field as an indication that no link layer is present.
> If that is the case we simply skip adding the ethernet header, causing
> a raw IP packet to be sent over the interface. This has been confirmed
> to add netconsole support to at least SLIP and WireGuard interfaces.
> 
> Signed-off-by: Mark Cilissen <mark@yotsuba.nl>

## Form letter - net-next-closed

The merge window for v6.9 has begun and we have already posted our pull
request. Therefore net-next is closed for new drivers, features, code
refactoring and optimizations. We are currently accepting bug fixes
only.

Please repost when net-next reopens after March 25th.

RFC patches sent for review only are obviously welcome at any time.

See:
https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#development-cycle

--
pw-bot: defer

Jakub Kicinski March 14, 2024, 6:34 p.m. UTC | #5

On Wed, 13 Mar 2024 13:46:13 +0100 Mark Cilissen wrote:
> Currently, netpoll only supports interfaces with an ethernet-compatible
> link layer. Certain interfaces like SLIP do not have a link layer
> on the network interface level at all and expect raw IP packets,
> and could benefit from being supported by netpoll.
> 
> This commit adds support for such interfaces by using the network device's
> `hard_header_len` field as an indication that no link layer is present.
> If that is the case we simply skip adding the ethernet header, causing
> a raw IP packet to be sent over the interface. This has been confirmed
> to add netconsole support to at least SLIP and WireGuard interfaces.

Would be great if this could come with a simple selftest under
tools/testing/selftests/net. Preferably using some simple tunnel
device, rather than wg to limit tooling dependencies ;)

Mark March 18, 2024, 11:43 a.m. UTC | #6

Hi Ratheesh,

> Op 14 mrt 6 Reiwa, om 03:46 heeft Ratheesh Kannoth <rkannoth@marvell.com> het volgende geschreven:
> 
>> From: Mark <mark@yotsuba.nl>
>> […]
> 
> Hmm.  That is not my question.   Let me explain it in detail. Netconsole is using netpoll_send_udp() to encapsulate the msg over 
> UDP/IP/ MAC headers. Job well done. Now it calls netdev->ops->ndo_start_xmit(skb, dev).  If your driver is well aware that you can
> Transmit only from network header, why don’t you dma map from network header ?  

The rest of the network subsystem seems to not add a header to skbs submitted
to netdev->ops->ndo_start_xmit() at all, which makes sense considering
netdev->header_ops is either NULL or no-op for these devices.

Following this line of reasoning, from API perspective it made more sense
to me for netpoll to not submit ‘bogus’ skbs that are out-of-line with what
the rest of the network subsystem does to ndo_start_xmit() to begin with.
It really depends on the API guarantees we want to have for netdev,
but personally I'm wary of introducing an allowance for bogus headers.

Additionally from a practical perspective, this would require changing almost
every, if not every, IP interface driver. I took a look at the WireGuard
driver to see what it would entail, and from my limited experience with the
networking code it seems like there's some quite annoying interactions with
e.g. GSO which would make driver-side handling of such packets quite a bit
more complex.

So from my perspective, fixing this in netpoll is both the more API-correct
change and introduces the least amount of additional complexity.

> […]

Thanks and regards,
Mark

Mark March 18, 2024, 11:47 a.m. UTC | #7

Hi Jakub,

> Op 14 mrt 6 Reiwa, om 19:34 heeft Jakub Kicinski <kuba@kernel.org> het volgende geschreven:
> 
> On Wed, 13 Mar 2024 13:46:13 +0100 Mark Cilissen wrote:
>> Currently, netpoll only supports interfaces with an ethernet-compatible
>> link layer. Certain interfaces like SLIP do not have a link layer
>> on the network interface level at all and expect raw IP packets,
>> and could benefit from being supported by netpoll.
>> 
>> This commit adds support for such interfaces by using the network device's
>> `hard_header_len` field as an indication that no link layer is present.
>> If that is the case we simply skip adding the ethernet header, causing
>> a raw IP packet to be sent over the interface. This has been confirmed
>> to add netconsole support to at least SLIP and WireGuard interfaces.
> 
> Would be great if this could come with a simple selftest under
> tools/testing/selftests/net. Preferably using some simple tunnel
> device, rather than wg to limit tooling dependencies ;)

Yes, that makes a lot of sense. I wrote a selftest that tests netconsole
using IPIP and GRE tunnels, and they too seem to work correctly and pass (and
more importantly, fail without the patch ;-). Would you prefer me to submit
a v2 with that test now, or when net-next reopens in a week?

Thanks and regards,
Mark

Ratheesh Kannoth March 18, 2024, 2:06 p.m. UTC | #8

> From: Mark <mark@yotsuba.nl>
> Sent: Monday, March 18, 2024 5:13 PM
> To: Ratheesh Kannoth <rkannoth@marvell.com>
> Cc: netdev@vger.kernel.org; Hans de Goede <hdegoede@redhat.com>; Eric
> Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Breno
> Leitao <leitao@debian.org>; Ingo Molnar <mingo@redhat.com>; David S.
> Miller <davem@davemloft.net>; Paolo Abeni <pabeni@redhat.com>; linux-
> kernel@vger.kernel.org
> Subject: Re: [EXTERNAL] [PATCH] netpoll: support sending over raw IP
> interfaces
> 
> 
> Hi Ratheesh,
> 
> > Op 14 mrt 6 Reiwa, om 03:46 heeft Ratheesh Kannoth
> <rkannoth@marvell.com> het volgende geschreven:
> >
> >> From: Mark <mark@yotsuba.nl>
> >> […]
> >
> > Hmm.  That is not my question.   Let me explain it in detail. Netconsole is
> using netpoll_send_udp() to encapsulate the msg over
> > UDP/IP/ MAC headers. Job well done. Now it calls
> > netdev->ops->ndo_start_xmit(skb, dev).  If your driver is well aware that
> you can Transmit only from network header, why don’t you dma map from
> network header ?
> 
> The rest of the network subsystem seems to not add a header to skbs
> submitted to netdev->ops->ndo_start_xmit() at all, which makes sense
> considering
> netdev->header_ops is either NULL or no-op for these devices.
> 
> Following this line of reasoning, from API perspective it made more sense to
> me for netpoll to not submit ‘bogus’ skbs that are out-of-line with what the
> rest of the network subsystem does to ndo_start_xmit() to begin with.
> It really depends on the API guarantees we want to have for netdev, but
> personally I'm wary of introducing an allowance for bogus headers.
>
Is below network topology possible ?  
Netpoll()- ------> netdev A ----> raw interface 
Where netdev A's  netdev->header_ops != NULL


> Additionally from a practical perspective, this would require changing almost
> every, if not every, IP interface driver. I took a look at the WireGuard driver to
> see what it would entail, and from my limited experience with the networking
> code it seems like there's some quite annoying interactions with e.g. GSO
> which would make driver-side handling of such packets quite a bit more
> complex.
ACK.

> 
> So from my perspective, fixing this in netpoll is both the more API-correct
> change and introduces the least amount of additional complexity.
ACK.

Jakub Kicinski March 19, 2024, 5:10 p.m. UTC | #9

On Mon, 18 Mar 2024 12:47:46 +0100 Mark wrote:
> > Would be great if this could come with a simple selftest under
> > tools/testing/selftests/net. Preferably using some simple tunnel
> > device, rather than wg to limit tooling dependencies ;)  
> 
> Yes, that makes a lot of sense. I wrote a selftest that tests netconsole
> using IPIP and GRE tunnels, and they too seem to work correctly and pass (and
> more importantly, fail without the patch ;-). Would you prefer me to submit
> a v2 with that test now, or when net-next reopens in a week?

Feel free to post as [RFC net-next v2] now. I can't guarantee people
will find the time to review, but it shouldn't hurt to have it on 
the list :)

Mark March 21, 2024, 12:33 p.m. UTC | #10

Hi Ratheesh,

> Op 18 mrt 6 Reiwa, om 15:06 heeft Ratheesh Kannoth <rkannoth@marvell.com> het volgende geschreven:
> 
>> […]
> Is below network topology possible ?  
> Netpoll()- ------> netdev A ----> raw interface 
> Where netdev A's  netdev->header_ops != NULL

I believe so, this is not uncommon in tunnel devices like gretap.
However, those fully encapsulate the link layer header in the packet
to the lower interface. I am not aware of a interface driver that removes
a header upon xmit, so to speak. However, I have just posted a v2 that
instead uses the documented `dev_has_header()` API, which seems to fit the
check exactly, here:

https://lore.kernel.org/netdev/20240321122003.20089-1-mark@yotsuba.nl/T/

I hope this change manages to mitigate your concerns. :-)

> […]

Thanks and regards,
Mark

Ratheesh Kannoth March 22, 2024, 3:33 a.m. UTC | #11

> From: Mark <mark@yotsuba.nl>
> Sent: Thursday, March 21, 2024 6:03 PM
> To: Ratheesh Kannoth <rkannoth@marvell.com>
> Cc: netdev@vger.kernel.org; Hans de Goede <hdegoede@redhat.com>; Eric
> Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Breno
> Leitao <leitao@debian.org>; Ingo Molnar <mingo@redhat.com>; David S.
> Miller <davem@davemloft.net>; Paolo Abeni <pabeni@redhat.com>; linux-
> kernel@vger.kernel.org
> Subject: Re: [EXTERNAL] [PATCH] netpoll: support sending over raw IP
> interfaces
> 
> Hi Ratheesh,
> 
> > Op 18 mrt 6 Reiwa, om 15:06 heeft Ratheesh Kannoth
> <rkannoth@marvell.com> het volgende geschreven:
> >
> >> […]
> > Is below network topology possible ?
> > Netpoll()- ------> netdev A ----> raw interface Where netdev A's
> > netdev->header_ops != NULL
> 
> I believe so, this is not uncommon in tunnel devices like gretap.
> However, those fully encapsulate the link layer header in the packet to the
> lower interface. I am not aware of a interface driver that removes a header
> upon xmit, so to speak. However, I have just posted a v2 that instead uses the
> documented `dev_has_header()` API, which seems to fit the check exactly,
> here:
ACK.  I understand the complexity to implement it in driver than a easy fix in netpoll.

netpoll: support sending over raw IP interfaces

Checks

Commit Message

Comments

Patch