[net-next,v3,10/24] ovpn: implement basic RX path (UDP)

Message ID	20240506011637.27272-11-antonio@openvpn.net (mailing list archive)
State	Changes Requested
Delegated to:	Netdev Maintainers
From	Antonio Quartulli <antonio@openvpn.net>
To	netdev@vger.kernel.org
Cc	Jakub Kicinski <kuba@kernel.org>, Sergey Ryazanov <ryazanov.s.a@gmail.com>, Paolo Abeni <pabeni@redhat.com>, Eric Dumazet <edumazet@google.com>, Andrew Lunn <andrew@lunn.ch>, Esben Haabendal <esben@geanix.com>, Antonio Quartulli <antonio@openvpn.net>
Subject	[PATCH net-next v3 10/24] ovpn: implement basic RX path (UDP)
Date	Mon, 6 May 2024 03:16:23 +0200
In-Reply-To	<20240506011637.27272-1-antonio@openvpn.net>
Series	Introducing OpenVPN Data Channel Offload
[net-next,v3,00/24] Introducing OpenVPN Data Channel Offload
[net-next,v3,01/24] netlink: add NLA_POLICY_MAX_LEN macro
[net-next,v3,02/24] net: introduce OpenVPN Data Channel Offload (ovpn)
[net-next,v3,03/24] ovpn: add basic netlink support
[net-next,v3,04/24] ovpn: add basic interface creation/destruction/management routines
[net-next,v3,05/24] ovpn: implement interface creation/destruction via netlink
[net-next,v3,06/24] ovpn: keep carrier always on
[net-next,v3,07/24] ovpn: introduce the ovpn_peer object
[net-next,v3,08/24] ovpn: introduce the ovpn_socket object
[net-next,v3,09/24] ovpn: implement basic TX path (UDP)
[net-next,v3,10/24] ovpn: implement basic RX path (UDP)
[net-next,v3,11/24] ovpn: implement packet processing
[net-next,v3,12/24] ovpn: store tunnel and transport statistics
[net-next,v3,13/24] ovpn: implement TCP transport
[net-next,v3,14/24] ovpn: implement multi-peer support
[net-next,v3,15/24] ovpn: implement peer lookup logic
[net-next,v3,16/24] ovpn: implement keepalive mechanism
[net-next,v3,17/24] ovpn: add support for updating local UDP endpoint
[net-next,v3,18/24] ovpn: add support for peer floating
[net-next,v3,19/24] ovpn: implement peer add/dump/delete via netlink
[net-next,v3,20/24] ovpn: implement key add/del/swap via netlink
[net-next,v3,21/24] ovpn: kill key and notify userspace in case of IV exhaustion
[net-next,v3,22/24] ovpn: notify userspace when a peer is deleted
[net-next,v3,23/24] ovpn: add basic ethtool support
[net-next,v3,24/24] testing/selftest: add test tool and scripts for ovpn module

Context	Check	Description
netdev/series_format	fail	Series longer than 15 patches
netdev/tree_selection	success	Clearly marked for net-next, async
netdev/ynl	success	Generated files up to date; no warnings/errors; GEN HAS DIFF 2 files changed, 2613 insertions(+);
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 932 this patch: 932
netdev/build_tools	success	No tools touched, skip
netdev/cc_maintainers	warning	1 maintainers not CCed: openvpn-devel@lists.sourceforge.net
netdev/build_clang	success	Errors and warnings before: 938 this patch: 938
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 944 this patch: 944
netdev/checkpatch	warning	WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 1 this patch: 1
netdev/source_inline	success	Was 0 now: 0
netdev/contest	success	net-next-2024-05-07--03-00 (tests: 1000)

Antonio Quartulli May 6, 2024, 1:16 a.m. UTC

Packets received over the socket are forwarded to the user device.

The implementation is UDP-only; TCP support will be added by a later patch.

Note: no decryption/decapsulation exists yet; packets are forwarded as
they arrive, without much processing.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 drivers/net/ovpn/io.c     | 114 +++++++++++++++++++++++++++++++++-
 drivers/net/ovpn/io.h     |   5 ++
 drivers/net/ovpn/peer.c   |   9 +++
 drivers/net/ovpn/peer.h   |   2 +
 drivers/net/ovpn/proto.h  | 115 +++++++++++++++++++++++++++++++++++
 drivers/net/ovpn/socket.c |  24 ++++++++
 drivers/net/ovpn/udp.c    | 125 +++++++++++++++++++++++++++++++++++++-
 drivers/net/ovpn/udp.h    |   6 ++
 8 files changed, 397 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ovpn/proto.h

Sabrina Dubroca May 10, 2024, 1:45 p.m. UTC | #1

2024-05-06, 03:16:23 +0200, Antonio Quartulli wrote:
> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
> index 36cfb95edbf4..9935a863bffe 100644
> --- a/drivers/net/ovpn/io.c
> +++ b/drivers/net/ovpn/io.c
> +/* Called after decrypt to write the IP packet to the device.
> + * This method is expected to manage/free the skb.
> + */
> +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb)
> +{
> +	/* packet integrity was verified on the VPN layer - no need to perform
> +	 * any additional check along the stack

But it could have been corrupted before it got into the VPN?

> +	 */
> +	skb->ip_summed = CHECKSUM_UNNECESSARY;
> +	skb->csum_level = ~0;
> +

[...]
> +int ovpn_napi_poll(struct napi_struct *napi, int budget)
> +{
> +	struct ovpn_peer *peer = container_of(napi, struct ovpn_peer, napi);
> +	struct sk_buff *skb;
> +	int work_done = 0;
> +
> +	if (unlikely(budget <= 0))
> +		return 0;
> +	/* this function should schedule at most 'budget' number of
> +	 * packets for delivery to the interface.
> +	 * If in the queue we have more packets than what allowed by the
> +	 * budget, the next polling will take care of those
> +	 */
> +	while ((work_done < budget) &&
> +	       (skb = ptr_ring_consume_bh(&peer->netif_rx_ring))) {
> +		ovpn_netdev_write(peer, skb);
> +		work_done++;
> +	}
> +
> +	if (work_done < budget)
> +		napi_complete_done(napi, work_done);
> +
> +	return work_done;
> +}

Why not use gro_cells? It would avoid all that napi polling and
netif_rx_ring code (and it's per-cpu, going back to our other
discussion around napi).
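For reference, the budget contract that ovpn_napi_poll() implements can be modeled in userspace. gro_cells' own per-CPU poll loop follows the same pattern (deliver at most budget packets, complete only if the queue drained first), which is why switching to it would let the driver drop this code. The ring, its capacity and all function names below are hypothetical stand-ins, not kernel APIs; packets are plain ints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-peer ptr_ring: a fixed array of
 * packet IDs with consumer (head) and producer (tail) indices. */
struct rx_ring {
	int pkts[64];
	size_t head;   /* next item to consume */
	size_t tail;   /* one past the last queued item */
};

/* Returns 0 when empty, like ptr_ring_consume_bh() returning NULL. */
static int ring_consume(struct rx_ring *r, int *pkt)
{
	if (r->head == r->tail)
		return 0;
	*pkt = r->pkts[r->head++];
	return 1;
}

/* Models a NAPI poll callback: deliver at most 'budget' packets and
 * report completion only if the queue drained before the budget ran out. */
static int poll_model(struct rx_ring *r, int budget, int *delivered, int *completed)
{
	int work_done = 0;
	int pkt;

	*completed = 0;
	if (budget <= 0)
		return 0;

	while (work_done < budget && ring_consume(r, &pkt)) {
		delivered[work_done] = pkt;  /* stands in for ovpn_netdev_write() */
		work_done++;
	}

	if (work_done < budget)
		*completed = 1;              /* stands in for napi_complete_done() */

	return work_done;
}
```

When the budget is fully used the model does not complete, even if the ring happens to be empty; the next poll picks up whatever remains.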


> diff --git a/drivers/net/ovpn/proto.h b/drivers/net/ovpn/proto.h
> new file mode 100644
> index 000000000000..0a51104ed931
> --- /dev/null
> +++ b/drivers/net/ovpn/proto.h
[...]
> +/**
> + * ovpn_key_id_from_skb - extract key ID from the skb head
> + * @skb: the packet to extract the key ID code from
> + *
> + * Note: this function assumes that the skb head was pulled enough
> + * to access the first byte.
> + *
> + * Return: the key ID
> + */
> +static inline u8 ovpn_key_id_from_skb(const struct sk_buff *skb)

> +static inline u32 ovpn_opcode_compose(u8 opcode, u8 key_id, u32 peer_id)

(tiny nit: those aren't used yet in this patch. probably not worth
moving them into the right patch.)
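For readers following the header helpers: the first 4 bytes of a DATA_V2 payload carry a 5-bit opcode and a 3-bit key ID in byte 0, followed by a 24-bit peer ID. The userspace sketch below assumes that layout and the OpenVPN opcode values DATA_V1 = 6 and DATA_V2 = 9; the function names are hypothetical analogues of ovpn_opcode_compose() and the *_from_skb() helpers, working on a plain byte buffer instead of an skb.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed OpenVPN data-channel layout:
 * byte 0    = opcode (high 5 bits) | key ID (low 3 bits)
 * bytes 1-3 = 24-bit peer ID (DATA_V2 only), network byte order */
#define OVPN_OPCODE_SHIFT   3
#define OVPN_KEY_ID_MASK    0x07
#define OVPN_PEER_ID_MASK   0x00FFFFFF
#define OVPN_DATA_V1        6
#define OVPN_DATA_V2        9

/* Build the 32-bit DATA_V2 header word in host order. */
static uint32_t opcode_compose(uint8_t opcode, uint8_t key_id, uint32_t peer_id)
{
	uint8_t op = (uint8_t)((opcode << OVPN_OPCODE_SHIFT) |
			       (key_id & OVPN_KEY_ID_MASK));

	return ((uint32_t)op << 24) | (peer_id & OVPN_PEER_ID_MASK);
}

/* Serialize the header word into buf[0..3] in network byte order. */
static void header_write(uint8_t buf[4], uint32_t hdr)
{
	buf[0] = (uint8_t)(hdr >> 24);
	buf[1] = (uint8_t)(hdr >> 16);
	buf[2] = (uint8_t)(hdr >> 8);
	buf[3] = (uint8_t)hdr;
}

static uint8_t opcode_from_buf(const uint8_t *buf)
{
	return buf[0] >> OVPN_OPCODE_SHIFT;
}

static uint8_t key_id_from_buf(const uint8_t *buf)
{
	return buf[0] & OVPN_KEY_ID_MASK;
}

static uint32_t peer_id_from_buf(const uint8_t *buf)
{
	return ((uint32_t)buf[1] << 16) | ((uint32_t)buf[2] << 8) | buf[3];
}
```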


> diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
> index f434da76dc0a..07182703e598 100644
> --- a/drivers/net/ovpn/udp.c
> +++ b/drivers/net/ovpn/udp.c
> @@ -20,9 +20,117 @@
>  #include "bind.h"
>  #include "io.h"
>  #include "peer.h"
> +#include "proto.h"
>  #include "socket.h"
>  #include "udp.h"
>  
> +/**
> + * ovpn_udp_encap_recv - Start processing a received UDP packet.
> + * @sk: socket over which the packet was received
> + * @skb: the received packet
> + *
> + * If the first byte of the payload is DATA_V2, the packet is further processed,
> + * otherwise it is forwarded to the UDP stack for delivery to user space.
> + *
> + * Return:
> + *  0 if skb was consumed or dropped
> + * >0 if skb should be passed up to userspace as UDP (packet not consumed)
> + * <0 if skb should be resubmitted as proto -N (packet not consumed)
> + */
> +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
> +{
> +	struct ovpn_peer *peer = NULL;
> +	struct ovpn_struct *ovpn;
> +	u32 peer_id;
> +	u8 opcode;
> +	int ret;
> +
> +	ovpn = ovpn_from_udp_sock(sk);
> +	if (unlikely(!ovpn)) {
> +		net_err_ratelimited("%s: cannot obtain ovpn object from UDP socket\n",
> +				    __func__);
> +		goto drop;
> +	}
> +
> +	/* Make sure the first 4 bytes of the skb data buffer after the UDP
> +	 * header are accessible.
> +	 * They are required to fetch the OP code, the key ID and the peer ID.
> +	 */
> +	if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) + 4))) {

Is this OVPN_OP_SIZE_V2?

> +		net_dbg_ratelimited("%s: packet too small\n", __func__);
> +		goto drop;
> +	}
> +
> +	opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr));
> +	if (unlikely(opcode != OVPN_DATA_V2)) {
> +		/* DATA_V1 is not supported */
> +		if (opcode == OVPN_DATA_V1)
> +			goto drop;
> +
> +		/* unknown or control packet: let it bubble up to userspace */
> +		return 1;
> +	}
> +
> +	peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr));
> +	/* some OpenVPN server implementations send data packets with the
> +	 * peer-id set to undef. In this case we skip the peer lookup by peer-id
> +	 * and we try with the transport address
> +	 */
> +	if (peer_id != OVPN_PEER_ID_UNDEF) {
> +		peer = ovpn_peer_get_by_id(ovpn, peer_id);
> +		if (!peer) {
> +			net_err_ratelimited("%s: received data from unknown peer (id: %d)\n",
> +					    __func__, peer_id);
> +			goto drop;
> +		}
> +	}
> +
> +	if (!peer) {
> +		/* data packet with undef peer-id */
> +		peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
> +		if (unlikely(!peer)) {
> +			netdev_dbg(ovpn->dev,
> +				   "%s: received data with undef peer-id from unknown source\n",
> +				   __func__);

_ratelimited?

> +			goto drop;
> +		}
> +	}
> +
> +	/* At this point we know the packet is from a configured peer.
> +	 * DATA_V2 packets are handled in kernel space, the rest goes to user
> +	 * space.
> +	 *
> +	 * Return 1 to instruct the stack to let the packet bubble up to
> +	 * userspace
> +	 */
> +	if (unlikely(opcode != OVPN_DATA_V2)) {

You already handled those earlier, before getting the peer.
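With the OVPN_OP_SIZE_V2 suggestion and the removal of the redundant second opcode check both applied, classification reduces to one early decision. A userspace sketch under these assumptions: the buffer starts with an 8-byte UDP header, OVPN_OP_SIZE_V2 is 4 (matching the "+ 4" in the patch), and the peer lookup is omitted. The enum and function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UDP_HDR_LEN        8
#define OVPN_OP_SIZE_V2    4   /* opcode/key ID byte + 3-byte peer ID */
#define OVPN_OPCODE_SHIFT  3
#define OVPN_DATA_V1       6
#define OVPN_DATA_V2       9

/* In the real encap_rcv hook, RX_DROP and RX_KERNEL would both return 0
 * (skb consumed), while RX_USERSPACE would return 1 (pass up as UDP). */
enum rx_verdict { RX_DROP, RX_KERNEL, RX_USERSPACE };

static enum rx_verdict classify_udp_payload(const uint8_t *pkt, size_t len)
{
	uint8_t opcode;

	/* Equivalent of pskb_may_pull(skb, sizeof(struct udphdr) + OVPN_OP_SIZE_V2) */
	if (len < UDP_HDR_LEN + OVPN_OP_SIZE_V2)
		return RX_DROP;

	opcode = pkt[UDP_HDR_LEN] >> OVPN_OPCODE_SHIFT;
	if (opcode == OVPN_DATA_V2)
		return RX_KERNEL;      /* peer lookup would follow here */
	if (opcode == OVPN_DATA_V1)
		return RX_DROP;        /* DATA_V1 is not supported */

	/* Control or unknown packet: decided once, before any peer lookup,
	 * so no second opcode check is needed after the lookup. */
	return RX_USERSPACE;
}
```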


[...]
> @@ -255,10 +368,20 @@ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn)
>  			return -EALREADY;
>  		}
>  
> -		netdev_err(ovpn->dev, "%s: provided socket already taken by other user\n",
> +		netdev_err(ovpn->dev,
> +			   "%s: provided socket already taken by other user\n",

I guess you meant to break that line in the patch that introduced it,
rather than here? :)


> +void ovpn_udp_socket_detach(struct socket *sock)
> +{
> +	struct udp_tunnel_sock_cfg cfg = { };
> +
> +	setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);

I can't find anything in the kernel currently using
setup_udp_tunnel_sock the way you're using it here.

Does this provide any benefit compared to just letting the kernel
disable encap when the socket goes away? Are you planning to detach
and then re-attach the same socket?

Antonio Quartulli May 10, 2024, 2:41 p.m. UTC | #2

On 10/05/2024 15:45, Sabrina Dubroca wrote:
> 2024-05-06, 03:16:23 +0200, Antonio Quartulli wrote:
>> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
>> index 36cfb95edbf4..9935a863bffe 100644
>> --- a/drivers/net/ovpn/io.c
>> +++ b/drivers/net/ovpn/io.c
>> +/* Called after decrypt to write the IP packet to the device.
>> + * This method is expected to manage/free the skb.
>> + */
>> +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb)
>> +{
>> +	/* packet integrity was verified on the VPN layer - no need to perform
>> +	 * any additional check along the stack
> 
> But it could have been corrupted before it got into the VPN?

It could, but I believe a VPN should only take care of integrity along 
its tunnel (and this is guaranteed by the OpenVPN protocol).
If something corrupted enters the tunnel, we will just deliver it as is 
to the other end. Upper layers (where the corruption actually happened) 
have to deal with that.

> 
>> +	 */
>> +	skb->ip_summed = CHECKSUM_UNNECESSARY;
>> +	skb->csum_level = ~0;
>> +
> 
> [...]
>> +int ovpn_napi_poll(struct napi_struct *napi, int budget)
>> +{
>> +	struct ovpn_peer *peer = container_of(napi, struct ovpn_peer, napi);
>> +	struct sk_buff *skb;
>> +	int work_done = 0;
>> +
>> +	if (unlikely(budget <= 0))
>> +		return 0;
>> +	/* this function should schedule at most 'budget' number of
>> +	 * packets for delivery to the interface.
>> +	 * If in the queue we have more packets than what allowed by the
>> +	 * budget, the next polling will take care of those
>> +	 */
>> +	while ((work_done < budget) &&
>> +	       (skb = ptr_ring_consume_bh(&peer->netif_rx_ring))) {
>> +		ovpn_netdev_write(peer, skb);
>> +		work_done++;
>> +	}
>> +
>> +	if (work_done < budget)
>> +		napi_complete_done(napi, work_done);
>> +
>> +	return work_done;
>> +}
> 
> Why not use gro_cells?

First because I did not know they existed :-)

> It would avoid all that napi polling and
> netif_rx_ring code (and it's per-cpu, going back to our other
> discussion around napi).

This sounds truly appealing. And if we can make this per-cpu by design, 
I believe we can definitely drop the per-peer NAPI logic.

> 
> 
>> diff --git a/drivers/net/ovpn/proto.h b/drivers/net/ovpn/proto.h
>> new file mode 100644
>> index 000000000000..0a51104ed931
>> --- /dev/null
>> +++ b/drivers/net/ovpn/proto.h
> [...]
>> +/**
>> + * ovpn_key_id_from_skb - extract key ID from the skb head
>> + * @skb: the packet to extract the key ID code from
>> + *
>> + * Note: this function assumes that the skb head was pulled enough
>> + * to access the first byte.
>> + *
>> + * Return: the key ID
>> + */
>> +static inline u8 ovpn_key_id_from_skb(const struct sk_buff *skb)
> 
>> +static inline u32 ovpn_opcode_compose(u8 opcode, u8 key_id, u32 peer_id)
> 
> (tiny nit: those aren't used yet in this patch. probably not worth
> moving them into the right patch.)

Ouch. I am already going at a speed of 20-25rph (Rebases Per Hour).
It shouldn't be a problem to clean this up too.

> 
> 
>> diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
>> index f434da76dc0a..07182703e598 100644
>> --- a/drivers/net/ovpn/udp.c
>> +++ b/drivers/net/ovpn/udp.c
>> @@ -20,9 +20,117 @@
>>   #include "bind.h"
>>   #include "io.h"
>>   #include "peer.h"
>> +#include "proto.h"
>>   #include "socket.h"
>>   #include "udp.h"
>>   
>> +/**
>> + * ovpn_udp_encap_recv - Start processing a received UDP packet.
>> + * @sk: socket over which the packet was received
>> + * @skb: the received packet
>> + *
>> + * If the first byte of the payload is DATA_V2, the packet is further processed,
>> + * otherwise it is forwarded to the UDP stack for delivery to user space.
>> + *
>> + * Return:
>> + *  0 if skb was consumed or dropped
>> + * >0 if skb should be passed up to userspace as UDP (packet not consumed)
>> + * <0 if skb should be resubmitted as proto -N (packet not consumed)
>> + */
>> +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>> +{
>> +	struct ovpn_peer *peer = NULL;
>> +	struct ovpn_struct *ovpn;
>> +	u32 peer_id;
>> +	u8 opcode;
>> +	int ret;
>> +
>> +	ovpn = ovpn_from_udp_sock(sk);
>> +	if (unlikely(!ovpn)) {
>> +		net_err_ratelimited("%s: cannot obtain ovpn object from UDP socket\n",
>> +				    __func__);
>> +		goto drop;
>> +	}
>> +
>> +	/* Make sure the first 4 bytes of the skb data buffer after the UDP
>> +	 * header are accessible.
>> +	 * They are required to fetch the OP code, the key ID and the peer ID.
>> +	 */
>> +	if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) + 4))) {
> 
> Is this OVPN_OP_SIZE_V2?

It is! I will use that define. Thanks.

> 
>> +		net_dbg_ratelimited("%s: packet too small\n", __func__);
>> +		goto drop;
>> +	}
>> +
>> +	opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr));
>> +	if (unlikely(opcode != OVPN_DATA_V2)) {
>> +		/* DATA_V1 is not supported */
>> +		if (opcode == OVPN_DATA_V1)
>> +			goto drop;
>> +
>> +		/* unknown or control packet: let it bubble up to userspace */
>> +		return 1;
>> +	}
>> +
>> +	peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr));
>> +	/* some OpenVPN server implementations send data packets with the
>> +	 * peer-id set to undef. In this case we skip the peer lookup by peer-id
>> +	 * and we try with the transport address
>> +	 */
>> +	if (peer_id != OVPN_PEER_ID_UNDEF) {
>> +		peer = ovpn_peer_get_by_id(ovpn, peer_id);
>> +		if (!peer) {
>> +			net_err_ratelimited("%s: received data from unknown peer (id: %d)\n",
>> +					    __func__, peer_id);
>> +			goto drop;
>> +		}
>> +	}
>> +
>> +	if (!peer) {
>> +		/* data packet with undef peer-id */
>> +		peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
>> +		if (unlikely(!peer)) {
>> +			netdev_dbg(ovpn->dev,
>> +				   "%s: received data with undef peer-id from unknown source\n",
>> +				   __func__);
> 
> _ratelimited?

Makes sense. I will use net_dbg_ratelimited.

> 
>> +			goto drop;
>> +		}
>> +	}
>> +
>> +	/* At this point we know the packet is from a configured peer.
>> +	 * DATA_V2 packets are handled in kernel space, the rest goes to user
>> +	 * space.
>> +	 *
>> +	 * Return 1 to instruct the stack to let the packet bubble up to
>> +	 * userspace
>> +	 */
>> +	if (unlikely(opcode != OVPN_DATA_V2)) {
> 
> You already handled those earlier, before getting the peer.

Ouch... you're right. This can just go.

> 
> 
> [...]
>> @@ -255,10 +368,20 @@ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn)
>>   			return -EALREADY;
>>   		}
>>   
>> -		netdev_err(ovpn->dev, "%s: provided socket already taken by other user\n",
>> +		netdev_err(ovpn->dev,
>> +			   "%s: provided socket already taken by other user\n",
> 
> I guess you meant to break that line in the patch that introduced it,
> rather than here? :)

indeed.

> 
> 
>> +void ovpn_udp_socket_detach(struct socket *sock)
>> +{
>> +	struct udp_tunnel_sock_cfg cfg = { };
>> +
>> +	setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
> 
> I can't find anything in the kernel currently using
> setup_udp_tunnel_sock the way you're using it here.
> 
> Does this provide any benefit compared to just letting the kernel
> disable encap when the socket goes away? Are you planning to detach
> and then re-attach the same socket?

Technically, we don't know what happens to this socket after we detach.
We have no guarantee that it will be closed.

Right now we detach when the instance is closed, so it's likely that the 
socket will go, but I don't want to make hard assumptions about what 
userspace may decide to do with this socket in the future.

If it doesn't hurt, why not do this easy cleanup?


Thanks!

>

Sabrina Dubroca July 18, 2024, 10:46 a.m. UTC | #3

Sorry Antonio, I'm only coming back to this now.

2024-05-10, 16:41:43 +0200, Antonio Quartulli wrote:
> On 10/05/2024 15:45, Sabrina Dubroca wrote:
> > 2024-05-06, 03:16:23 +0200, Antonio Quartulli wrote:
> > > diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
> > > index 36cfb95edbf4..9935a863bffe 100644
> > > --- a/drivers/net/ovpn/io.c
> > > +++ b/drivers/net/ovpn/io.c
> > > +/* Called after decrypt to write the IP packet to the device.
> > > + * This method is expected to manage/free the skb.
> > > + */
> > > +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb)
> > > +{
> > > +	/* packet integrity was verified on the VPN layer - no need to perform
> > > +	 * any additional check along the stack
> > 
> > But it could have been corrupted before it got into the VPN?
> 
> It could, but I believe a VPN should only take care of integrity along its
> tunnel (and this is guaranteed by the OpenVPN protocol).
> If something corrupted enters the tunnel, we will just deliver it as is to
> the other end. Upper layers (where the corruption actually happened) have to
> deal with that.

I agree with that, but I don't think that's what CHECKSUM_UNNECESSARY
(especially with csum_level = MAX) would do. CHECKSUM_UNNECESSARY
tells the networking stack that the checksum has been verified (up to
csum_level+1, so 0 means the first level of TCP/UDP type headers has
been validated):

// include/linux/skbuff.h

 * - %CHECKSUM_UNNECESSARY
 *
 *   The hardware you're dealing with doesn't calculate the full checksum
 *   (as in %CHECKSUM_COMPLETE), but it does parse headers and verify checksums
 *   for specific protocols. For such packets it will set %CHECKSUM_UNNECESSARY
 *   if their checksums are okay.

 *   &sk_buff.csum_level indicates the number of consecutive checksums found in
 *   the packet minus one that have been verified as %CHECKSUM_UNNECESSARY.
 *   For instance if a device receives an IPv6->UDP->GRE->IPv4->TCP packet
 *   and a device is able to verify the checksums for UDP (possibly zero),
 *   GRE (checksum flag is set) and TCP, &sk_buff.csum_level would be set to
 *   two. If the device were only able to verify the UDP checksum and not
 *   GRE, either because it doesn't support GRE checksum or because GRE
 *   checksum is bad, skb->csum_level would be set to zero (TCP checksum is
 *   not considered in this case).

I think you want CHECKSUM_NONE:

 *   Device did not checksum this packet e.g. due to lack of capabilities.

Then the stack will check if the packet was corrupted.
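To make the csum_level point concrete: the field is 2 bits wide in struct sk_buff, so the patch's skb->csum_level = ~0 stores SKB_MAX_CSUM_LEVEL (3), i.e. the stack is told that four consecutive checksums were already verified. The userspace model below copies only the two fields involved; the constant values follow include/linux/skbuff.h, while the struct and helper are hypothetical.

```c
#include <assert.h>

/* Userspace stand-ins; values follow include/linux/skbuff.h. */
#define CHECKSUM_NONE         0
#define CHECKSUM_UNNECESSARY  1
#define SKB_MAX_CSUM_LEVEL    3

/* Only the two sk_buff fields under discussion. */
struct skb_csum_model {
	unsigned int ip_summed : 2;
	unsigned int csum_level : 2;
};

/* Number of consecutive L4-type checksums (outermost first) that the
 * stack will treat as already verified: csum_level + 1 for
 * CHECKSUM_UNNECESSARY, none for CHECKSUM_NONE. */
static unsigned int checksums_skipped(const struct skb_csum_model *m)
{
	if (m->ip_summed == CHECKSUM_UNNECESSARY)
		return m->csum_level + 1;
	return 0;
}
```

With CHECKSUM_NONE the model reports zero skipped checksums, which corresponds to the stack validating the inner headers itself.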

> 
> > 
> > > +	 */
> > > +	skb->ip_summed = CHECKSUM_UNNECESSARY;
> > > +	skb->csum_level = ~0;
> > > +
> >

Antonio Quartulli July 18, 2024, 1:06 p.m. UTC | #4

On 18/07/2024 12:46, Sabrina Dubroca wrote:
> Sorry Antonio, I'm only coming back to this now.

No worries, and thanks for fishing out this email.

> 
> 2024-05-10, 16:41:43 +0200, Antonio Quartulli wrote:
>> On 10/05/2024 15:45, Sabrina Dubroca wrote:
>>> 2024-05-06, 03:16:23 +0200, Antonio Quartulli wrote:
>>>> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
>>>> index 36cfb95edbf4..9935a863bffe 100644
>>>> --- a/drivers/net/ovpn/io.c
>>>> +++ b/drivers/net/ovpn/io.c
>>>> +/* Called after decrypt to write the IP packet to the device.
>>>> + * This method is expected to manage/free the skb.
>>>> + */
>>>> +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb)
>>>> +{
>>>> +	/* packet integrity was verified on the VPN layer - no need to perform
>>>> +	 * any additional check along the stack
>>>
>>> But it could have been corrupted before it got into the VPN?
>>
>> It could, but I believe a VPN should only take care of integrity along its
>> tunnel (and this is guaranteed by the OpenVPN protocol).
>> If something corrupted enters the tunnel, we will just deliver it as is to
>> the other end. Upper layers (where the corruption actually happened) have to
>> deal with that.
> 
> I agree with that, but I don't think that's what CHECKSUM_UNNECESSARY
> (especially with csum_level = MAX) would do. CHECKSUM_UNNECESSARY
> tells the networking stack that the checksum has been verified (up to
> csum_level+1, so 0 means the first level of TCP/UDP type headers has
> been validated):
> 
> // include/linux/skbuff.h
> 
>   * - %CHECKSUM_UNNECESSARY
>   *
>   *   The hardware you're dealing with doesn't calculate the full checksum
>   *   (as in %CHECKSUM_COMPLETE), but it does parse headers and verify checksums
>   *   for specific protocols. For such packets it will set %CHECKSUM_UNNECESSARY
>   *   if their checksums are okay.
> 
>   *   &sk_buff.csum_level indicates the number of consecutive checksums found in
>   *   the packet minus one that have been verified as %CHECKSUM_UNNECESSARY.
>   *   For instance if a device receives an IPv6->UDP->GRE->IPv4->TCP packet
>   *   and a device is able to verify the checksums for UDP (possibly zero),
>   *   GRE (checksum flag is set) and TCP, &sk_buff.csum_level would be set to
>   *   two. If the device were only able to verify the UDP checksum and not
>   *   GRE, either because it doesn't support GRE checksum or because GRE
>   *   checksum is bad, skb->csum_level would be set to zero (TCP checksum is
>   *   not considered in this case).
> 
> I think you want CHECKSUM_NONE:
> 
>   *   Device did not checksum this packet e.g. due to lack of capabilities.
> 
> Then the stack will check if the packet was corrupted.

I went back to the wireguard code, which I used for inspiration for this 
specific part (we are dealing with the same problem here):

https://elixir.bootlin.com/linux/v6.10/source/drivers/net/wireguard/receive.c#L376

basically the idea is: with our encapsulation we can guarantee that what 
entered the tunnel is also exiting the tunnel, without corruption.
Therefore we claim that checksums are all correct.

Doesn't it make sense?

Cheers,

> 
>>
>>>
>>>> +	 */
>>>> +	skb->ip_summed = CHECKSUM_UNNECESSARY;
>>>> +	skb->csum_level = ~0;
>>>> +
>>>
>

Sabrina Dubroca July 18, 2024, 1:11 p.m. UTC | #5

2024-07-18, 15:06:19 +0200, Antonio Quartulli wrote:
> On 18/07/2024 12:46, Sabrina Dubroca wrote:
> > Sorry Antonio, I'm only coming back to this now.
> 
> No worries and thanks for fishing this email.
> 
> > 
> > 2024-05-10, 16:41:43 +0200, Antonio Quartulli wrote:
> > > On 10/05/2024 15:45, Sabrina Dubroca wrote:
> > > > 2024-05-06, 03:16:23 +0200, Antonio Quartulli wrote:
> > > > > diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
> > > > > index 36cfb95edbf4..9935a863bffe 100644
> > > > > --- a/drivers/net/ovpn/io.c
> > > > > +++ b/drivers/net/ovpn/io.c
> > > > > +/* Called after decrypt to write the IP packet to the device.
> > > > > + * This method is expected to manage/free the skb.
> > > > > + */
> > > > > +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb)
> > > > > +{
> > > > > +	/* packet integrity was verified on the VPN layer - no need to perform
> > > > > +	 * any additional check along the stack
> > > > 
> > > > But it could have been corrupted before it got into the VPN?
> > > 
> > > It could, but I believe a VPN should only take care of integrity along its
> > > tunnel (and this is guaranteed by the OpenVPN protocol).
> > > If something corrupted enters the tunnel, we will just deliver it as is to
> > > the other end. Upper layers (where the corruption actually happened) have to
> > > deal with that.
> > 
> > I agree with that, but I don't think that's what CHECKSUM_UNNECESSARY
> > (especially with csum_level = MAX) would do. CHECKSUM_UNNECESSARY
> > tells the networking stack that the checksum has been verified (up to
> > csum_level+1, so 0 means the first level of TCP/UDP type headers has
> > been validated):
> > 
> > // include/linux/skbuff.h
> > 
> >   * - %CHECKSUM_UNNECESSARY
> >   *
> >   *   The hardware you're dealing with doesn't calculate the full checksum
> >   *   (as in %CHECKSUM_COMPLETE), but it does parse headers and verify checksums
> >   *   for specific protocols. For such packets it will set %CHECKSUM_UNNECESSARY
> >   *   if their checksums are okay.
> > 
> >   *   &sk_buff.csum_level indicates the number of consecutive checksums found in
> >   *   the packet minus one that have been verified as %CHECKSUM_UNNECESSARY.
> >   *   For instance if a device receives an IPv6->UDP->GRE->IPv4->TCP packet
> >   *   and a device is able to verify the checksums for UDP (possibly zero),
> >   *   GRE (checksum flag is set) and TCP, &sk_buff.csum_level would be set to
> >   *   two. If the device were only able to verify the UDP checksum and not
> >   *   GRE, either because it doesn't support GRE checksum or because GRE
> >   *   checksum is bad, skb->csum_level would be set to zero (TCP checksum is
> >   *   not considered in this case).
> > 
> > I think you want CHECKSUM_NONE:
> > 
> >   *   Device did not checksum this packet e.g. due to lack of capabilities.
> > 
> > Then the stack will check if the packet was corrupted.
> 
> I went back to the wireguard code, which I used for inspiration for this
> specific part (we are dealing with the same problem here):
> 
> https://elixir.bootlin.com/linux/v6.10/source/drivers/net/wireguard/receive.c#L376
> 
> basically the idea is: with our encapsulation we can guarantee that what
> entered the tunnel is also exiting the tunnel, without corruption.
> Therefore we claim that checksums are all correct.

Can you be sure that they were correct when they went into the tunnel?
If not, I think you have to set CHECKSUM_NONE.

Antonio Quartulli July 18, 2024, 1:27 p.m. UTC | #6

On 18/07/2024 15:11, Sabrina Dubroca wrote:
>> basically the idea is: with our encapsulation we can guarantee that what
>> entered the tunnel is also exiting the tunnel, without corruption.
>> Therefore we claim that checksums are all correct.
> 
> Can you be sure that they were correct when they went into the tunnel?
> If not, I think you have to set CHECKSUM_NONE.

I can't be sure, because on the sender side we don't validate checksums 
before encapsulation.

If we assume that outgoing packets are always well formed and they can 
only be damaged while traveling on the link, then the current code 
should be ok.

If we cannot make this assumption, then we need the receiver to verify 
all checksums before moving forward (which is what you are suggesting).

Is it truly possible for the kernel to hand ovpn a packet with invalid 
checksums on the TX path?

Cheers,

Sabrina Dubroca July 18, 2024, 1:40 p.m. UTC | #7

2024-07-18, 15:27:42 +0200, Antonio Quartulli wrote:
> On 18/07/2024 15:11, Sabrina Dubroca wrote:
> > > basically the idea is: with our encapsulation we can guarantee that what
> > > entered the tunnel is also exiting the tunnel, without corruption.
> > > Therefore we claim that checksums are all correct.
> > 
> > Can you be sure that they were correct when they went into the tunnel?
> > If not, I think you have to set CHECKSUM_NONE.
> 
> I can't be sure, because on the sender side we don't validate checksums
> before encapsulation.
> 
> If we assume that outgoing packets are always well formed and they can only
> be damaged while traveling on the link, then the current code should be ok.
> 
> If we cannot make this assumption, then we need the receiver to verify all
> checksums before moving forward (which is what you are suggesting).
> 
> Is it truly possible for the kernel to hand ovpn a packet with invalid
> checksums on the TX path?

The networking stack shouldn't generate packets with broken checksums,
but it could happen. On a VPN server that's giving access to an
internal network, I think the packet could get corrupted on the
internal network and may be pushed without verification into the
tunnel.

It's also possible to inject them with packet sockets for
testing. Using scapy to send packets over your ovpn device should
allow you to do that.
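The corruption scenario above (a packet damaged before it reaches the tunnel, then carried intact by OpenVPN's integrity protection) can be illustrated with the RFC 1071 Internet checksum used by IPv4, TCP and UDP. This minimal userspace sketch shows that a receiver that recomputes the checksum, which is what CHECKSUM_NONE asks the stack to do, rejects the damaged payload, whereas CHECKSUM_UNNECESSARY would skip that step. The function names are hypothetical; the test vector is the example from RFC 1071.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: one's-complement sum of 16-bit words. */
static uint16_t inet_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;   /* pad last byte with zero */

	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);    /* fold carries back in */

	return (uint16_t)~sum;
}

/* A receiver doing its own validation accepts the buffer only if the
 * checksum computed over data plus the stored checksum is zero. */
static int csum_ok(const uint8_t *data, size_t len)
{
	return inet_csum(data, len) == 0;
}
```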

Antonio Quartulli July 18, 2024, 2:15 p.m. UTC | #8

On 18/07/2024 15:40, Sabrina Dubroca wrote:
> 2024-07-18, 15:27:42 +0200, Antonio Quartulli wrote:
>> On 18/07/2024 15:11, Sabrina Dubroca wrote:
>>>> basically the idea is: with our encapsulation we can guarantee that what
>>>> entered the tunnel is also exiting the tunnel, without corruption.
>>>> Therefore we claim that checksums are all correct.
>>>
>>> Can you be sure that they were correct when they went into the tunnel?
>>> If not, I think you have to set CHECKSUM_NONE.
>>
>> I can't be sure, because on the sender side we don't validate checksums
>> before encapsulation.
>>
>> If we assume that outgoing packets are always well formed and they can only
>> be damaged while traveling on the link, then the current code should be ok.
>>
>> If we cannot make this assumption, then we need the receiver to verify all
>> checksums before moving forward (which is what you are suggesting).
>>
>> Is it truly possible for the kernel to hand ovpn a packet with invalid
>> checksums on the TX path?
> 
> The networking stack shouldn't generate packets with broken checksums,
> but it could happen. On a VPN server that's giving access to an
> internal network, I think the packet could get corrupted on the
> internal network and may be pushed without verification into the
> tunnel.

Right.

In these cases the receiver would have a chance to detect and discard 
this packet.

With the current ovpn code, instead, we are saying "everything is good, 
don't check" and the packet would be delivered to the upper layer.

Ok, I think it makes sense to switch to CHECKSUM_NONE.

(I wonder what the wireguard guys think about it :-))

> 
> It's also possible to inject them with packet sockets for
> testing. Using scapy to send packets over your ovpn device should
> allow you to do that.

Thanks for the hint!

>
