
[net-next,v7,03/25] net: introduce OpenVPN Data Channel Offload (ovpn)

Message ID 20240917010734.1905-4-antonio@openvpn.net (mailing list archive)
State Deferred
Delegated to: Netdev Maintainers
Series Introducing OpenVPN Data Channel Offload

Checks

Context Check Description
netdev/series_format fail Series longer than 15 patches
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; GEN HAS DIFF 2 files changed, 2619 insertions(+);
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 26 this patch: 26
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 1 maintainers not CCed: openvpn-devel@lists.sourceforge.net
netdev/build_clang success Errors and warnings before: 42 this patch: 42
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1973 this patch: 1973
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 218 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-09-17--15-00 (tests: 764)

Commit Message

Antonio Quartulli Sept. 17, 2024, 1:07 a.m. UTC
OpenVPN is userspace software, in existence since around 2005, that
allows users to create secure tunnels.

So far OpenVPN has implemented all operations in userspace, which
requires several round trips between kernel and userspace in order to
process packets (encapsulation/decapsulation, encryption/decryption,
rerouting, etc.).

With `ovpn` we intend to move the fast path (data channel) entirely
into kernel space and thus improve the throughput users measure over
the tunnel.

`ovpn` is implemented as a simple virtual network device driver that
can be manipulated by means of the standard RTNL APIs. A device of kind
`ovpn` allows only IPv4/6 traffic and can be of type:
* P2P (peer-to-peer): any packet sent over the interface will be
  encapsulated and transmitted to the other side (typical OpenVPN
  client or peer-to-peer behaviour);
* P2MP (point-to-multipoint): packets sent over the interface are
  transmitted to peers based on existing routes (typical OpenVPN
  server behaviour).

After the interface has been created, OpenVPN in userspace can
configure it using a new Netlink API. Specifically it is possible
to manage peers and their keys.

The OpenVPN control channel is multiplexed over the same transport
socket by means of OP codes. Anything that is not DATA_V2 (OpenVPN
OP code for data traffic) is sent to userspace and handled there.
This way the `ovpn` codebase is kept as compact as possible while
focusing on handling data traffic only (fast path).
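
The split described above can be sketched in userspace C. This is a
minimal illustration, not the `ovpn` implementation: it assumes the
usual OpenVPN wire layout, where the first byte of a packet (after any
TCP length framing) carries the opcode in its upper 5 bits and the key
ID in its lower 3 bits, and it assumes that DATA_V2 has opcode value 9.
All `ovpn_sketch_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed OpenVPN wire constants; they are not defined by this patch. */
#define OVPN_SKETCH_OPCODE_SHIFT	3
#define OVPN_SKETCH_KEY_ID_MASK		0x07
#define OVPN_SKETCH_DATA_V2		9

/* The opcode lives in the upper 5 bits of the first byte. */
static inline uint8_t ovpn_sketch_opcode(const uint8_t *pkt)
{
	return pkt[0] >> OVPN_SKETCH_OPCODE_SHIFT;
}

/* The key ID lives in the lower 3 bits of the first byte. */
static inline uint8_t ovpn_sketch_key_id(const uint8_t *pkt)
{
	return pkt[0] & OVPN_SKETCH_KEY_ID_MASK;
}

/* Return true when the packet belongs to the data channel (fast path);
 * anything else, including an empty packet, is left to userspace.
 */
static bool ovpn_sketch_is_fast_path(const uint8_t *pkt, size_t len)
{
	if (len < 1)
		return false;

	return ovpn_sketch_opcode(pkt) == OVPN_SKETCH_DATA_V2;
}
```

With these assumptions a first byte of 0x4a (opcode 9, key ID 2) stays
on the fast path, while any control opcode falls through to userspace.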

Any OpenVPN control feature (like cipher negotiation, TLS handshake,
rekeying, etc.) is still fully handled by the userspace process.

When userspace establishes a new connection with a peer, it first
performs the handshake and then passes the socket to the `ovpn` kernel
module, which takes ownership. From this moment on `ovpn` will handle
data traffic for the new peer.
When control packets are received on the link, they are forwarded to
userspace through the same transport socket they were received on, as
userspace is still listening to them.

Some events (like peer deletion) are sent to a Netlink multicast group.

Although it wasn't easy to convince the community, `ovpn` implements
only a limited number of the data-channel features supported by the
userspace program.

Each feature that made it into `ovpn` was carefully vetted to
avoid carrying too much legacy along with us (and to make a clean break
from old and probably-not-so-useful features).

Notably, only encryption using AEAD ciphers (specifically
ChaCha20Poly1305 and AES-GCM) was implemented. Supporting any other
cipher out there was not deemed useful.

Both UDP and TCP sockets are supported.

As explained above, in case of P2MP mode, OpenVPN will use the main system
routing table to decide which packet goes to which peer. This implies
that no routing table was re-implemented in the `ovpn` kernel module.
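
As an illustration of the P2MP behaviour described above, the following
userspace sketch models peer selection: find the next hop for the
destination with a longest-prefix match, then pick the peer whose VPN
address equals that next hop. The toy route table merely stands in for
the main system routing table. The mapping from next hop to peer, the
IPv4-only handling and all `sketch_*` names are assumptions made for
the example, not code from this series.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address from its four octets. */
#define IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

struct sketch_route {
	uint32_t prefix;	/* network address, host order */
	uint8_t len;		/* prefix length, 0..32 */
	uint32_t gateway;	/* 0 means the destination is on-link */
};

struct sketch_peer {
	uint32_t id;
	uint32_t vpn_ip;	/* peer address inside the tunnel */
};

static uint32_t sketch_mask(uint8_t len)
{
	return len ? ~(uint32_t)0 << (32 - len) : 0;
}

/* Longest-prefix match; returns 0 when no route covers dst. */
static uint32_t sketch_next_hop(const struct sketch_route *rt, size_t n,
				uint32_t dst)
{
	const struct sketch_route *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((dst & sketch_mask(rt[i].len)) != rt[i].prefix)
			continue;
		if (!best || rt[i].len > best->len)
			best = &rt[i];
	}

	if (!best)
		return 0;

	return best->gateway ? best->gateway : dst;
}

/* Pick the peer whose VPN address is the next hop towards dst. */
static const struct sketch_peer *
sketch_select_peer(const struct sketch_route *rt, size_t nroutes,
		   const struct sketch_peer *peers, size_t npeers,
		   uint32_t dst)
{
	uint32_t hop = sketch_next_hop(rt, nroutes, dst);
	size_t i;

	if (!hop)
		return NULL;

	for (i = 0; i < npeers; i++)
		if (peers[i].vpn_ip == hop)
			return &peers[i];

	return NULL;
}
```

In P2P mode no such lookup is needed, since every packet goes to the
single remote peer.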

This kernel module can be enabled by selecting the CONFIG_OVPN entry
in the networking drivers section.

NOTE: this first patch introduces only the very basic framework.
Features are then added patch by patch. Although each patch is expected
to compile and should not break at runtime, the ovpn module is expected
to be fully working only after the full set has been applied.

Cc: steffen.klassert@secunet.com
Cc: antony.antony@secunet.com
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 MAINTAINERS               |   7 +++
 drivers/net/Kconfig       |  14 +++++
 drivers/net/Makefile      |   1 +
 drivers/net/ovpn/Makefile |  11 ++++
 drivers/net/ovpn/io.c     |  22 ++++++++
 drivers/net/ovpn/io.h     |  15 ++++++
 drivers/net/ovpn/main.c   | 109 ++++++++++++++++++++++++++++++++++++++
 drivers/net/ovpn/main.h   |  15 ++++++
 include/uapi/linux/udp.h  |   1 +
 9 files changed, 195 insertions(+)
 create mode 100644 drivers/net/ovpn/Makefile
 create mode 100644 drivers/net/ovpn/io.c
 create mode 100644 drivers/net/ovpn/io.h
 create mode 100644 drivers/net/ovpn/main.c
 create mode 100644 drivers/net/ovpn/main.h

Comments

Kuniyuki Iwashima Sept. 19, 2024, 5:52 a.m. UTC | #1
From: Antonio Quartulli <antonio@openvpn.net>
Date: Tue, 17 Sep 2024 03:07:12 +0200
> +/* we register with rtnl to let core know that ovpn is a virtual driver and
> + * therefore ifaces should be destroyed when exiting a netns
> + */
> +static struct rtnl_link_ops ovpn_link_ops = {
> +};

This looks like abusing rtnl_link_ops.

Instead of a hack to rely on default_device_exit_batch()
and rtnl_link_unregister(), this should be implemented as
struct pernet_operations.exit_batch_rtnl().

Then patch 2, which is confusing for all other rtnl_link_ops users,
is not needed.

If we want to avoid an extra RTNL lock acquisition in
default_device_exit_batch(), I can post this patch after the merge window.

---8<---
diff --git a/net/core/dev.c b/net/core/dev.c
index 1e740faf9e78..eacf6f5a6ace 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11916,7 +11916,8 @@ static void __net_exit default_device_exit_net(struct net *net)
 	}
 }
 
-static void __net_exit default_device_exit_batch(struct list_head *net_list)
+void __net_exit default_device_exit_batch(struct list_head *net_list,
+					  struct list_head *dev_kill_list)
 {
 	/* At exit all network devices most be removed from a network
 	 * namespace.  Do this in the reverse order of registration.
@@ -11925,9 +11926,7 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
 	 */
 	struct net_device *dev;
 	struct net *net;
-	LIST_HEAD(dev_kill_list);
 
-	rtnl_lock();
 	list_for_each_entry(net, net_list, exit_list) {
 		default_device_exit_net(net);
 		cond_resched();
@@ -11936,19 +11935,13 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
 	list_for_each_entry(net, net_list, exit_list) {
 		for_each_netdev_reverse(net, dev) {
 			if (dev->rtnl_link_ops && dev->rtnl_link_ops->dellink)
-				dev->rtnl_link_ops->dellink(dev, &dev_kill_list);
+				dev->rtnl_link_ops->dellink(dev, dev_kill_list);
 			else
-				unregister_netdevice_queue(dev, &dev_kill_list);
+				unregister_netdevice_queue(dev, dev_kill_list);
 		}
 	}
-	unregister_netdevice_many(&dev_kill_list);
-	rtnl_unlock();
 }
 
-static struct pernet_operations __net_initdata default_device_ops = {
-	.exit_batch = default_device_exit_batch,
-};
-
 static void __init net_dev_struct_check(void)
 {
 	/* TX read-mostly hotpath */
@@ -12140,9 +12133,6 @@ static int __init net_dev_init(void)
 	if (register_pernet_device(&loopback_net_ops))
 		goto out;
 
-	if (register_pernet_device(&default_device_ops))
-		goto out;
-
 	open_softirq(NET_TX_SOFTIRQ, net_tx_action);
 	open_softirq(NET_RX_SOFTIRQ, net_rx_action);
 
diff --git a/net/core/dev.h b/net/core/dev.h
index 5654325c5b71..d1feecab9c4a 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -99,6 +99,9 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags,
 void unregister_netdevice_many_notify(struct list_head *head,
 				      u32 portid, const struct nlmsghdr *nlh);
 
+void default_device_exit_batch(struct list_head *net_list,
+			       struct list_head *dev_kill_list);
+
 static inline void netif_set_gso_max_size(struct net_device *dev,
 					  unsigned int size)
 {
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 11e4dd4f09ed..0a9bce599d54 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -27,6 +27,8 @@
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
+#include "dev.h"
+
 /*
  *	Our network namespace constructor/destructor lists
  */
@@ -380,6 +382,7 @@ static __net_init int setup_net(struct net *net)
 		if (ops->exit_batch_rtnl)
 			ops->exit_batch_rtnl(&net_exit_list, &dev_kill_list);
 	}
+	default_device_exit_batch(&net_exit_list, &dev_kill_list);
 	unregister_netdevice_many(&dev_kill_list);
 	rtnl_unlock();
 
@@ -618,6 +621,7 @@ static void cleanup_net(struct work_struct *work)
 		if (ops->exit_batch_rtnl)
 			ops->exit_batch_rtnl(&net_exit_list, &dev_kill_list);
 	}
+	default_device_exit_batch(&net_exit_list, &dev_kill_list);
 	unregister_netdevice_many(&dev_kill_list);
 	rtnl_unlock();
 
@@ -1214,6 +1218,7 @@ static void free_exit_list(struct pernet_operations *ops, struct list_head *net_
 
 		rtnl_lock();
 		ops->exit_batch_rtnl(net_exit_list, &dev_kill_list);
+		default_device_exit_batch(net_exit_list, &dev_kill_list);
 		unregister_netdevice_many(&dev_kill_list);
 		rtnl_unlock();
 	}
---8<---
Antonio Quartulli Sept. 19, 2024, 11:57 a.m. UTC | #2
Hi Kuniyuki and thank you for chiming in.

On 19/09/2024 07:52, Kuniyuki Iwashima wrote:
> From: Antonio Quartulli <antonio@openvpn.net>
> Date: Tue, 17 Sep 2024 03:07:12 +0200
>> +/* we register with rtnl to let core know that ovpn is a virtual driver and
>> + * therefore ifaces should be destroyed when exiting a netns
>> + */
>> +static struct rtnl_link_ops ovpn_link_ops = {
>> +};
> 
> This looks like abusing rtnl_link_ops.

In some way, the inspiration came from
5b9e7e160795 ("openvswitch: introduce rtnl ops stub")

[which just reminded me that I wanted to fill the .kind field, but I 
forgot to do so]

The reason for taking this approach was to avoid handling the iface 
destruction upon netns exit inside the driver, when the core already has 
all the code for taking care of this for us.

Originally I implemented pernet_operations.pre_exit, but Sabrina 
suggested that letting the core handle the destruction was cleaner (and 
I agreed).

However, after I removed the pre_exit implementation, we realized that 
default_device_exit_batch/default_device_exit_net thought that an ovpn 
device is a real NIC and was moving it to the global netns rather than 
killing it.

One way to fix the above was to register rtnl_link_ops with netns_refund = 
false (so the ops object you see in this patch is not truly "empty").
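
The decision discussed here can be modelled in a few lines of userspace
C. This is a deliberately simplified sketch of the behaviour described
in this thread (a device without link ops is treated like a real NIC and
moved to the initial netns, while a device whose link ops do not request
a refund is destroyed). It is not the code in net/core/dev.c, it ignores
other cases such as netns-local devices, and every `sketch_*` name is
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the two rtnl_link_ops fields that matter here. */
struct sketch_link_ops {
	const char *kind;
	bool netns_refund;
};

struct sketch_netdev {
	const char *name;
	const struct sketch_link_ops *rtnl_link_ops;	/* NULL for a "real NIC" */
};

enum sketch_exit_action {
	SKETCH_MOVE_TO_INIT_NET,
	SKETCH_DESTROY,
};

/* What happens to a device when its network namespace goes away. */
static enum sketch_exit_action
sketch_netns_exit_action(const struct sketch_netdev *dev)
{
	/* Virtual devices that do not ask for a refund are destroyed. */
	if (dev->rtnl_link_ops && !dev->rtnl_link_ops->netns_refund)
		return SKETCH_DESTROY;

	/* Everything else is handed back to the initial netns. */
	return SKETCH_MOVE_TO_INIT_NET;
}
```

In this model, registering link ops with netns_refund left at false is
what flips an ovpn device from "moved" to "destroyed".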

However, I then hit the bug which required patch 2 to get fixed.

Does it make sense to you?
Or do you still think this is an rtnl_link_ops abuse?

The alternative was to change 
default_device_exit_batch/default_device_exit_net to read some new 
netdevice flag which would tell if the interface should be killed or 
moved to global upon netns exit.

Regards,

> 
> Instead of a hack to rely on default_device_exit_batch()
> and rtnl_link_unregister(), this should be implemented as
> struct pernet_operations.exit_batch_rtnl().
> 
> Then, the patch 2 is not needed, which is confusing for
> all other rtnl_link_ops users.
> 
> If we want to avoid extra RTNL in default_device_exit_batch(),
> I can post this patch after merge window.
> 
> ---8<---
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1e740faf9e78..eacf6f5a6ace 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -11916,7 +11916,8 @@ static void __net_exit default_device_exit_net(struct net *net)
>   	}
>   }
>   
> -static void __net_exit default_device_exit_batch(struct list_head *net_list)
> +void __net_exit default_device_exit_batch(struct list_head *net_list,
> +					  struct list_head *dev_kill_list)
>   {
>   	/* At exit all network devices most be removed from a network
>   	 * namespace.  Do this in the reverse order of registration.
> @@ -11925,9 +11926,7 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
>   	 */
>   	struct net_device *dev;
>   	struct net *net;
> -	LIST_HEAD(dev_kill_list);
>   
> -	rtnl_lock();
>   	list_for_each_entry(net, net_list, exit_list) {
>   		default_device_exit_net(net);
>   		cond_resched();
> @@ -11936,19 +11935,13 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
>   	list_for_each_entry(net, net_list, exit_list) {
>   		for_each_netdev_reverse(net, dev) {
>   			if (dev->rtnl_link_ops && dev->rtnl_link_ops->dellink)
> -				dev->rtnl_link_ops->dellink(dev, &dev_kill_list);
> +				dev->rtnl_link_ops->dellink(dev, dev_kill_list);
>   			else
> -				unregister_netdevice_queue(dev, &dev_kill_list);
> +				unregister_netdevice_queue(dev, dev_kill_list);
>   		}
>   	}
> -	unregister_netdevice_many(&dev_kill_list);
> -	rtnl_unlock();
>   }
>   
> -static struct pernet_operations __net_initdata default_device_ops = {
> -	.exit_batch = default_device_exit_batch,
> -};
> -
>   static void __init net_dev_struct_check(void)
>   {
>   	/* TX read-mostly hotpath */
> @@ -12140,9 +12133,6 @@ static int __init net_dev_init(void)
>   	if (register_pernet_device(&loopback_net_ops))
>   		goto out;
>   
> -	if (register_pernet_device(&default_device_ops))
> -		goto out;
> -
>   	open_softirq(NET_TX_SOFTIRQ, net_tx_action);
>   	open_softirq(NET_RX_SOFTIRQ, net_rx_action);
>   
> diff --git a/net/core/dev.h b/net/core/dev.h
> index 5654325c5b71..d1feecab9c4a 100644
> --- a/net/core/dev.h
> +++ b/net/core/dev.h
> @@ -99,6 +99,9 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags,
>   void unregister_netdevice_many_notify(struct list_head *head,
>   				      u32 portid, const struct nlmsghdr *nlh);
>   
> +void default_device_exit_batch(struct list_head *net_list,
> +			       struct list_head *dev_kill_list);
> +
>   static inline void netif_set_gso_max_size(struct net_device *dev,
>   					  unsigned int size)
>   {
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 11e4dd4f09ed..0a9bce599d54 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -27,6 +27,8 @@
>   #include <net/net_namespace.h>
>   #include <net/netns/generic.h>
>   
> +#include "dev.h"
> +
>   /*
>    *	Our network namespace constructor/destructor lists
>    */
> @@ -380,6 +382,7 @@ static __net_init int setup_net(struct net *net)
>   		if (ops->exit_batch_rtnl)
>   			ops->exit_batch_rtnl(&net_exit_list, &dev_kill_list);
>   	}
> +	default_device_exit_batch(&net_exit_list, &dev_kill_list);
>   	unregister_netdevice_many(&dev_kill_list);
>   	rtnl_unlock();
>   
> @@ -618,6 +621,7 @@ static void cleanup_net(struct work_struct *work)
>   		if (ops->exit_batch_rtnl)
>   			ops->exit_batch_rtnl(&net_exit_list, &dev_kill_list);
>   	}
> +	default_device_exit_batch(&net_exit_list, &dev_kill_list);
>   	unregister_netdevice_many(&dev_kill_list);
>   	rtnl_unlock();
>   
> @@ -1214,6 +1218,7 @@ static void free_exit_list(struct pernet_operations *ops, struct list_head *net_
>   
>   		rtnl_lock();
>   		ops->exit_batch_rtnl(net_exit_list, &dev_kill_list);
> +		default_device_exit_batch(net_exit_list, &dev_kill_list);
>   		unregister_netdevice_many(&dev_kill_list);
>   		rtnl_unlock();
>   	}
> ---8<---
Kuniyuki Iwashima Sept. 20, 2024, 9:32 a.m. UTC | #3
From: Antonio Quartulli <antonio@openvpn.net>
Date: Thu, 19 Sep 2024 13:57:51 +0200
> Hi Kuniyuki and thank you for chiming in.
> 
> On 19/09/2024 07:52, Kuniyuki Iwashima wrote:
> > From: Antonio Quartulli <antonio@openvpn.net>
> > Date: Tue, 17 Sep 2024 03:07:12 +0200
> >> +/* we register with rtnl to let core know that ovpn is a virtual driver and
> >> + * therefore ifaces should be destroyed when exiting a netns
> >> + */
> >> +static struct rtnl_link_ops ovpn_link_ops = {
> >> +};
> > 
> > This looks like abusing rtnl_link_ops.
> 
> In some way, the inspiration came from
> 5b9e7e160795 ("openvswitch: introduce rtnl ops stub")
> 
> [which just reminded me that I wanted to fill the .kind field, but I 
> forgot to do so]
> 
> The reason for taking this approach was to avoid handling the iface 
> destruction upon netns exit inside the driver, when the core already has 
> all the code for taking care of this for us.
> 
> Originally I implemented pernet_operations.pre_exit, but Sabrina 
> suggested that letting the core handle the destruction was cleaner (and 
> I agreed).
> 
> However, after I removed the pre_exit implementation, we realized that 
> default_device_exit_batch/default_device_exit_net thought that an ovpn 
> device is a real NIC and was moving it to the global netns rather than 
> killing it.
> 
> One way to fix the above was to register rtnl_link_ops with netns_refund = 
> false (so the ops object you see in this patch is not truly "empty").
> 
> However, I then hit the bug which required patch 2 to get fixed.
> 
> Does it make sense to you?
> Or you still think this is an rtnl_link_ops abuse?

The use of .kind makes sense, and the change should be in this patch.

For patch 2 and dellink(), is the device not expected to be removed
by ip link del?  Setting dellink() to unregister_netdevice_queue() will
support RTM_DELLINK; otherwise -EOPNOTSUPP is returned.


> 
> The alternative was to change 
> default_device_exit_batch/default_device_exit_net to read some new 
> netdevice flag which would tell if the interface should be killed or 
> moved to global upon netns exit.
> 
> Regards,
>
Antonio Quartulli Sept. 20, 2024, 9:46 a.m. UTC | #4
Hi,

On 20/09/2024 11:32, Kuniyuki Iwashima wrote:
> From: Antonio Quartulli <antonio@openvpn.net>
> Date: Thu, 19 Sep 2024 13:57:51 +0200
>> Hi Kuniyuki and thank you for chiming in.
>>
>> On 19/09/2024 07:52, Kuniyuki Iwashima wrote:
>>> From: Antonio Quartulli <antonio@openvpn.net>
>>> Date: Tue, 17 Sep 2024 03:07:12 +0200
>>>> +/* we register with rtnl to let core know that ovpn is a virtual driver and
>>>> + * therefore ifaces should be destroyed when exiting a netns
>>>> + */
>>>> +static struct rtnl_link_ops ovpn_link_ops = {
>>>> +};
>>>
>>> This looks like abusing rtnl_link_ops.
>>
>> In some way, the inspiration came from
>> 5b9e7e160795 ("openvswitch: introduce rtnl ops stub")
>>
>> [which just reminded me that I wanted to fill the .kind field, but I
>> forgot to do so]
>>
>> The reason for taking this approach was to avoid handling the iface
>> destruction upon netns exit inside the driver, when the core already has
>> all the code for taking care of this for us.
>>
>> Originally I implemented pernet_operations.pre_exit, but Sabrina
>> suggested that letting the core handle the destruction was cleaner (and
>> I agreed).
>>
>> However, after I removed the pre_exit implementation, we realized that
>> default_device_exit_batch/default_device_exit_net thought that an ovpn
>> device is a real NIC and was moving it to the global netns rather than
>> killing it.
>>
>> One way to fix the above was to register rtnl_link_ops with netns_refund =
>> false (so the ops object you see in this patch is not truly "empty").
>>
>> However, I then hit the bug which required patch 2 to get fixed.
>>
>> Does it make sense to you?
>> Or you still think this is an rtnl_link_ops abuse?
> 
> The use of .kind makes sense, and the change should be in this patch.

Ok, will add it here and I will also add an explicit .netns_refund = false 
to highlight the fact that we need this attribute to avoid moving the 
iface to the global netns.

> 
> For the patch 2 and dellink(), is the device not expected to be removed
> by ip link del ?  Setting unregister_netdevice_queue() to dellink() will
> support RTM_DELLINK, but otherwise -EOPNOTSUPP is returned.

For the time being I decided that it would make sense to add and delete 
ovpn interfaces via netlink API only.

But there are already discussions about implementing the RTNL 
add/dellink() too.
Therefore I think it makes sense to set dellink to 
unregister_netdevice_queue() in this patch and thus drop patch 2 altogether.
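
Putting the points agreed in this thread together (.kind filled in, an
explicit .netns_refund = false, and dellink set to
unregister_netdevice_queue()), the ops object might look roughly like
the sketch below. The types and the helper are userspace stand-ins so
that the fragment compiles outside the kernel tree; in the driver they
would come from the kernel headers, and the stand-in helper only
imitates queueing a device on a kill list. The "ovpn" kind string is
taken from the commit message; the final patch may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types used by the initializer. */
struct list_head {
	struct list_head *next, *prev;
};

struct net_device {
	struct list_head unreg_list;
};

struct rtnl_link_ops {
	const char *kind;
	bool netns_refund;
	void (*dellink)(struct net_device *dev, struct list_head *head);
};

/* Stand-in for the kernel helper of the same name: append the device
 * to the tail of the kill list so that it is unregistered later.
 */
static void unregister_netdevice_queue(struct net_device *dev,
				       struct list_head *head)
{
	dev->unreg_list.prev = head->prev;
	dev->unreg_list.next = head;
	head->prev->next = &dev->unreg_list;
	head->prev = &dev->unreg_list;
}

/* The ops object as discussed above (a sketch, not the final code). */
static struct rtnl_link_ops ovpn_link_ops = {
	.kind = "ovpn",
	.netns_refund = false,	/* destroy on netns exit, do not move */
	.dellink = unregister_netdevice_queue,
};
```

With dellink populated, both the netns exit path and an explicit
RTM_DELLINK request end up queueing the device for unregistration.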


Thanks.

Regards,

> 
> 
>>
>> The alternative was to change
>> default_device_exit_batch/default_device_exit_net to read some new
>> netdevice flag which would tell if the interface should be killed or
>> moved to global upon netns exit.
>>
>> Regards,
>>
Sergey Ryazanov Sept. 22, 2024, 8:51 p.m. UTC | #5
Hello Antonio, Kuniyuki,

On 20.09.2024 12:46, Antonio Quartulli wrote:
> Hi,
> 
> On 20/09/2024 11:32, Kuniyuki Iwashima wrote:
>> From: Antonio Quartulli <antonio@openvpn.net>
>> Date: Thu, 19 Sep 2024 13:57:51 +0200
>>> Hi Kuniyuki and thank you for chiming in.
>>>
>>> On 19/09/2024 07:52, Kuniyuki Iwashima wrote:
>>>> From: Antonio Quartulli <antonio@openvpn.net>
>>>> Date: Tue, 17 Sep 2024 03:07:12 +0200
>>>>> +/* we register with rtnl to let core know that ovpn is a virtual 
>>>>> driver and
>>>>> + * therefore ifaces should be destroyed when exiting a netns
>>>>> + */
>>>>> +static struct rtnl_link_ops ovpn_link_ops = {
>>>>> +};
>>>>
>>>> This looks like abusing rtnl_link_ops.
>>>
>>> In some way, the inspiration came from
>>> 5b9e7e160795 ("openvswitch: introduce rtnl ops stub")
>>>
>>> [which just reminded me that I wanted to fill the .kind field, but I
>>> forgot to do so]
>>>
>>> The reason for taking this approach was to avoid handling the iface
>>> destruction upon netns exit inside the driver, when the core already has
>>> all the code for taking care of this for us.
>>>
>>> Originally I implemented pernet_operations.pre_exit, but Sabrina
>>> suggested that letting the core handle the destruction was cleaner (and
>>> I agreed).
>>>
>>> However, after I removed the pre_exit implementation, we realized that
>>> default_device_exit_batch/default_device_exit_net thought that an ovpn
>>> device is a real NIC and was moving it to the global netns rather than
>>> killing it.
>>>
>>> One way to fix the above was to register rtnl_link_ops with netns_refund =
>>> false (so the ops object you see in this patch is not truly "empty").
>>>
>>> However, I then hit the bug which required patch 2 to get fixed.
>>>
>>> Does it make sense to you?
>>> Or you still think this is an rtnl_link_ops abuse?
>>
>> The use of .kind makes sense, and the change should be in this patch.
> 
> Ok, will add it here and I will also add an explicit .netns_refund = false 
> to highlight the fact that we need this attribute to avoid moving the 
> iface to the global netns.
> 
>>
>> For the patch 2 and dellink(), is the device not expected to be removed
>> by ip link del ?  Setting unregister_netdevice_queue() to dellink() will
>> support RTM_DELLINK, but otherwise -EOPNOTSUPP is returned.
> 
> For the time being I decided that it would make sense to add and delete 
> ovpn interfaces via netlink API only.
> 
> But there are already discussions about implementing the RTNL 
> add/dellink() too.
> Therefore I think it makes sense to set dellink to 
> unregister_netdevice_queue() in this patch and thus avoid patch 2 at all.

I should make a confession :) It was me who proposed and pushed the idea 
of removing the RTNL ops. I was so concerned about the uselessness of the 
addlink operation that I did not clearly mention that dellink is a useful 
operation, especially when it comes to namespace destruction. My bad.

So yeah, providing the dellink operation makes sense for namespace 
destruction handling and for letting the user manually clean up remaining 
network interfaces after the user application is forcefully killed or 
crashes.

>>> The alternative was to change
>>> default_device_exit_batch/default_device_exit_net to read some new
>>> netdevice flag which would tell if the interface should be killed or
>>> moved to global upon netns exit.

--
Sergey
Antonio Quartulli Sept. 23, 2024, 12:51 p.m. UTC | #6
On 22/09/2024 22:51, Sergey Ryazanov wrote:
> Hello Antonio, Kuniyuki,
> 
> On 20.09.2024 12:46, Antonio Quartulli wrote:
>> Hi,
>>
>> On 20/09/2024 11:32, Kuniyuki Iwashima wrote:
>>> From: Antonio Quartulli <antonio@openvpn.net>
>>> Date: Thu, 19 Sep 2024 13:57:51 +0200
>>>> Hi Kuniyuki and thank you for chiming in.
>>>>
>>>> On 19/09/2024 07:52, Kuniyuki Iwashima wrote:
>>>>> From: Antonio Quartulli <antonio@openvpn.net>
>>>>> Date: Tue, 17 Sep 2024 03:07:12 +0200
>>>>>> +/* we register with rtnl to let core know that ovpn is a virtual 
>>>>>> driver and
>>>>>> + * therefore ifaces should be destroyed when exiting a netns
>>>>>> + */
>>>>>> +static struct rtnl_link_ops ovpn_link_ops = {
>>>>>> +};
>>>>>
>>>>> This looks like abusing rtnl_link_ops.
>>>>
>>>> In some way, the inspiration came from
>>>> 5b9e7e160795 ("openvswitch: introduce rtnl ops stub")
>>>>
>>>> [which just reminded me that I wanted to fill the .kind field, but I
>>>> forgot to do so]
>>>>
>>>> The reason for taking this approach was to avoid handling the iface
>>>> destruction upon netns exit inside the driver, when the core already 
>>>> has
>>>> all the code for taking care of this for us.
>>>>
>>>> Originally I implemented pernet_operations.pre_exit, but Sabrina
>>>> suggested that letting the core handle the destruction was cleaner (and
>>>> I agreed).
>>>>
>>>> However, after I removed the pre_exit implementation, we realized that
>>>> default_device_exit_batch/default_device_exit_net thought that an ovpn
>>>> device is a real NIC and was moving it to the global netns rather than
>>>> killing it.
>>>>
>>>> One way to fix the above was to register rtnl_link_ops with 
>>>> netns_refund =
>>>> false (so the ops object you see in this patch is not truly "empty").
>>>>
>>>> However, I then hit the bug which required patch 2 to get fixed.
>>>>
>>>> Does it make sense to you?
>>>> Or you still think this is an rtnl_link_ops abuse?
>>>
>>> The use of .kind makes sense, and the change should be in this patch.
>>
>> Ok, will add it here and I will also add an explicit .netns_refund = 
>> false to highlight the fact that we need this attribute to avoid 
>> moving the iface to the global netns.
>>
>>>
>>> For the patch 2 and dellink(), is the device not expected to be removed
>>> by ip link del ?  Setting unregister_netdevice_queue() to dellink() will
>>> support RTM_DELLINK, but otherwise -EOPNOTSUPP is returned.
>>
>> For the time being I decided that it would make sense to add and 
>> delete ovpn interfaces via netlink API only.
>>
>> But there are already discussions about implementing the RTNL 
>> add/dellink() too.
>> Therefore I think it makes sense to set dellink to 
>> unregister_netdevice_queue() in this patch and thus avoid patch 2 at all.
> 
> I should make a confession :) It was me who proposed and pushed the idea 
> of the RTNL ops removing. I was too concerned about uselessness of 
> addlink operation so I did not clearly mention that dellink is useful 
> operation. Especially when it comes to namespace destruction. My bad.

It helped getting where we are now :)

> 
> So yeah, providing the dellink operation make sense for namespace 
> destruction handling and for user to manually cleanup reminding network 
> interfaces after a forceful user application killing or crash.

For this specific case (i.e. crash) I am planning to add a netlink 
notifier that detects when the process that created the interface goes 
away and then kills the interface from within the kernel.

This way we have some sort of self cleanup and avoid leaving the system 
in a bogus state. (For those specific use cases where you want to create 
a "persistent" interface, I think we will provide a flag. But this is 
for a later patch..)


Cheers,

> 
>>>> The alternative was to change
>>>> default_device_exit_batch/default_device_exit_net to read some new
>>>> netdevice flag which would tell if the interface should be killed or
>>>> moved to global upon netns exit.
> 
> -- 
> Sergey

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 77fcd6f802a5..53b6350d95be 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17263,6 +17263,13 @@  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs.git
 F:	Documentation/filesystems/overlayfs.rst
 F:	fs/overlayfs/
 
+OPENVPN DATA CHANNEL OFFLOAD
+M:	Antonio Quartulli <antonio@openvpn.net>
+L:	openvpn-devel@lists.sourceforge.net (moderated for non-subscribers)
+L:	netdev@vger.kernel.org
+S:	Maintained
+F:	drivers/net/ovpn/
+
 P54 WIRELESS DRIVER
 M:	Christian Lamparter <chunkeey@googlemail.com>
 L:	linux-wireless@vger.kernel.org
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 9920b3a68ed1..0055bcd2356c 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -115,6 +115,20 @@  config WIREGUARD_DEBUG
 
 	  Say N here unless you know what you're doing.
 
+config OVPN
+	tristate "OpenVPN data channel offload"
+	depends on NET && INET
+	select NET_UDP_TUNNEL
+	select DST_CACHE
+	select CRYPTO
+	select CRYPTO_AES
+	select CRYPTO_GCM
+	select CRYPTO_CHACHA20POLY1305
+	select STREAM_PARSER
+	help
+	  This module enhances the performance of the OpenVPN userspace software
+	  by offloading the data channel processing to kernelspace.
+
 config EQUALIZER
 	tristate "EQL (serial line load balancing) support"
 	help
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 13743d0e83b5..5152b3330e28 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -11,6 +11,7 @@  obj-$(CONFIG_IPVLAN) += ipvlan/
 obj-$(CONFIG_IPVTAP) += ipvlan/
 obj-$(CONFIG_DUMMY) += dummy.o
 obj-$(CONFIG_WIREGUARD) += wireguard/
+obj-$(CONFIG_OVPN) += ovpn/
 obj-$(CONFIG_EQUALIZER) += eql.o
 obj-$(CONFIG_IFB) += ifb.o
 obj-$(CONFIG_MACSEC) += macsec.o
diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
new file mode 100644
index 000000000000..53fb197027d7
--- /dev/null
+++ b/drivers/net/ovpn/Makefile
@@ -0,0 +1,11 @@ 
+# SPDX-License-Identifier: GPL-2.0
+#
+# ovpn -- OpenVPN data channel offload in kernel space
+#
+# Copyright (C) 2020-2024 OpenVPN, Inc.
+#
+# Author:	Antonio Quartulli <antonio@openvpn.net>
+
+obj-$(CONFIG_OVPN) := ovpn.o
+ovpn-y += main.o
+ovpn-y += io.o
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
new file mode 100644
index 000000000000..ad3813419c33
--- /dev/null
+++ b/drivers/net/ovpn/io.c
@@ -0,0 +1,22 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2019-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+
+#include "io.h"
+
+/* Send user data to the network
+ */
+netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	skb_tx_error(skb);
+	kfree_skb(skb);
+	return NET_XMIT_DROP;
+}
diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h
new file mode 100644
index 000000000000..aa259be66441
--- /dev/null
+++ b/drivers/net/ovpn/io.h
@@ -0,0 +1,15 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* OpenVPN data channel offload
+ *
+ *  Copyright (C) 2019-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_OVPN_H_
+#define _NET_OVPN_OVPN_H_
+
+netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev);
+
+#endif /* _NET_OVPN_OVPN_H_ */
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
new file mode 100644
index 000000000000..8a90319e4600
--- /dev/null
+++ b/drivers/net/ovpn/main.c
@@ -0,0 +1,109 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	Antonio Quartulli <antonio@openvpn.net>
+ *		James Yonan <james@openvpn.net>
+ */
+
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/version.h>
+#include <net/rtnetlink.h>
+
+#include "main.h"
+#include "io.h"
+
+/* Driver info */
+#define DRV_DESCRIPTION	"OpenVPN data channel offload (ovpn)"
+#define DRV_COPYRIGHT	"(C) 2020-2024 OpenVPN, Inc."
+
+/**
+ * ovpn_dev_is_valid - check if the netdevice is of type 'ovpn'
+ * @dev: the interface to check
+ *
+ * Return: whether the netdevice is of type 'ovpn'
+ */
+bool ovpn_dev_is_valid(const struct net_device *dev)
+{
+	return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit;
+}
+
+/* we register with rtnl to let core know that ovpn is a virtual driver and
+ * therefore ifaces should be destroyed when exiting a netns
+ */
+static struct rtnl_link_ops ovpn_link_ops = {
+};
+
+static int ovpn_netdev_notifier_call(struct notifier_block *nb,
+				     unsigned long state, void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+
+	if (!ovpn_dev_is_valid(dev))
+		return NOTIFY_DONE;
+
+	switch (state) {
+	case NETDEV_REGISTER:
+		/* add device to internal list for later destruction upon
+		 * unregistration
+		 */
+		break;
+	case NETDEV_UNREGISTER:
+		/* can be delivered multiple times, so check registered flag,
+		 * then destroy the interface
+		 */
+		break;
+	case NETDEV_POST_INIT:
+	case NETDEV_GOING_DOWN:
+	case NETDEV_DOWN:
+	case NETDEV_UP:
+	case NETDEV_PRE_UP:
+	default:
+		return NOTIFY_DONE;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block ovpn_netdev_notifier = {
+	.notifier_call = ovpn_netdev_notifier_call,
+};
+
+static int __init ovpn_init(void)
+{
+	int err = register_netdevice_notifier(&ovpn_netdev_notifier);
+
+	if (err) {
+		pr_err("ovpn: can't register netdevice notifier: %d\n", err);
+		return err;
+	}
+
+	err = rtnl_link_register(&ovpn_link_ops);
+	if (err) {
+		pr_err("ovpn: can't register rtnl link ops: %d\n", err);
+		goto unreg_netdev;
+	}
+
+	return 0;
+
+unreg_netdev:
+	unregister_netdevice_notifier(&ovpn_netdev_notifier);
+	return err;
+}
+
+static __exit void ovpn_cleanup(void)
+{
+	rtnl_link_unregister(&ovpn_link_ops);
+	unregister_netdevice_notifier(&ovpn_netdev_notifier);
+
+	rcu_barrier();
+}
+
+module_init(ovpn_init);
+module_exit(ovpn_cleanup);
+
+MODULE_DESCRIPTION(DRV_DESCRIPTION);
+MODULE_AUTHOR(DRV_COPYRIGHT);
+MODULE_LICENSE("GPL");
diff --git a/drivers/net/ovpn/main.h b/drivers/net/ovpn/main.h
new file mode 100644
index 000000000000..a3215316c49b
--- /dev/null
+++ b/drivers/net/ovpn/main.h
@@ -0,0 +1,15 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2019-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_MAIN_H_
+#define _NET_OVPN_MAIN_H_
+
+bool ovpn_dev_is_valid(const struct net_device *dev);
+
+#endif /* _NET_OVPN_MAIN_H_ */
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 1a0fe8b151fb..f9f8ffddfd0c 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -43,5 +43,6 @@  struct udphdr {
 #define UDP_ENCAP_GTP1U		5 /* 3GPP TS 29.060 */
 #define UDP_ENCAP_RXRPC		6
 #define TCP_ENCAP_ESPINTCP	7 /* Yikes, this is really xfrm encap types. */
+#define UDP_ENCAP_OVPNINUDP	8 /* OpenVPN traffic */
 
 #endif /* _UAPI_LINUX_UDP_H */