
[net-next,v15,06/22] ovpn: introduce the ovpn_socket object

Message ID 20241211-b4-ovpn-v15-6-314e2cad0618@openvpn.net (mailing list archive)
State New
Series Introducing OpenVPN Data Channel Offload

Commit Message

Antonio Quartulli Dec. 11, 2024, 9:15 p.m. UTC
This structure is used in the ovpn kernel module
to wrap and carry around a standard kernel socket.

ovpn takes ownership of passed sockets, therefore an
ovpn-specific object is attached to them for status
tracking purposes.

Initially only UDP support is introduced. TCP will come in a later
patch.

Cc: willemdebruijn.kernel@gmail.com
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 drivers/net/ovpn/Makefile |   2 +
 drivers/net/ovpn/socket.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/net/ovpn/socket.h |  48 +++++++++++++++++++
 drivers/net/ovpn/udp.c    |  65 +++++++++++++++++++++++++
 drivers/net/ovpn/udp.h    |  17 +++++++
 include/uapi/linux/udp.h  |   1 +
 6 files changed, 252 insertions(+)

Comments

Sabrina Dubroca Dec. 12, 2024, 4:19 p.m. UTC | #1
2024-12-11, 22:15:10 +0100, Antonio Quartulli wrote:
> +static struct ovpn_socket *ovpn_socket_get(struct socket *sock)
> +{
> +	struct ovpn_socket *ovpn_sock;
> +
> +	rcu_read_lock();
> +	ovpn_sock = rcu_dereference_sk_user_data(sock->sk);
> +	if (WARN_ON(!ovpn_socket_hold(ovpn_sock)))

Could we hit this situation when we're removing the last peer (so
detaching its socket) just as we're adding a new one? ovpn_socket_new
finds the socket already attached and goes through the EALREADY path,
but the refcount has already dropped to 0?

Then we'd also return NULL from ovpn_socket_new [1], which I don't
think is handled well by the caller (at least the netdev_dbg call at
the end of ovpn_nl_peer_modify, maybe other spots too).

(I guess it's not an issue you would see with the existing userspace
if it's single-threaded)

[...]
> +struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
> +{
> +	struct ovpn_socket *ovpn_sock;
> +	int ret;
> +
> +	ret = ovpn_socket_attach(sock, peer);
> +	if (ret < 0 && ret != -EALREADY)
> +		return ERR_PTR(ret);
> +
> +	/* if this socket is already owned by this interface, just increase the
> +	 * refcounter and use it as expected.
> +	 *
> +	 * Since UDP sockets can be used to talk to multiple remote endpoints,
> +	 * openvpn normally instantiates only one socket and shares it among all
> +	 * its peers. For this reason, when we find out that a socket is already
> +	 * used for some other peer in *this* instance, we can happily increase
> +	 * its refcounter and use it normally.
> +	 */
> +	if (ret == -EALREADY) {
> +		/* caller is expected to increase the sock refcounter before
> +		 * passing it to this function. For this reason we drop it if
> +		 * not needed, like when this socket is already owned.
> +		 */
> +		ovpn_sock = ovpn_socket_get(sock);
> +		sockfd_put(sock);

[1] so we would need to add

    if (!ovpn_sock)
        return ERR_PTR(-EAGAIN);

> +		return ovpn_sock;
> +	}
> +

[...]
> +int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_priv *ovpn)
> +{
> +	struct ovpn_socket *old_data;
> +	int ret = 0;
> +
> +	/* make sure no pre-existing encapsulation handler exists */
> +	rcu_read_lock();
> +	old_data = rcu_dereference_sk_user_data(sock->sk);
> +	if (!old_data) {
> +		/* socket is currently unused - we can take it */
> +		rcu_read_unlock();
> +		return 0;
> +	}
> +
> +	/* socket is in use. We need to understand if it's owned by this ovpn
> +	 * instance or by something else.
> +	 * In the former case, we can increase the refcounter and happily
> +	 * use it, because the same UDP socket is expected to be shared among
> +	 * different peers.
> +	 *
> +	 * Unlikely TCP, a single UDP socket can be used to talk to many remote

(since I'm commenting on this patch:)

s/Unlikely/Unlike/

[I have some more nits/typos here and there but I worry the
maintainers will get "slightly" annoyed if I make you repost 22
patches once again :) -- if that's all I find in the next few days,
everyone might be happier if I stash them and we get them fixed after
merging?]
Antonio Quartulli Dec. 12, 2024, 10:46 p.m. UTC | #2
On 12/12/2024 17:19, Sabrina Dubroca wrote:
> 2024-12-11, 22:15:10 +0100, Antonio Quartulli wrote:
>> +static struct ovpn_socket *ovpn_socket_get(struct socket *sock)
>> +{
>> +	struct ovpn_socket *ovpn_sock;
>> +
>> +	rcu_read_lock();
>> +	ovpn_sock = rcu_dereference_sk_user_data(sock->sk);
>> +	if (WARN_ON(!ovpn_socket_hold(ovpn_sock)))
> 
> Could we hit this situation when we're removing the last peer (so
> detaching its socket) just as we're adding a new one? ovpn_socket_new
> finds the socket already attached and goes through the EALREADY path,
> but the refcount has already dropped to 0?
> 

hm good point.

> Then we'd also return NULL from ovpn_socket_new [1], which I don't
> think is handled well by the caller (at least the netdev_dbg call at
> the end of ovpn_nl_peer_modify, maybe other spots too).
> 
> (I guess it's not an issue you would see with the existing userspace
> if it's single-threaded)

The TCP patch 11/22 will convert the socket release routine to a 
scheduled worker.

This means we can have the following flow:
1) userspace deletes a peer -> peer drops its reference to the ovpn_socket
2) ovpn_socket refcnt may hit 0 -> cleanup/detach work is scheduled, but 
not yet executed
3) userspace adds a new peer -> attach returns -EALREADY but refcnt is 0

So not so impossible, even with single-threaded userspace software.
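
In terms of the code paths involved, the window looks roughly like this 
(with the detach worker that patch 11/22 introduces):

   CPU0 (peer removal)                 CPU1 (PEER_NEW, same socket)
   ovpn_socket_put()
     refcount hits 0
     detach work is scheduled
                                       ovpn_socket_attach() -> -EALREADY
                                       ovpn_socket_get()
                                         kref_get_unless_zero() fails
                                         -> WARN_ON() + NULL
     detach work actually runs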

> 
> [...]
>> +struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
>> +{
>> +	struct ovpn_socket *ovpn_sock;
>> +	int ret;
>> +
>> +	ret = ovpn_socket_attach(sock, peer);
>> +	if (ret < 0 && ret != -EALREADY)
>> +		return ERR_PTR(ret);
>> +
>> +	/* if this socket is already owned by this interface, just increase the
>> +	 * refcounter and use it as expected.
>> +	 *
>> +	 * Since UDP sockets can be used to talk to multiple remote endpoints,
>> +	 * openvpn normally instantiates only one socket and shares it among all
>> +	 * its peers. For this reason, when we find out that a socket is already
>> +	 * used for some other peer in *this* instance, we can happily increase
>> +	 * its refcounter and use it normally.
>> +	 */
>> +	if (ret == -EALREADY) {
>> +		/* caller is expected to increase the sock refcounter before
>> +		 * passing it to this function. For this reason we drop it if
>> +		 * not needed, like when this socket is already owned.
>> +		 */
>> +		ovpn_sock = ovpn_socket_get(sock);
>> +		sockfd_put(sock);
> 
> [1] so we would need to add
> 
>      if (!ovpn_sock)
>          return ERR_PTR(-EAGAIN);

I am not sure returning -EAGAIN is the right move at this point.
We don't know when the scheduled worker will execute, so we don't know 
when to try again.

Maybe we should call cancel_work_sync(&ovpn_sock->work) inside 
ovpn_socket_get()?
So the latter will return NULL only when it is sure that the socket has 
been detached.

At that point we can skip the following return and continue along the 
"new socket" path.

What do you think?

However, this makes me wonder: what happens if we have two racing 
PEER_NEW with the same not-yet-attached UDP socket?

Maybe we should lock the socket in ovpn_udp_socket_attach() when 
checking its user-data and setting it (in order to make the test-and-set 
atomic)?

I am specifically talking about this in udp.c:

345         /* make sure no pre-existing encapsulation handler exists */
346         rcu_read_lock();
347         old_data = rcu_dereference_sk_user_data(sock->sk);
348         if (!old_data) {
349                 /* socket is currently unused - we can take it */
350                 rcu_read_unlock();
351                 setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
352                 return 0;
353         }

We will end up returning 0 in both contexts and thus allocate two 
ovpn_sockets instead of re-using the first one we allocated.

Does it make sense?
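
Roughly, what I have in mind (untested, just to show the shape of the 
test-and-set; the -EALREADY/-EBUSY branch would stay as it is today):

	lock_sock(sock->sk);

	/* make sure no pre-existing encapsulation handler exists */
	rcu_read_lock();
	old_data = rcu_dereference_sk_user_data(sock->sk);
	if (!old_data) {
		/* socket is currently unused - we can take it */
		rcu_read_unlock();
		setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
		release_sock(sock->sk);
		return 0;
	}

	/* same -EALREADY/-EBUSY checks as today */
	rcu_read_unlock();
	release_sock(sock->sk);
	return ret;

(and the rcu_assign_sk_user_data() in ovpn_socket_new() would probably 
have to happen before the lock is dropped, otherwise it's not really a 
test-and-set)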

> 
>> +		return ovpn_sock;
>> +	}
>> +
> 
> [...]
>> +int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_priv *ovpn)
>> +{
>> +	struct ovpn_socket *old_data;
>> +	int ret = 0;
>> +
>> +	/* make sure no pre-existing encapsulation handler exists */
>> +	rcu_read_lock();
>> +	old_data = rcu_dereference_sk_user_data(sock->sk);
>> +	if (!old_data) {
>> +		/* socket is currently unused - we can take it */
>> +		rcu_read_unlock();
>> +		return 0;
>> +	}
>> +
>> +	/* socket is in use. We need to understand if it's owned by this ovpn
>> +	 * instance or by something else.
>> +	 * In the former case, we can increase the refcounter and happily
>> +	 * use it, because the same UDP socket is expected to be shared among
>> +	 * different peers.
>> +	 *
>> +	 * Unlikely TCP, a single UDP socket can be used to talk to many remote
> 
> (since I'm commenting on this patch:)
> 
> s/Unlikely/Unlike/

ACK

> 
> [I have some more nits/typos here and there but I worry the
> maintainers will get "slightly" annoyed if I make you repost 22
> patches once again :) -- if that's all I find in the next few days,
> everyone might be happier if I stash them and we get them fixed after
> merging?]

If we have to rework this socket attaching part, it may be worth 
throwing in those typ0 fixes too :)

Thanks a lot.

Regards,
Sabrina Dubroca Dec. 16, 2024, 11:09 a.m. UTC | #3
2024-12-12, 23:46:11 +0100, Antonio Quartulli wrote:
> On 12/12/2024 17:19, Sabrina Dubroca wrote:
> > 2024-12-11, 22:15:10 +0100, Antonio Quartulli wrote:
> > > +static struct ovpn_socket *ovpn_socket_get(struct socket *sock)
> > > +{
> > > +	struct ovpn_socket *ovpn_sock;
> > > +
> > > +	rcu_read_lock();
> > > +	ovpn_sock = rcu_dereference_sk_user_data(sock->sk);
> > > +	if (WARN_ON(!ovpn_socket_hold(ovpn_sock)))
> > 
> > Could we hit this situation when we're removing the last peer (so
> > detaching its socket) just as we're adding a new one? ovpn_socket_new
> > finds the socket already attached and goes through the EALREADY path,
> > but the refcount has already dropped to 0?
> > 
> 
> hm good point.
> 
> > Then we'd also return NULL from ovpn_socket_new [1], which I don't
> > think is handled well by the caller (at least the netdev_dbg call at
> > the end of ovpn_nl_peer_modify, maybe other spots too).
> > 
> > (I guess it's not an issue you would see with the existing userspace
> > if it's single-threaded)
> 
> The TCP patch 11/22 will convert the socket release routine to a scheduled
> worker.

Oh right, I forgot about that.

> This means we can have the following flow:
> 1) userspace deletes a peer -> peer drops its reference to the ovpn_socket
> 2) ovpn_socket refcnt may hit 0 -> cleanup/detach work is scheduled, but not
> yet executed
> 3) userspace adds a new peer -> attach returns -EALREADY but refcnt is 0
> 
> So not so impossible, even with single-threaded userspace software.

True, that seems possible.

> > [...]
> > > +struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
> > > +{
> > > +	struct ovpn_socket *ovpn_sock;
> > > +	int ret;
> > > +
> > > +	ret = ovpn_socket_attach(sock, peer);
> > > +	if (ret < 0 && ret != -EALREADY)
> > > +		return ERR_PTR(ret);
> > > +
> > > +	/* if this socket is already owned by this interface, just increase the
> > > +	 * refcounter and use it as expected.
> > > +	 *
> > > +	 * Since UDP sockets can be used to talk to multiple remote endpoints,
> > > +	 * openvpn normally instantiates only one socket and shares it among all
> > > +	 * its peers. For this reason, when we find out that a socket is already
> > > +	 * used for some other peer in *this* instance, we can happily increase
> > > +	 * its refcounter and use it normally.
> > > +	 */
> > > +	if (ret == -EALREADY) {
> > > +		/* caller is expected to increase the sock refcounter before
> > > +		 * passing it to this function. For this reason we drop it if
> > > +		 * not needed, like when this socket is already owned.
> > > +		 */
> > > +		ovpn_sock = ovpn_socket_get(sock);
> > > +		sockfd_put(sock);
> > 
> > [1] so we would need to add
> > 
> >      if (!ovpn_sock)
> >          return ERR_PTR(-EAGAIN);
> 
> I am not sure returning -EAGAIN is the right move at this point.
> We don't know when the scheduled worker will execute, so we don't know when
> to try again.

Right.

> Maybe we should call cancel_work_sync(&ovpn_sock->work) inside
> ovpn_socket_get()?
> So the latter will return NULL only when it is sure that the socket has been
> detached.
> 
> At that point we can skip the following return and continue along the "new
> socket" path.
> 
> What do you think?

The work may not have been scheduled yet? (small window between the
last kref_put and schedule_work)

Maybe a completion [Documentation/scheduler/completion.rst] would
solve it (but it makes things even more complex, unfortunately):

 - at the end of ovpn_socket_detach: complete(&ovpn_sock->detached);
 - in ovpn_socket_new when handling EALREADY: wait_for_completion(&ovpn_sock->detached);
 - in ovpn_socket_new for the new socket: init_completion(&ovpn_sock->detached);

but ovpn_sock could be gone immediately after complete(). Maybe
something with completion_done() before the kfree_rcu in
ovpn_socket_detach? I'm not that familiar with the completion API.


> However, this makes me wonder: what happens if we have two racing PEER_NEW
> with the same not-yet-attached UDP socket?

mhmm, I remember noticing that, but it seems I never mentioned it in
my reviews. Sorry.

> Maybe we should lock the socket in ovpn_udp_socket_attach() when checking
> its user-data and setting it (in order to make the test-and-set atomic)?

I'd use the lock to protect all of ovpn_socket_new.
ovpn_tcp_socket_attach locks the socket but after doing the initial
checks, so 2 callers could both see sock->sk->sk_user_data == NULL and
do the full attach. And I don't think unlocking before
rcu_assign_sk_user_data is safe for either UDP or TCP.
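
Something like this, maybe (rough sketch, untested, with the 
ovpn_udp_socket_attach/ovpn_tcp_socket_attach helpers then relying on 
the caller holding the socket lock):

struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
{
	struct ovpn_socket *ovpn_sock;
	int ret;

	lock_sock(sock->sk);

	ret = ovpn_socket_attach(sock, peer);
	if (ret < 0 && ret != -EALREADY) {
		release_sock(sock->sk);
		return ERR_PTR(ret);
	}

	if (ret == -EALREADY) {
		ovpn_sock = ovpn_socket_get(sock);
		release_sock(sock->sk);
		sockfd_put(sock);
		return ovpn_sock;
	}

	ovpn_sock = kzalloc(sizeof(*ovpn_sock), GFP_KERNEL);
	if (!ovpn_sock) {
		release_sock(sock->sk);
		return ERR_PTR(-ENOMEM);
	}

	ovpn_sock->ovpn = peer->ovpn;
	ovpn_sock->sock = sock;
	kref_init(&ovpn_sock->refcount);

	/* the assignment now happens before the lock is dropped, so a
	 * second caller can't also see the socket as unused
	 */
	rcu_assign_sk_user_data(sock->sk, ovpn_sock);

	release_sock(sock->sk);
	return ovpn_sock;
}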

> I am specifically talking about this in udp.c:
> 
> 345         /* make sure no pre-existing encapsulation handler exists */
> 346         rcu_read_lock();
> 347         old_data = rcu_dereference_sk_user_data(sock->sk);
> 348         if (!old_data) {
> 349                 /* socket is currently unused - we can take it */
> 350                 rcu_read_unlock();
> 351                 setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
> 352                 return 0;
> 353         }
> 
> We will end up returning 0 in both contexts and thus allocate two
> ovpn_sockets instead of re-using the first one we allocated.
> 
> Does it make sense?

Yes.

[...]
> > [I have some more nits/typos here and there but I worry the
> > maintainers will get "slightly" annoyed if I make you repost 22
> > patches once again :) -- if that's all I find in the next few days,
> > everyone might be happier if I stash them and we get them fixed after
> > merging?]
> 
> If we have to rework this socket attaching part, it may be worth throwing in
> those typ0 fixes too :)

ACK, I'll send them out.

> Thanks a lot.

Thanks again for your patience.
Antonio Quartulli Dec. 16, 2024, 11:50 a.m. UTC | #4
On 16/12/2024 12:09, Sabrina Dubroca wrote:
[...]
>> Maybe we should call cancel_work_sync(&ovpn_sock->work) inside
>> ovpn_socket_get()?
>> So the latter will return NULL only when it is sure that the socket has been
>> detached.
>>
>> At that point we can skip the following return and continue along the "new
>> socket" path.
>>
>> What do you think?
> 
> The work may not have been scheduled yet? (small window between the
> last kref_put and schedule_work)
> 
> Maybe a completion [Documentation/scheduler/completion.rst] would
> solve it (but it makes things even more complex, unfortunately):
> 
>   - at the end of ovpn_socket_detach: complete(&ovpn_sock->detached);
>   - in ovpn_socket_new when handling EALREADY: wait_for_completion(&ovpn_sock->detached);
>   - in ovpn_socket_new for the new socket: init_completion(&ovpn_sock->detached);
> 
> but ovpn_sock could be gone immediately after complete(). Maybe
> something with completion_done() before the kfree_rcu in
> ovpn_socket_detach? I'm not that familiar with the completion API.
> 

It seems the solution we are aiming for is more complex than the concept 
of ovpn_socket per se :-D

I'll think a bit more about this..maybe we can avoid entering this 
situation at all..

> 
>> However, this makes me wonder: what happens if we have two racing PEER_NEW
>> with the same not-yet-attached UDP socket?
> 
> mhmm, I remember noticing that, but it seems I never mentioned it in
> my reviews. Sorry.
> 
>> Maybe we should lock the socket in ovpn_udp_socket_attach() when checking
>> its user-data and setting it (in order to make the test-and-set atomic)?
> 
> I'd use the lock to protect all of ovpn_socket_new.
> ovpn_tcp_socket_attach locks the socket but after doing the initial
> checks, so 2 callers could both see sock->sk->sk_user_data == NULL and
> do the full attach. And I don't think unlocking before
> rcu_assign_sk_user_data is safe for either UDP or TCP.

I tend to agree here. Guarding the whole ovpn_socket_new with 
lock_sock() seems the right thing to do.

> 
>> I am specifically talking about this in udp.c:
>>
>> 345         /* make sure no pre-existing encapsulation handler exists */
>> 346         rcu_read_lock();
>> 347         old_data = rcu_dereference_sk_user_data(sock->sk);
>> 348         if (!old_data) {
>> 349                 /* socket is currently unused - we can take it */
>> 350                 rcu_read_unlock();
>> 351                 setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
>> 352                 return 0;
>> 353         }
>>
>> We will end up returning 0 in both contexts and thus allocate two
>> ovpn_sockets instead of re-using the first one we allocated.
>>
>> Does it make sense?
> 
> Yes.
> 
> [...]
>>> [I have some more nits/typos here and there but I worry the
>>> maintainers will get "slightly" annoyed if I make you repost 22
>>> patches once again :) -- if that's all I find in the next few days,
>>> everyone might be happier if I stash them and we get them fixed after
>>> merging?]
>>
>> If we have to rework this socket attaching part, it may be worth throwing in
>> those typ0 fixes too :)
> 
> ACK, I'll send them out.

Thanks.

Regards,
Antonio Quartulli Dec. 17, 2024, 12:40 a.m. UTC | #5
On 16/12/2024 12:50, Antonio Quartulli wrote:
> On 16/12/2024 12:09, Sabrina Dubroca wrote:
> [...]
>>> Maybe we should call cancel_work_sync(&ovpn_sock->work) inside
>>> ovpn_socket_get()?
>>> So the latter will return NULL only when it is sure that the socket 
>>> has been
>>> detached.
>>>
>>> At that point we can skip the following return and continue along the 
>>> "new
>>> socket" path.
>>>
>>> What do you think?
>>
>> The work may not have been scheduled yet? (small window between the
>> last kref_put and schedule_work)
>>
>> Maybe a completion [Documentation/scheduler/completion.rst] would
>> solve it (but it makes things even more complex, unfortunately):
>>
>>   - at the end of ovpn_socket_detach: complete(&ovpn_sock->detached);
>>   - in ovpn_socket_new when handling EALREADY: 
>> wait_for_completion(&ovpn_sock->detached);
>>   - in ovpn_socket_new for the new socket: init_completion(&ovpn_sock- 
>> >detached);
>>
>> but ovpn_sock could be gone immediately after complete(). Maybe
>> something with completion_done() before the kfree_rcu in
>> ovpn_socket_detach? I'm not that familiar with the completion API.
>>
> 
> It seems the solution we are aiming for is more complex than the concept 
> of ovpn_socket per se :-D
> 
> I'll think a bit more about this..maybe we can avoid entering this 
> situation at all..

I see that there are some kref_put variants that acquire a lock just 
before hitting zero and running the release cb.

If I implement a kref_put variant that acquires the lock_sock, I could 
then perform the UDP detach under lock, thus ensuring that zeroing the 
refcount and erasing sk_user_data happen while holding the lock_sock.

This way I should be able to prevent the situation where "sk_user_data 
still says EALREADY, but the refcnt is actually 0".

I hope adding this new API is fine.

I am giving it a try now.
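
Roughly along these lines (name invented, untested; it assumes the 
detach path is also changed to clear sk_user_data, and that sk stays 
alive until after release_sock()):

/* modelled on kref_put_lock(), but taking lock_sock() instead of a
 * spinlock: the lock is only acquired when the refcount is about to
 * hit zero, and the release callback runs with the socket locked
 */
static bool ovpn_socket_put_locked(struct ovpn_socket *ovpn_sock)
{
	struct sock *sk = ovpn_sock->sock->sk;

	/* fast path: we are not dropping the last reference */
	if (refcount_dec_not_one(&ovpn_sock->refcount.refcount))
		return false;

	lock_sock(sk);
	if (!refcount_dec_and_test(&ovpn_sock->refcount.refcount)) {
		/* somebody else grabbed a reference in the meantime */
		release_sock(sk);
		return false;
	}

	/* refcount hit zero with the socket locked: detaching (i.e.
	 * clearing sk_user_data) happens here, so an attach racing with
	 * us can never see an ovpn_socket whose refcount is already 0
	 */
	ovpn_socket_release_kref(&ovpn_sock->refcount);
	release_sock(sk);

	return true;
}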

Regards,

Patch

diff --git a/drivers/net/ovpn/Makefile b/drivers/net/ovpn/Makefile
index ce13499b3e1775a7f2a9ce16c6cb0aa088f93685..56bddc9bef83e0befde6af3c3565bb91731d7b22 100644
--- a/drivers/net/ovpn/Makefile
+++ b/drivers/net/ovpn/Makefile
@@ -13,3 +13,5 @@  ovpn-y += io.o
 ovpn-y += netlink.o
 ovpn-y += netlink-gen.o
 ovpn-y += peer.o
+ovpn-y += socket.o
+ovpn-y += udp.o
diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
new file mode 100644
index 0000000000000000000000000000000000000000..0abac02e13fb4ef1e212dacae075d5b58e872d34
--- /dev/null
+++ b/drivers/net/ovpn/socket.c
@@ -0,0 +1,119 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#include <linux/net.h>
+#include <linux/netdevice.h>
+#include <linux/udp.h>
+
+#include "ovpnstruct.h"
+#include "main.h"
+#include "io.h"
+#include "peer.h"
+#include "socket.h"
+#include "udp.h"
+
+static void ovpn_socket_detach(struct socket *sock)
+{
+	if (!sock)
+		return;
+
+	sockfd_put(sock);
+}
+
+/**
+ * ovpn_socket_release_kref - kref_put callback
+ * @kref: the kref object
+ */
+void ovpn_socket_release_kref(struct kref *kref)
+{
+	struct ovpn_socket *sock = container_of(kref, struct ovpn_socket,
+						refcount);
+
+	ovpn_socket_detach(sock->sock);
+	kfree_rcu(sock, rcu);
+}
+
+static bool ovpn_socket_hold(struct ovpn_socket *sock)
+{
+	return kref_get_unless_zero(&sock->refcount);
+}
+
+static struct ovpn_socket *ovpn_socket_get(struct socket *sock)
+{
+	struct ovpn_socket *ovpn_sock;
+
+	rcu_read_lock();
+	ovpn_sock = rcu_dereference_sk_user_data(sock->sk);
+	if (WARN_ON(!ovpn_socket_hold(ovpn_sock)))
+		ovpn_sock = NULL;
+	rcu_read_unlock();
+
+	return ovpn_sock;
+}
+
+static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
+{
+	int ret = -EOPNOTSUPP;
+
+	if (!sock || !peer)
+		return -EINVAL;
+
+	if (sock->sk->sk_protocol == IPPROTO_UDP)
+		ret = ovpn_udp_socket_attach(sock, peer->ovpn);
+
+	return ret;
+}
+
+/**
+ * ovpn_socket_new - create a new socket and initialize it
+ * @sock: the kernel socket to embed
+ * @peer: the peer reachable via this socket
+ *
+ * Return: an openvpn socket on success or a negative error code otherwise
+ */
+struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
+{
+	struct ovpn_socket *ovpn_sock;
+	int ret;
+
+	ret = ovpn_socket_attach(sock, peer);
+	if (ret < 0 && ret != -EALREADY)
+		return ERR_PTR(ret);
+
+	/* if this socket is already owned by this interface, just increase the
+	 * refcounter and use it as expected.
+	 *
+	 * Since UDP sockets can be used to talk to multiple remote endpoints,
+	 * openvpn normally instantiates only one socket and shares it among all
+	 * its peers. For this reason, when we find out that a socket is already
+	 * used for some other peer in *this* instance, we can happily increase
+	 * its refcounter and use it normally.
+	 */
+	if (ret == -EALREADY) {
+		/* caller is expected to increase the sock refcounter before
+		 * passing it to this function. For this reason we drop it if
+		 * not needed, like when this socket is already owned.
+		 */
+		ovpn_sock = ovpn_socket_get(sock);
+		sockfd_put(sock);
+		return ovpn_sock;
+	}
+
+	ovpn_sock = kzalloc(sizeof(*ovpn_sock), GFP_KERNEL);
+	if (!ovpn_sock)
+		return ERR_PTR(-ENOMEM);
+
+	ovpn_sock->ovpn = peer->ovpn;
+	ovpn_sock->sock = sock;
+	kref_init(&ovpn_sock->refcount);
+
+	rcu_assign_sk_user_data(sock->sk, ovpn_sock);
+
+	return ovpn_sock;
+}
diff --git a/drivers/net/ovpn/socket.h b/drivers/net/ovpn/socket.h
new file mode 100644
index 0000000000000000000000000000000000000000..904814d2b9e9f2b0773bf942372bcbe904ef5474
--- /dev/null
+++ b/drivers/net/ovpn/socket.h
@@ -0,0 +1,48 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2020-2024 OpenVPN, Inc.
+ *
+ *  Author:	James Yonan <james@openvpn.net>
+ *		Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_SOCK_H_
+#define _NET_OVPN_SOCK_H_
+
+#include <linux/net.h>
+#include <linux/kref.h>
+#include <net/sock.h>
+
+struct ovpn_priv;
+struct ovpn_peer;
+
+/**
+ * struct ovpn_socket - a kernel socket referenced in the ovpn code
+ * @ovpn: ovpn instance owning this socket (UDP only)
+ * @sock: the low level sock object
+ * @refcount: amount of contexts currently referencing this object
+ * @rcu: member used to schedule RCU destructor callback
+ */
+struct ovpn_socket {
+	struct ovpn_priv *ovpn;
+	struct socket *sock;
+	struct kref refcount;
+	struct rcu_head rcu;
+};
+
+void ovpn_socket_release_kref(struct kref *kref);
+
+/**
+ * ovpn_socket_put - decrease reference counter
+ * @sock: the socket whose reference counter should be decreased
+ */
+static inline void ovpn_socket_put(struct ovpn_socket *sock)
+{
+	kref_put(&sock->refcount, ovpn_socket_release_kref);
+}
+
+struct ovpn_socket *ovpn_socket_new(struct socket *sock,
+				    struct ovpn_peer *peer);
+
+#endif /* _NET_OVPN_SOCK_H_ */
diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
new file mode 100644
index 0000000000000000000000000000000000000000..c00e07f148d72ff737e732028fd73f82a507fb57
--- /dev/null
+++ b/drivers/net/ovpn/udp.c
@@ -0,0 +1,65 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2019-2024 OpenVPN, Inc.
+ *
+ *  Author:	Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#include <linux/netdevice.h>
+#include <linux/socket.h>
+#include <linux/udp.h>
+#include <net/udp.h>
+
+#include "ovpnstruct.h"
+#include "main.h"
+#include "socket.h"
+#include "udp.h"
+
+/**
+ * ovpn_udp_socket_attach - set udp-tunnel CBs on socket and link it to ovpn
+ * @sock: socket to configure
+ * @ovpn: the openvpn instance to link
+ *
+ * After invoking this function, the sock will be controlled by ovpn so that
+ * any incoming packet may be processed by ovpn first.
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_priv *ovpn)
+{
+	struct ovpn_socket *old_data;
+	int ret = 0;
+
+	/* make sure no pre-existing encapsulation handler exists */
+	rcu_read_lock();
+	old_data = rcu_dereference_sk_user_data(sock->sk);
+	if (!old_data) {
+		/* socket is currently unused - we can take it */
+		rcu_read_unlock();
+		return 0;
+	}
+
+	/* socket is in use. We need to understand if it's owned by this ovpn
+	 * instance or by something else.
+	 * In the former case, we can increase the refcounter and happily
+	 * use it, because the same UDP socket is expected to be shared among
+	 * different peers.
+	 *
+	 * Unlikely TCP, a single UDP socket can be used to talk to many remote
+	 * hosts and therefore openvpn instantiates one only for all its peers
+	 */
+	if ((READ_ONCE(udp_sk(sock->sk)->encap_type) == UDP_ENCAP_OVPNINUDP) &&
+	    old_data->ovpn == ovpn) {
+		netdev_dbg(ovpn->dev,
+			   "provided socket already owned by this interface\n");
+		ret = -EALREADY;
+	} else {
+		netdev_dbg(ovpn->dev,
+			   "provided socket already taken by other user\n");
+		ret = -EBUSY;
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
diff --git a/drivers/net/ovpn/udp.h b/drivers/net/ovpn/udp.h
new file mode 100644
index 0000000000000000000000000000000000000000..3c48a06f15eed624aec0a2a7b871f0e7f3004137
--- /dev/null
+++ b/drivers/net/ovpn/udp.h
@@ -0,0 +1,17 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*  OpenVPN data channel offload
+ *
+ *  Copyright (C) 2019-2024 OpenVPN, Inc.
+ *
+ *  Author:	Antonio Quartulli <antonio@openvpn.net>
+ */
+
+#ifndef _NET_OVPN_UDP_H_
+#define _NET_OVPN_UDP_H_
+
+struct ovpn_priv;
+struct socket;
+
+int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_priv *ovpn);
+
+#endif /* _NET_OVPN_UDP_H_ */
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index d85d671deed3c78f6969189281b9083dcac000c6..edca3e430305a6bffc34e617421f1f3071582e69 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -43,5 +43,6 @@  struct udphdr {
 #define UDP_ENCAP_GTP1U		5 /* 3GPP TS 29.060 */
 #define UDP_ENCAP_RXRPC		6
 #define TCP_ENCAP_ESPINTCP	7 /* Yikes, this is really xfrm encap types. */
+#define UDP_ENCAP_OVPNINUDP	8 /* OpenVPN traffic */
 
 #endif /* _UAPI_LINUX_UDP_H */