diff mbox series

[V3] ax25: Fix refcount leaks caused by ax25_cb_del()

Message ID 20220311014624.51117-1-duoming@zju.edu.cn (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [V3] ax25: Fix refcount leaks caused by ax25_cb_del() | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Duoming Zhou March 11, 2022, 1:46 a.m. UTC
The previous commit d01ffb9eee4a ("ax25: add refcount in ax25_dev to
avoid UAF bugs") and commit feef318c855a ("ax25: fix UAF bugs of
net_device caused by rebinding operation") increase the refcounts of
ax25_dev and net_device in ax25_bind() and decrease the matching refcounts
in ax25_kill_by_device() in order to prevent UAF bugs, but there are
reference count leaks.

The root cause of refcount leaks is shown below:

     (Thread 1)                      |      (Thread 2)
ax25_bind()                          |
 ...                                 |
 ax25_addr_ax25dev()                 |
  ax25_dev_hold()   //(1)            |
  ...                                |
 dev_hold_track()   //(2)            |
 ...                                 | ax25_destroy_socket()
                                     |  ax25_cb_del()
                                     |   ...
                                     |   hlist_del_init() //(3)
                                     |
                                     |
     (thread 3)                      |
ax25_kill_by_device()                |
 ...                                 |
 ax25_for_each(s, &ax25_list) {      |
  if (s->ax25_dev == ax25_dev) //(4) |
   ...                               |

Firstly, we use ax25_bind() to increase the refcount of ax25_dev in
position (1) and increase the refcount of net_device in position (2).
Then, we use ax25_cb_del() invoked by ax25_destroy_socket() to delete
ax25_cb in hlist in position (3) before calling ax25_kill_by_device().
Finally, the decrements of refcounts in ax25_kill_by_device() will not
be executed, because no s->ax25_dev equals to ax25_dev in position (4).

This patch adds a flag in ax25_dev in order to prevent reference count
leaks. If the above condition happens, the "test_bit" condition check
in ax25_kill_by_device() could pass and the refcounts could be
decreased properly.

Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs")
Fixes: feef318c855a ("ax25: fix UAF bugs of net_device caused by rebinding operation")
Reported-by: Thomas Osterried <thomas@osterried.de>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
Changes in V3:
  - Use one flag in ax25_dev to check.
  - Make commit message more clearer.
  - Fix format problem.

 include/net/ax25.h  | 5 +++++
 net/ax25/af_ax25.c  | 6 ++++++
 net/ax25/ax25_dev.c | 1 +
 3 files changed, 12 insertions(+)

Comments

Dan Carpenter March 11, 2022, 10:53 a.m. UTC | #1
On Fri, Mar 11, 2022 at 09:46:24AM +0800, Duoming Zhou wrote:
> This patch adds a flag in ax25_dev in order to prevent reference count
> leaks. If the above condition happens, the "test_bit" condition check
> in ax25_kill_by_device() could pass and the refcounts could be
> decreased properly.
>

Why are you using test_bit()?  Just use booleans.

I am not a networking person but this whole thing feel like adding on
layers of cruft.  If refcounting needs a lot of additional tracking
information then probably the reference counts are not being done in
the correct location.

> diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
> index 6bd09718077..fc564b87acc 100644
> --- a/net/ax25/af_ax25.c
> +++ b/net/ax25/af_ax25.c
> @@ -86,6 +86,7 @@ static void ax25_kill_by_device(struct net_device *dev)
>  again:
>  	ax25_for_each(s, &ax25_list) {
>  		if (s->ax25_dev == ax25_dev) {
> +			set_bit(AX25_DEV_KILL, &ax25_dev->flag);
>  			sk = s->sk;
>  			if (!sk) {
>  				spin_unlock_bh(&ax25_list_lock);

Why can this not be a local variable.

	bool found = false;

	ax25_for_each(s, &ax25_list) {
		if (s->ax25_dev == ax25_dev) {
			found = true;
			...

> @@ -115,6 +116,10 @@ static void ax25_kill_by_device(struct net_device *dev)
>  		}
>  	}
>  	spin_unlock_bh(&ax25_list_lock);
> +	if (!test_bit(AX25_DEV_KILL, &ax25_dev->flag) && test_bit(AX25_DEV_BIND, &ax25_dev->flag)) {

The comments for ax25_kill_by_device() say:

/*
 *      Kill all bound sockets on a dropped device.
 */

So how can test_bit(AX25_DEV_BIND, &ax25_dev->flag) ever be false at
this location?

	if (!found) {
		dev_put_track(ax25_dev->dev, &ax25_dev->dev_tracker);
		ax25_dev_put(ax25_dev);
	}

But even here, my instinct is that if the refcounting is were done in
the correct place we would not need any additional variables.  Is there
no simpler solution?

> diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c
> index d2a244e1c26..9b04d74a1be 100644
> --- a/net/ax25/ax25_dev.c
> +++ b/net/ax25/ax25_dev.c
> @@ -77,6 +77,7 @@ void ax25_dev_device_up(struct net_device *dev)
>  	ax25_dev->values[AX25_VALUES_PACLEN]	= AX25_DEF_PACLEN;
>  	ax25_dev->values[AX25_VALUES_PROTOCOL]  = AX25_DEF_PROTOCOL;
>  	ax25_dev->values[AX25_VALUES_DS_TIMEOUT]= AX25_DEF_DS_TIMEOUT;
> +	ax25_dev->flag = AX25_DEV_INIT;

There is no need for this, it's allocated with kzalloc().

>  
>  #if defined(CONFIG_AX25_DAMA_SLAVE) || defined(CONFIG_AX25_DAMA_MASTER)
>  	ax25_ds_setup_timer(ax25_dev);

regards,
dan carpenter
Duoming Zhou March 13, 2022, 4:26 a.m. UTC | #2
Hello,

On Fri, Mar 11, 2022 Dan Carpen wrote:

> But even here, my instinct is that if the refcounting is were done in
> the correct place we would not need any additional variables.  Is there
> no simpler solution?

I sent "[PATCH net V2 1/2] ax25: Fix refcount leaks caused by ax25_cb_del()"
on On Fri, Mar 11, 2022. Could this patch solve your question?

Best wishes,
Duoming Zhou
Dan Carpenter March 14, 2022, 11:02 a.m. UTC | #3
On Sun, Mar 13, 2022 at 12:26:45PM +0800, 周多明 wrote:
> Hello,
> 
> On Fri, Mar 11, 2022 Dan Carpen wrote:
> 
> > But even here, my instinct is that if the refcounting is were done in
> > the correct place we would not need any additional variables.  Is there
> > no simpler solution?
> 
> I sent "[PATCH net V2 1/2] ax25: Fix refcount leaks caused by ax25_cb_del()"
> on On Fri, Mar 11, 2022. Could this patch solve your question?

I had a bunch of questions...  You just ignored them, and sent a patch
called v2 instead of v4 so I was puzzled and confused. I guess the
answer is no, could you please answer the questions?

regards,
dan carpenter
Duoming Zhou March 15, 2022, 1:14 a.m. UTC | #4
Hello,

On Mon, 14 Mar 2022 14:02:56 +0300, Dan Carpen wrote:

> > > But even here, my instinct is that if the refcounting is were done in
> > > the correct place we would not need any additional variables.  Is there
> > > no simpler solution?

I think there is a simpler solution instead of using any additional variables,
which is shown below:

diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 6bd09718077..0886109421a 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -98,8 +98,10 @@ static void ax25_kill_by_device(struct net_device *dev)
 			spin_unlock_bh(&ax25_list_lock);
 			lock_sock(sk);
 			s->ax25_dev = NULL;
-			dev_put_track(ax25_dev->dev, &ax25_dev->dev_tracker);
-			ax25_dev_put(ax25_dev);
+			if (sk->sk_wq) {
+				dev_put_track(ax25_dev->dev, &ax25_dev->dev_tracker);
+				ax25_dev_put(ax25_dev);
+			}
 			ax25_disconnect(s, ENETUNREACH);
 			release_sock(sk);
 			spin_lock_bh(&ax25_list_lock);
@@ -979,14 +981,20 @@ static int ax25_release(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
 	ax25_cb *ax25;
+	ax25_dev *ax25_dev;
 
 	if (sk == NULL)
 		return 0;
 
 	sock_hold(sk);
-	sock_orphan(sk);
 	lock_sock(sk);
+	sock_orphan(sk);
 	ax25 = sk_to_ax25(sk);
+	ax25_dev = ax25->ax25_dev;
+	if (ax25_dev) {
+		dev_put_track(ax25_dev->dev, &ax25_dev->dev_tracker);
+		ax25_dev_put(ax25_dev);
+	}
 
 	if (sk->sk_type == SOCK_SEQPACKET) {
 		switch (ax25->state) {

we add decrements of refcounts in ax25_release(), and use lock_sock() to do synchronization. 
If refcounts decrease in ax25_release(), the decrements of refcounts in ax25_kill_by_device() 
will not be executed and vice versa. 

1. If we decrease the refcounts in ax25_release(), the decrements of refcounts in 
ax25_kill_by_device() will not execute. Because we set NULL to sk->sk_wq in sock_orphan()
and we check whether sk->sk_wq is NULL in ax25_kill_by_device(). Only positions (3) and (4)
could execute.

     (Thread 1)                      |      
ax25_bind()                          |
 ...                                 |
 ax25_addr_ax25dev()                 |
  ax25_dev_hold()   //(1)            |
  ...                                |
 dev_hold_track()   //(2)            |     (Thread 2)
 ...                                 | ax25_release()
                                     |  ...
                                     |   lock_sock(sk)
                                     |   sock_orphan(sk)
                                     |     sk->sk_wq  = NULL //set NULL
                                     |   ...
                                     |   if (ax25_dev) {
                                     |    dev_put_track() //(3)
                                     |    ax25_dev_put() //(4)
                                     |
     (thread 3)                      |
ax25_kill_by_device()                |
 ...                                 |
 lock_sock(sk)                       |
 s->ax25_dev = NULL                  |
 if (sk->sk_wq) { //check            |
  dev_put_track() //(5)              |
  ax25_dev_put() //(6)               |


2. If we decrease refcounts in ax25_kill_by_device(), the decrements of refcounts 
in ax25_release() will not execute. Because we set NULL to s->ax25_dev in ax25_kill_by_device()
and we check whether ax25_dev is NULL in ax25_release(). Only positions (3) and (4)
could execute.

     (Thread 1)                      | 
ax25_bind()                          |
 ...                                 |
 ax25_addr_ax25dev()                 |
  ax25_dev_hold()   //(1)            |
  ...                                |
 dev_hold_track()   //(2)            |      (Thread 2)
 ...                                 | ax25_kill_by_device()
                                     |  ...
                                     |   lock_sock(sk)
                                     |   s->ax25_dev = NULL //set NULL
                                     |   if (sk->sk_wq) {
                                     |    dev_put_track() //(3)
                                     |    ax25_dev_put() //(4)
                                     |    ...
                                     |
     (thread 3)                      |
ax25_release()                       |
 ...                                 |
 lock_sock(sk);                      |
 sock_orphan(sk);                    |
 ...                                 |
 if (ax25_dev) { //check             |
  dev_put_track() //(5)              |
  ax25_dev_put() //(6)               |

> > I sent "[PATCH net V2 1/2] ax25: Fix refcount leaks caused by ax25_cb_del()"
> > on On Fri, Mar 11, 2022. Could this patch solve your question?
> 
> I had a bunch of questions...  You just ignored them, and sent a patch
> called v2 instead of v4 so I was puzzled and confused. I guess the
> answer is no, could you please answer the questions?

I hope my answer could solve your questions. If you still have any questions
welcome to send email to me. I will send "[PATCH net V4 1/2] ax25: Fix
refcount leaks caused by ax25_cb_del()" as soon as possible. 

What`s more, I found NPD bugs in ax25 timers, I will send
"[PATCH net V4 2/2] ax25: Fix NULL pointer dereferences in ax25 timers" together.

Best wishes,
Duoming Zhou
diff mbox series

Patch

diff --git a/include/net/ax25.h b/include/net/ax25.h
index 8221af1811d..ea6ca385190 100644
--- a/include/net/ax25.h
+++ b/include/net/ax25.h
@@ -158,6 +158,10 @@  enum {
 #define	AX25_DEF_PROTOCOL	AX25_PROTO_STD_SIMPLEX	/* Standard AX.25 */
 #define AX25_DEF_DS_TIMEOUT	180000			/* DAMA timeout 3 minutes */
 
+#define AX25_DEV_INIT    0
+#define AX25_DEV_KILL    0
+#define AX25_DEV_BIND    1
+
 typedef struct ax25_uid_assoc {
 	struct hlist_node	uid_node;
 	refcount_t		refcount;
@@ -240,6 +244,7 @@  typedef struct ax25_dev {
 	ax25_dama_info		dama;
 #endif
 	refcount_t		refcount;
+	unsigned long	flag;
 } ax25_dev;
 
 typedef struct ax25_cb {
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 6bd09718077..fc564b87acc 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -86,6 +86,7 @@  static void ax25_kill_by_device(struct net_device *dev)
 again:
 	ax25_for_each(s, &ax25_list) {
 		if (s->ax25_dev == ax25_dev) {
+			set_bit(AX25_DEV_KILL, &ax25_dev->flag);
 			sk = s->sk;
 			if (!sk) {
 				spin_unlock_bh(&ax25_list_lock);
@@ -115,6 +116,10 @@  static void ax25_kill_by_device(struct net_device *dev)
 		}
 	}
 	spin_unlock_bh(&ax25_list_lock);
+	if (!test_bit(AX25_DEV_KILL, &ax25_dev->flag) && test_bit(AX25_DEV_BIND, &ax25_dev->flag)) {
+		dev_put_track(ax25_dev->dev, &ax25_dev->dev_tracker);
+		ax25_dev_put(ax25_dev);
+	}
 }
 
 /*
@@ -1132,6 +1137,7 @@  static int ax25_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 done:
 	ax25_cb_add(ax25);
 	sock_reset_flag(sk, SOCK_ZAPPED);
+	set_bit(AX25_DEV_BIND, &ax25_dev->flag);
 
 out:
 	release_sock(sk);
diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c
index d2a244e1c26..9b04d74a1be 100644
--- a/net/ax25/ax25_dev.c
+++ b/net/ax25/ax25_dev.c
@@ -77,6 +77,7 @@  void ax25_dev_device_up(struct net_device *dev)
 	ax25_dev->values[AX25_VALUES_PACLEN]	= AX25_DEF_PACLEN;
 	ax25_dev->values[AX25_VALUES_PROTOCOL]  = AX25_DEF_PROTOCOL;
 	ax25_dev->values[AX25_VALUES_DS_TIMEOUT]= AX25_DEF_DS_TIMEOUT;
+	ax25_dev->flag = AX25_DEV_INIT;
 
 #if defined(CONFIG_AX25_DAMA_SLAVE) || defined(CONFIG_AX25_DAMA_MASTER)
 	ax25_ds_setup_timer(ax25_dev);