diff mbox series

[net] ax25: Fix refcount leak issues of ax25_dev

Message ID 20240501060218.32898-1-duoming@zju.edu.cn (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net] ax25: Fix refcount leak issues of ax25_dev | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 926 this patch: 926
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 937 this patch: 937
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 937 this patch: 937
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 40 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-05-02--06-00 (tests: 996)

Commit Message

Duoming Zhou May 1, 2024, 6:02 a.m. UTC
There are two scenarios that might cause refcount leak
issues of ax25_dev.

Scenario one:

The refcount of ax25_dev potentially increase more
than once in ax25_addr_ax25dev(), which will cause
memory leak.

In order to fix the above issue, only increase the
refcount of ax25_dev once, when the res is not null.

Scenario two:

The original code sets the refcount of ax25_dev to 1
in the initial stage and then increase the refcount
when the ax25_dev is added to the ax25_dev_list. As a
result, the refcount of ax25_dev is 2. But when the
device is shutting down. The ax25_dev_device_down()
drops the refcount once or twice depending on if we
goto unlock_put or not, which will cause memory leak.

In order to mitigate the above issues, only increase
the refcount of ax25_dev when the ax25_dev is added
to the ax25_dev_list and decrease the refcount of
ax25_dev after it is removed from the ax25_dev_list.

What's more, the ax25_dev should not be deallocated
directly by kfree() in ax25_dev_free(), replace it
with ax25_dev_put() instead.

Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs")
Reported by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
 net/ax25/ax25_dev.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

Comments

Markus Elfring May 1, 2024, 5:33 p.m. UTC | #1
> In order to mitigate the above issues, only increase
> the refcount of ax25_dev when the ax25_dev is added
> to the ax25_dev_list and decrease the refcount of
> ax25_dev after it is removed from the ax25_dev_list.
…

* I suggest to use more than 53 characters in lines of such a change description.

* Can it be nicer to mention also the term “reference counting” for
  an improved commit message?

Regards,
Markus
Dan Carpenter May 1, 2024, 5:43 p.m. UTC | #2
I'm always happy to take credit for stuff but the Reported by should go
to Lars and Miroslav.

Reported-by: Lars Kellogg-Stedman <lars@oddbit.com>
Reported-by: Miroslav Skoric <skoric@uns.ac.rs>

Lars, could you test this please and let us know if it helps?

regards,
dan carpenter
Lars Kellogg-Stedman May 2, 2024, 1:29 a.m. UTC | #3
On Wed, May 01, 2024 at 02:02:18PM +0800, Duoming Zhou wrote:
> There are two scenarios that might cause refcount leak
> issues of ax25_dev.

This patch doesn't address the refcount leaks I reported earlier and
resolved in the patch I posted [1] last week.

Assume we have the following two interfaces configured on a system:

    $ cat /etc/ax25/axports
    udp0 test0-0 9600 255 2 axudp0
    udp1 test0-1 9600 255 2 axudp1

And we have ax25d listening on both interfaces:

    [udp0]
    default  * * * * * *  - root  /usr/sbin/axwrapper axwrapper -- /bin/sh sh /etc/ax25/example-output.sh
    [udp1]
    default  * * * * * *  - root  /usr/sbin/axwrapper axwrapper -- /bin/sh sh /etc/ax25/example-output.sh

Using the 'ax-devs' and 'ax-sockets' gdb commands shown at the end of
this message, we start with:

    (gdb) ax-devs
    ax1 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
    ax0 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
    (gdb) ax-sockets
    0xffff8881002b6800 if:ax1 state:0 refcnt:2 dev_tracker:0xffff888100ded200
    0xffff888101ac4e00 if:ax0 state:0 refcnt:2 dev_tracker:0xffff888100dec4c0

We initiate a connection from ax0 to ax1:

    call -r udp0 test0-1

When we first enter ax25_rcv, we have:

    (gdb) ax-devs
    ax1 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
    ax0 ax_refcnt:3 dev_refcnt:10 dev_untracked:1 dev_notrack:1
    (gdb) ax-sockets
    0xffff888101ac8000 if:ax0 state:1 refcnt:2 dev_tracker:0xffff888100dedb80
    0xffff8881002b6800 if:ax1 state:0 refcnt:2 dev_tracker:0xffff888100ded200
    0xffff888101ac4e00 if:ax0 state:0 refcnt:2 dev_tracker:0xffff888100dec4c0

After we reach line 413 (in net/ax25/ax25_in.c) and add a new control
block:

    ax25_cb_add(ax25)

We have:

    (gdb) ax-devs
    ax1 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
    ax0 ax_refcnt:3 dev_refcnt:10 dev_untracked:1 dev_notrack:1
    (gdb) ax-sockets
    0xffff88810245ac00 if:ax1 state:3 refcnt:2 dev_tracker:0x0 <fixed_percpu_data>
    0xffff88810245ba00 if:ax0 state:1 refcnt:2 dev_tracker:0xffff88810136c800
    0xffff888100c79e00 if:ax1 state:0 refcnt:2 dev_tracker:0xffff88810136c6e0
    0xffff8881018e9800 if:ax0 state:0 refcnt:2 dev_tracker:0xffff88810170c860

Note that (a) ax25->dev_tracker is NULL, and (b) we have incremeted the
refcount on ax0 (the source interface), but not on ax1 (the destination
interface). When we call ax25_release for this control block, we get to:

    netdev_put(ax25_dev->dev, &ax25->dev_tracker);
    ax25_dev_put(ax25_dev);

With:

    (gdb) ax-devs
    ax1 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
    ax0 ax_refcnt:3 dev_refcnt:10 dev_untracked:1 dev_notrack:1

After the calls to netdev_put() and ax25_dev_put(), we have:

    (gdb) ax-devs
    ax1 ax_refcnt:1 dev_refcnt:8 dev_untracked:-1073741824 dev_notrack:1
    ax0 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1

You can see that (a) ax25_dev->dev->refcnt_tracker->untracked is now
invalid, and ax25_dev->dev->dev_refcnt is in trouble: it decrements by
one for each closed connection, even though it was never incremented
when we accepted the connection. The underflow in
...refcnt_tracker->untracked yields the traceback with:

    refcount_t: decrement hit 0; leaking memory.

Additional connections will eventually trigger more problems; we will
ultimately underflow ax25_dev->dev->dev_refcnt, but we may also run into
memory corruption because of the invalid tracker data, resulting in:

    BUG: unable to handle page fault for address: 00000010000003b0

The patch I submitted last week resolves all of the above issues and has
no refcount leaks for this particular code path. In order to avoid the
refcount leaks, those _put() calls in ax25_release need to be balanced
by _hold() calls when accepting a new connection (or we need to wrap
them in a conditional so that they're not called when ax25->dev_tracker
is NULL).

GDB commands:

    define ax-devs
      set $x = ax25_dev_list
      while ($x != 0)
        printf "%s ax_refcnt:%d dev_refcnt:%d dev_untracked:%d dev_notrack:%d\n", $x->dev->name, \
          $x->refcount->refs->counter, \
          $x->dev->dev_refcnt->refs->counter, \
          $x->dev->refcnt_tracker->untracked->refs->counter, \
          $x->dev->refcnt_tracker->no_tracker->refs->counter
        set $x = $x->next
      end
    end

    define ax-sockets
      set $x = ax25_list->first
      while ($x != 0)
        set $cb = (ax25_cb *)($x)

        printf "%s if:%s state:%d refcnt:%d dev_tracker:%s\n", \
          $_as_string($cb), \
          $cb->ax25_dev->dev->name, \
          $cb->state, \
          $cb->refcount->refs->counter, \
          $_as_string($cb->dev_tracker)
        set $x = $x->next
      end
    end

[1]: https://marc.info/?l=linux-hams&m=171447153903965&w=2

--
Lars Kellogg-Stedman <lars@oddbit.com> | larsks @ {irc,twitter,github}
http://blog.oddbit.com/                | N1LKS
Duoming Zhou May 2, 2024, 4:35 a.m. UTC | #4
On Wed, 1 May 2024 20:43:37 +0300 Dan Carpenter wrote:
> I'm always happy to take credit for stuff but the Reported by should go
> to Lars and Miroslav.
> 
> Reported-by: Lars Kellogg-Stedman <lars@oddbit.com>
> Reported-by: Miroslav Skoric <skoric@uns.ac.rs>

This patch is not related with the problem raised by Lars Kellogg-Stedman
and Miroslav Skoric, it only solves the reference counting leak issues of
ax25_dev in ax25_addr_ax25dev() and ax25_dev_device_down(). So I think
there is no need to change the "Reported by" label.

Best regards,
Duoming Zhou
Dan Carpenter May 2, 2024, 7:56 a.m. UTC | #5
On Thu, May 02, 2024 at 12:35:44PM +0800, duoming@zju.edu.cn wrote:
> On Wed, 1 May 2024 20:43:37 +0300 Dan Carpenter wrote:
> > I'm always happy to take credit for stuff but the Reported by should go
> > to Lars and Miroslav.
> > 
> > Reported-by: Lars Kellogg-Stedman <lars@oddbit.com>
> > Reported-by: Miroslav Skoric <skoric@uns.ac.rs>
> 
> This patch is not related with the problem raised by Lars Kellogg-Stedman
> and Miroslav Skoric, it only solves the reference counting leak issues of
> ax25_dev in ax25_addr_ax25dev() and ax25_dev_device_down(). So I think
> there is no need to change the "Reported by" label.
> 

Ah...  I was really hoping it was related to the other bugs.

Okay, what about we separate this into different patches one for each
bug?  The changes to ax25_addr_ax25dev() and ax25_dev_free() are
obvious and could go in as-is but as two separate patches.

The changes to ax25_dev_device_up/down() are more subtle.

The ax25_dev_list stuff is frustrating.  It would be so much easier if
it were a normal list and you could just do:

        /*
         *      Remove any packet forwarding that points to this device.
         */
        list_for_each_entry(s, ax25_dev_list, list) {
                if (s->forward == dev)
                        s->forward = NULL;
        }

        list_for_each_entry(s, ax25_dev_list, list) {
                if (s == ax25_dev) {
                        list_del(s);
                        free_net = true;
                        break;
                }
        }

        spin_unlock_bh(&ax25_dev_lock);
        dev->ax25_ptr = NULL;
        if (free_net)
                netdev_put(dev, &ax25_dev->dev_tracker);
        ax25_dev_put(ax25_dev);
}

Why do we call netdev_put() on that one path?  Btw, here is an untested
conversion to lists...

regards,
dan carpenter

diff --git a/include/net/ax25.h b/include/net/ax25.h
index 0d939e5aee4e..c2a85fd3f5ea 100644
--- a/include/net/ax25.h
+++ b/include/net/ax25.h
@@ -216,7 +216,7 @@ typedef struct {
 struct ctl_table;
 
 typedef struct ax25_dev {
-	struct ax25_dev		*next;
+	struct list_head	list;
 
 	struct net_device	*dev;
 	netdevice_tracker	dev_tracker;
@@ -330,7 +330,6 @@ int ax25_addr_size(const ax25_digi *);
 void ax25_digi_invert(const ax25_digi *, ax25_digi *);
 
 /* ax25_dev.c */
-extern ax25_dev *ax25_dev_list;
 extern spinlock_t ax25_dev_lock;
 
 #if IS_ENABLED(CONFIG_AX25)
diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c
index 282ec581c072..b632af38f1e1 100644
--- a/net/ax25/ax25_dev.c
+++ b/net/ax25/ax25_dev.c
@@ -22,11 +22,12 @@
 #include <net/sock.h>
 #include <linux/uaccess.h>
 #include <linux/fcntl.h>
+#include <linux/list.h>
 #include <linux/mm.h>
 #include <linux/interrupt.h>
 #include <linux/init.h>
 
-ax25_dev *ax25_dev_list;
+static struct list_head ax25_dev_list;
 DEFINE_SPINLOCK(ax25_dev_lock);
 
 ax25_dev *ax25_addr_ax25dev(ax25_address *addr)
@@ -34,11 +35,12 @@ ax25_dev *ax25_addr_ax25dev(ax25_address *addr)
 	ax25_dev *ax25_dev, *res = NULL;
 
 	spin_lock_bh(&ax25_dev_lock);
-	for (ax25_dev = ax25_dev_list; ax25_dev != NULL; ax25_dev = ax25_dev->next)
+	list_for_each_entry(ax25_dev, &ax25_dev_list, list) {
 		if (ax25cmp(addr, (const ax25_address *)ax25_dev->dev->dev_addr) == 0) {
 			res = ax25_dev;
 			ax25_dev_hold(ax25_dev);
 		}
+	}
 	spin_unlock_bh(&ax25_dev_lock);
 
 	return res;
@@ -52,6 +54,10 @@ void ax25_dev_device_up(struct net_device *dev)
 {
 	ax25_dev *ax25_dev;
 
+	// FIXME: do call this in probe or something
+	if (!ax25_dev_list.next)
+		INIT_LIST_HEAD(&ax25_dev_list);
+
 	ax25_dev = kzalloc(sizeof(*ax25_dev), GFP_KERNEL);
 	if (!ax25_dev) {
 		printk(KERN_ERR "AX.25: ax25_dev_device_up - out of memory\n");
@@ -85,8 +91,7 @@ void ax25_dev_device_up(struct net_device *dev)
 #endif
 
 	spin_lock_bh(&ax25_dev_lock);
-	ax25_dev->next = ax25_dev_list;
-	ax25_dev_list  = ax25_dev;
+	list_add(&ax25_dev->list, &ax25_dev_list);
 	spin_unlock_bh(&ax25_dev_lock);
 	ax25_dev_hold(ax25_dev);
 
@@ -111,23 +116,18 @@ void ax25_dev_device_down(struct net_device *dev)
 	/*
 	 *	Remove any packet forwarding that points to this device.
 	 */
-	for (s = ax25_dev_list; s != NULL; s = s->next)
+	list_for_each_entry(s, &ax25_dev_list, list) {
 		if (s->forward == dev)
 			s->forward = NULL;
-
-	if ((s = ax25_dev_list) == ax25_dev) {
-		ax25_dev_list = s->next;
-		goto unlock_put;
 	}
 
-	while (s != NULL && s->next != NULL) {
-		if (s->next == ax25_dev) {
-			s->next = ax25_dev->next;
+	list_for_each_entry(s, &ax25_dev_list, list) {
+		if (s == ax25_dev) {
+			list_del(&s->list);
 			goto unlock_put;
 		}
-
-		s = s->next;
 	}
+
 	spin_unlock_bh(&ax25_dev_lock);
 	dev->ax25_ptr = NULL;
 	ax25_dev_put(ax25_dev);
@@ -200,16 +200,13 @@ struct net_device *ax25_fwd_dev(struct net_device *dev)
  */
 void __exit ax25_dev_free(void)
 {
-	ax25_dev *s, *ax25_dev;
+	ax25_dev *s, *n;
 
 	spin_lock_bh(&ax25_dev_lock);
-	ax25_dev = ax25_dev_list;
-	while (ax25_dev != NULL) {
-		s        = ax25_dev;
-		netdev_put(ax25_dev->dev, &ax25_dev->dev_tracker);
-		ax25_dev = ax25_dev->next;
+	list_for_each_entry_safe(s, n, &ax25_dev_list, list) {
+		netdev_put(s->dev, &s->dev_tracker);
+		list_del(&s->list);
 		kfree(s);
 	}
-	ax25_dev_list = NULL;
 	spin_unlock_bh(&ax25_dev_lock);
 }
Paolo Abeni May 2, 2024, 9:30 a.m. UTC | #6
On Thu, 2024-05-02 at 10:56 +0300, Dan Carpenter wrote:
> On Thu, May 02, 2024 at 12:35:44PM +0800, duoming@zju.edu.cn wrote:
> > On Wed, 1 May 2024 20:43:37 +0300 Dan Carpenter wrote:
> > > I'm always happy to take credit for stuff but the Reported by should go
> > > to Lars and Miroslav.
> > > 
> > > Reported-by: Lars Kellogg-Stedman <lars@oddbit.com>
> > > Reported-by: Miroslav Skoric <skoric@uns.ac.rs>
> > 
> > This patch is not related with the problem raised by Lars Kellogg-Stedman
> > and Miroslav Skoric, it only solves the reference counting leak issues of
> > ax25_dev in ax25_addr_ax25dev() and ax25_dev_device_down(). So I think
> > there is no need to change the "Reported by" label.
> > 
> 
> Ah...  I was really hoping it was related to the other bugs.
> 
> Okay, what about we separate this into different patches one for each
> bug?  The changes to ax25_addr_ax25dev() and ax25_dev_free() are
> obvious and could go in as-is but as two separate patches.

I agree it would be better to split this up, the changelog itself hints
at 2 separated issues and fixes.

Thanks,

Paolo
Dan Carpenter May 3, 2024, 8:36 p.m. UTC | #7
Could you test this diff?

regards,
dan carpenter

diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 558e158c98d0..a7f96a4ceff4 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1129,8 +1129,10 @@ static int ax25_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	/*
 	 * User already set interface with SO_BINDTODEVICE
 	 */
-	if (ax25->ax25_dev != NULL)
+	if (ax25->ax25_dev != NULL) {
+		ax25_dev_hold(ax25->ax25_dev);
 		goto done;
+	}
 
 	if (addr_len > sizeof(struct sockaddr_ax25) && addr->fsa_ax25.sax25_ndigis == 1) {
 		if (ax25cmp(&addr->fsa_digipeater[0], &null_ax25_address) != 0 &&
Lars Kellogg-Stedman May 3, 2024, 11:40 p.m. UTC | #8
On Fri, May 03, 2024 at 11:36:37PM +0300, Dan Carpenter wrote:
> Could you test this diff?

With that diff applied, there is no kernel panic, but I see the same
refcount errors that I saw before the latest series of patches from
Duoming:

  refcount_t: decrement hit 0; leaking memory.
  refcount_t: underflow; use-after-free.
Dan Carpenter May 4, 2024, 11:04 a.m. UTC | #9
On Wed, May 01, 2024 at 02:02:18PM +0800, Duoming Zhou wrote:
> @@ -58,7 +59,6 @@ void ax25_dev_device_up(struct net_device *dev)
>  		return;
>  	}
>  
> -	refcount_set(&ax25_dev->refcount, 1);

Let's keep this here, and just delete the ax25_dev_hold().  It makes
the diff smaller and I like setting the refcount earlier anyway.

>  	dev->ax25_ptr     = ax25_dev;

Let's move this assignment under the spinlock where ax25_dev_hold() was.

>  	ax25_dev->dev     = dev;
>  	netdev_hold(dev, &ax25_dev->dev_tracker, GFP_KERNEL);
> @@ -88,7 +88,7 @@ void ax25_dev_device_up(struct net_device *dev)
>  	ax25_dev->next = ax25_dev_list;
>  	ax25_dev_list  = ax25_dev;
>  	spin_unlock_bh(&ax25_dev_lock);
> -	ax25_dev_hold(ax25_dev);
> +	refcount_set(&ax25_dev->refcount, 1);
>  
>  	ax25_register_dev_sysctl(ax25_dev);
>  }
> @@ -135,7 +135,6 @@ void ax25_dev_device_down(struct net_device *dev)
>  
>  unlock_put:
>  	spin_unlock_bh(&ax25_dev_lock);
> -	ax25_dev_put(ax25_dev);
>  	dev->ax25_ptr = NULL;
>  	netdev_put(dev, &ax25_dev->dev_tracker);
>  	ax25_dev_put(ax25_dev);

So far as I can see, the ax25_dev should be on the list.  Also, I think
the dev->ax25_ptr = NULL; assignment should be under the lock.  So this
code should just look like:

        list_for_each_entry(s, &ax25_dev_list, list) {
                if (s->forward == dev)
                        s->forward = NULL;
        }

        list_for_each_entry(s, &ax25_dev_list, list) {
                if (s == ax25_dev) {
                        list_del(&s->list);
                        break;
                }
        }
        dev->ax25_ptr = NULL;
        spin_unlock_bh(&ax25_dev_lock);
        netdev_put(dev, &ax25_dev->dev_tracker);
        ax25_dev_put(ax25_dev);
}

Also it should just be on the list once...  In fact, it's impossible for
one pointer to be on a list twice.  So it would be nice to add a break;
in ax25_addr_ax25dev().  It doesn't change the code, it just makes it
more obvious.

ax25_dev *ax25_addr_ax25dev(ax25_address *addr)
{
        ax25_dev *ax25_dev, *res = NULL;

        spin_lock_bh(&ax25_dev_lock);
        list_for_each_entry(ax25_dev, &ax25_dev_list, list) {
                if (ax25cmp(addr, (const ax25_address *)ax25_dev->dev->dev_addr) == 0) {
                        res = ax25_dev;
			ax25_dev_hold(res);
                        break;
                }
        }
        spin_unlock_bh(&ax25_dev_lock);

        return res;
}

regards,
dan carpenter
Dan Carpenter May 4, 2024, 12:16 p.m. UTC | #10
On Fri, May 03, 2024 at 07:40:32PM -0400, Lars Kellogg-Stedman wrote:
> On Fri, May 03, 2024 at 11:36:37PM +0300, Dan Carpenter wrote:
> > Could you test this diff?
> 
> With that diff applied, there is no kernel panic, but I see the same
> refcount errors that I saw before the latest series of patches from
> Duoming:

Wait, which panic is this?  The NULL dereference introduce by the
"ax25_dev" vs "res" typo?

> 
>   refcount_t: decrement hit 0; leaking memory.
>   refcount_t: underflow; use-after-free.

Hm...  Is there a missing netdev_hold() in ax25_bind() on the
"User already set interface with SO_BINDTODEVICE" path?  That would
fit with the commit 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by
ax25_cb_del()") which introduced the bug.

I'm not really sure I understand how netdev_hold() works.

(My patch here is correct, but apparently that's not the bug).

regards,
dan carpenter
Lars Kellogg-Stedman May 4, 2024, 10:16 p.m. UTC | #11
On Sat, May 04, 2024 at 03:16:55PM +0300, Dan Carpenter wrote:
> Wait, which panic is this?  The NULL dereference introduce by the
> "ax25_dev" vs "res" typo?

Right, that one. Your diff was on top of Duoming's patches, which had
earlier introduced that kernel panic. Just confirming that things are
working (for some value of "working") with your latest change.

> >   refcount_t: decrement hit 0; leaking memory.
> >   refcount_t: underflow; use-after-free.
> 
> Hm...  Is there a missing netdev_hold() in ax25_bind() on the
> "User already set interface with SO_BINDTODEVICE" path?

There's a missing netdev_hold() in the path for *inbound* packets
(ax25_rcv -> ax25_accept).

I added some tracepoints [1] so that we can see calls to
netdev_{put,hold} in a function graph. Without my patches, the graph for
an incoming connections looks like [2]:

    # tracer: function_graph
    #
    # CPU  TASK/PID         DURATION                  FUNCTION CALLS
    # |     |    |           |   |                     |   |   |   |
    ------------------------------------------
    0)    <idle>-0    =>   kworker-29  
    ------------------------------------------

    0)   kworker-29   |               |  ax25_kiss_rcv() {
    0)   kworker-29   |               |    ax25_rcv.isra.0() {
    0)   kworker-29   |   0.168 us    |      ax25_addr_parse();
    0)   kworker-29   |   0.094 us    |      ax25_addr_size();
    0)   kworker-29   |   0.136 us    |      ax25cmp();
    0)   kworker-29   |   0.137 us    |      ax25_digi_invert();
    0)   kworker-29   |               |      ax25_find_cb() {
    0)   kworker-29   |   0.223 us    |        ax25cmp();
    0)   kworker-29   |   0.194 us    |        ax25cmp();
    0)   kworker-29   |   0.087 us    |        ax25cmp();
    0)   kworker-29   |   0.975 us    |      }
    0)   kworker-29   |               |      ax25_find_listener() {
    0)   kworker-29   |   0.092 us    |        ax25cmp();
    0)   kworker-29   |   0.084 us    |        ax25cmp();
    0)   kworker-29   |   0.526 us    |      }
    0)   kworker-29   |               |      ax25_make_new() {
    0)   kworker-29   |               |        ax25_create_cb() {
    0)   kworker-29   |   0.117 us    |          ax25_setup_timers();
    0)   kworker-29   |   0.478 us    |        }
    0)   kworker-29   |   1.813 us    |      }
    0)   kworker-29   |               |      ax25_send_control() {
    0)   kworker-29   |               |        ax25_transmit_buffer() {
    0)   kworker-29   |   0.086 us    |          ax25_addr_size();
    0)   kworker-29   |   0.094 us    |          ax25_addr_build();
    0)   kworker-29   |   0.081 us    |          ax25_fwd_dev();
    0)   kworker-29   |   4.854 us    |        }
    0)   kworker-29   |   5.604 us    |      }
    0)   kworker-29   |   0.108 us    |      ax25_cb_add();
    0)   kworker-29   |   0.410 us    |      ax25_start_heartbeat();
    0)   kworker-29   |   0.218 us    |      ax25_start_t3timer();
    0)   kworker-29   |   0.099 us    |      ax25_start_idletimer();
    0)   kworker-29   | + 14.957 us   |    }
    0)   kworker-29   | + 16.116 us   |  }
    ------------------------------------------
    1)   ax25ipd-63   =>    ax25d-77   
    ------------------------------------------

    1)    ax25d-77    |   1.580 us    |  ax25_accept();
    1)    ax25d-77    |   0.211 us    |  ax25_getname();
    1)    ax25d-77    |   0.275 us    |  ax25_getname();
    ------------------------------------------
    0)   kworker-29   =>   axwrapp-83  
    ------------------------------------------

    0)   axwrapp-83   |               |  ax25_sendmsg() {
    0)   axwrapp-83   |               |    ax25_output() {
    0)   axwrapp-83   |               |      ax25_kick.part.0() {
    0)   axwrapp-83   |               |        ax25_send_iframe() {
    0)   axwrapp-83   |   0.679 us    |          ax25_start_idletimer();
    0)   axwrapp-83   |               |          ax25_transmit_buffer() {
    0)   axwrapp-83   |   0.093 us    |            ax25_addr_size();
    0)   axwrapp-83   |   0.102 us    |            ax25_addr_build();
    0)   axwrapp-83   |   0.145 us    |            ax25_fwd_dev();
    0)   axwrapp-83   |   9.311 us    |          }
    0)   axwrapp-83   | + 10.464 us   |        }
    0)   axwrapp-83   |   0.096 us    |        ax25_t1timer_running();
    0)   axwrapp-83   |   0.332 us    |        ax25_stop_t3timer();
    0)   axwrapp-83   |   0.093 us    |        ax25_calculate_t1();
    0)   axwrapp-83   |   0.439 us    |        ax25_start_t1timer();
    0)   axwrapp-83   | + 18.839 us   |      }
    0)   axwrapp-83   | + 19.479 us   |    }
    0)   axwrapp-83   | + 22.647 us   |  }
    ------------------------------------------
    0)   ax25ipd-63   =>   axwrapp-83  
    ------------------------------------------

    0)   axwrapp-83   |               |  ax25_sendmsg() {
    0)   axwrapp-83   |               |    ax25_output() {
    0)   axwrapp-83   |               |      ax25_kick.part.0() {
    0)   axwrapp-83   |               |        ax25_send_iframe() {
    0)   axwrapp-83   |   0.339 us    |          ax25_start_idletimer();
    0)   axwrapp-83   |               |          ax25_transmit_buffer() {
    0)   axwrapp-83   |   0.092 us    |            ax25_addr_size();
    0)   axwrapp-83   |   0.097 us    |            ax25_addr_build();
    0)   axwrapp-83   |   0.136 us    |            ax25_fwd_dev();
    0)   axwrapp-83   |   9.116 us    |          }
    0)   axwrapp-83   |   9.908 us    |        }
    0)   axwrapp-83   |   0.091 us    |        ax25_t1timer_running();
    0)   axwrapp-83   | + 10.810 us   |      }
    0)   axwrapp-83   | + 11.168 us   |    }
    0)   axwrapp-83   | + 14.362 us   |  }
    ------------------------------------------
    0)   axwrapp-83   =>   kworker-10  
    ------------------------------------------

    0)   kworker-10   |               |  ax25_kiss_rcv() {
    0)   kworker-10   |               |    ax25_rcv.isra.0() {
    0)   kworker-10   |   0.116 us    |      ax25_addr_parse();
    0)   kworker-10   |   0.090 us    |      ax25_addr_size();
    0)   kworker-10   |   0.144 us    |      ax25cmp();
    0)   kworker-10   |   0.091 us    |      ax25_digi_invert();
    0)   kworker-10   |               |      ax25_find_cb() {
    0)   kworker-10   |   0.091 us    |        ax25cmp();
    0)   kworker-10   |   0.087 us    |        ax25cmp();
    0)   kworker-10   |   0.550 us    |      }
    0)   kworker-10   |               |      ax25_std_frame_in() {
    0)   kworker-10   |   0.105 us    |        ax25_decode();
    0)   kworker-10   |   0.117 us    |        ax25_validate_nr();
    0)   kworker-10   |               |        ax25_check_iframes_acked() {
    0)   kworker-10   |   0.670 us    |          ax25_frames_acked();
    0)   kworker-10   |               |          ax25_calculate_rtt() {
    0)   kworker-10   |   0.086 us    |            ax25_t1timer_running();
    0)   kworker-10   |   0.085 us    |            ax25_display_timer();
    0)   kworker-10   |   0.465 us    |          }
    0)   kworker-10   |   0.185 us    |          ax25_stop_t1timer();
    0)   kworker-10   |   0.358 us    |          ax25_start_t3timer();
    0)   kworker-10   |   2.186 us    |        }
    0)   kworker-10   |               |        ax25_kick() {
    0)   kworker-10   |   0.093 us    |          ax25_kick.part.0();
    0)   kworker-10   |   0.277 us    |        }
    0)   kworker-10   |   3.320 us    |      }
    0)   kworker-10   |   5.927 us    |    }
    0)   kworker-10   |   6.563 us    |  }
    ------------------------------------------
    0)   kworker-10   =>   axwrapp-83  
    ------------------------------------------

    0)   axwrapp-83   |               |  ax25_release() {
    0)   axwrapp-83   |   0.158 us    |    ax25_clear_queues();
    0)   axwrapp-83   |               |    ax25_send_control() {
    0)   axwrapp-83   |               |      ax25_transmit_buffer() {
    0)   axwrapp-83   |   0.086 us    |        ax25_addr_size();
    0)   axwrapp-83   |   0.086 us    |        ax25_addr_build();
    0)   axwrapp-83   |   0.085 us    |        ax25_fwd_dev();
    0)   axwrapp-83   |   3.972 us    |      }
    0)   axwrapp-83   |   4.356 us    |    }
    0)   axwrapp-83   |   0.142 us    |    ax25_stop_t2timer();
    0)   axwrapp-83   |   0.145 us    |    ax25_stop_t3timer();
    0)   axwrapp-83   |   0.094 us    |    ax25_stop_idletimer();
    0)   axwrapp-83   |   0.093 us    |    ax25_calculate_t1();
    0)   axwrapp-83   |   0.227 us    |    ax25_start_t1timer();
    0)   axwrapp-83   |               |  /* netdev_put: dev=ax0 */
    ------------------------------------------
    1)    ax25d-77    =>   kworker-10  
    ------------------------------------------

    1)   kworker-10   |               |  ax25_kiss_rcv() {
    1)   kworker-10   |               |    ax25_rcv.isra.0() {
    1)   kworker-10   |   0.188 us    |      ax25_addr_parse();
    1)   kworker-10   |   0.089 us    |      ax25_addr_size();
    1)   kworker-10   |   0.177 us    |      ax25cmp();
    1)   kworker-10   |   0.101 us    |      ax25_digi_invert();
    1)   kworker-10   |               |      ax25_find_cb() {
    1)   kworker-10   |   0.092 us    |        ax25cmp();
    1)   kworker-10   |   0.085 us    |        ax25cmp();
    1)   kworker-10   |   0.731 us    |      }
    1)   kworker-10   |               |      ax25_std_frame_in() {
    1)   kworker-10   |   0.098 us    |        ax25_decode();
    1)   kworker-10   |               |        ax25_disconnect() {
    1)   kworker-10   |   0.311 us    |          ax25_stop_t1timer();
    1)   kworker-10   |   0.093 us    |          ax25_stop_t2timer();
    1)   kworker-10   |   0.093 us    |          ax25_stop_t3timer();
    1)   kworker-10   |   0.086 us    |          ax25_stop_idletimer();
    1)   kworker-10   |   0.415 us    |          ax25_link_failed();
    1)   kworker-10   |   1.778 us    |        }
    1)   kworker-10   |   0.090 us    |        ax25_kick();
    1)   kworker-10   |   2.523 us    |      }
    1)   kworker-10   | + 10.078 us   |    }
    1)   kworker-10   | + 12.054 us   |  }
    0)   axwrapp-83   | * 13227.13 us |  }
    ------------------------------------------
    0)   axwrapp-83   =>    <idle>-0   
    ------------------------------------------

    0)    <idle>-0    |               |  ax25_heartbeat_expiry() {
    0)    <idle>-0    |               |    ax25_std_heartbeat_expiry() {
    0)    <idle>-0    |               |      ax25_destroy_socket() {
    0)    <idle>-0    |   0.249 us    |        ax25_cb_del();
    0)    <idle>-0    |   0.107 us    |        ax25_stop_heartbeat();
    0)    <idle>-0    |   0.140 us    |        ax25_stop_t1timer();
    0)    <idle>-0    |   0.090 us    |        ax25_stop_t2timer();
    0)    <idle>-0    |   0.146 us    |        ax25_stop_t3timer();
    0)    <idle>-0    |   0.089 us    |        ax25_stop_idletimer();
    0)    <idle>-0    |   0.740 us    |        ax25_clear_queues();
    0)    <idle>-0    |   2.703 us    |      }
    0)    <idle>-0    |   0.901 us    |      ax25_free_sock();
    0)    <idle>-0    |   5.219 us    |    }
    0)    <idle>-0    |   7.220 us    |  }

Note the call to netdev_put() in ax25_release(), and note that there is
never a corresponding call to netdev_hold(), which is why we end up with
a refcount problem.

My original patch corrected this by adding the call to netdev_hold()
right next to the ax25_cb_add() in ax25_rcv(), which solves this
problem. If it seems weird to have this login in ax25_rcv, we could move
it to ax25_accept, right around line 1430 [3]; that would look
something like:

    diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
    index 8df2b00526e..e1ce1eeed7d 100644
    --- a/net/ax25/af_ax25.c
    +++ b/net/ax25/af_ax25.c
    @@ -1383,6 +1383,8 @@ static int ax25_accept(struct socket *sock, struct socket *newsock, int flags,
            DEFINE_WAIT(wait);
            struct sock *sk;
            int err = 0;
    +       ax25_cb *ax25;
    +       ax25_dev *ax25_dev;

            if (sock->state != SS_UNCONNECTED)
                    return -EINVAL;
    @@ -1436,6 +1438,10 @@ static int ax25_accept(struct socket *sock, struct socket *newsock, int flags,
            kfree_skb(skb);
            sk_acceptq_removed(sk);
            newsock->state = SS_CONNECTED;
    +       ax25 = sk_to_ax25(newsk);
    +       ax25_dev = ax25->ax25_dev;
    +       netdev_hold(ax25_dev->dev, &ax25->dev_tracker, GFP_ATOMIC);
    +       ax25_dev_hold(ax25_dev);

    out:
            release_sock(sk);

This has the advantage that now ax25_accept() mirrors the behavior of
ax25_bind() in terms of declaring a reference on the ax25 device, which
makes a certain amount of sense.

This seems to work out just as well as the previous patch; both
eliminiate the refcount imbalance. With the above diff in place, the
function graph looks like [4], with the call to netdev_hold() in
ax25_accept:

    # tracer: function_graph
    #
    # CPU  TASK/PID         DURATION                  FUNCTION CALLS
    # |     |    |           |   |                     |   |   |   |
    ------------------------------------------
    1)    <idle>-0    =>   kworker-29  
    ------------------------------------------

    1)   kworker-29   |               |  ax25_kiss_rcv() {
    1)   kworker-29   |               |    ax25_rcv.isra.0() {
    1)   kworker-29   |   0.152 us    |      ax25_addr_parse();
    1)   kworker-29   |   0.111 us    |      ax25_addr_size();
    1)   kworker-29   |   0.149 us    |      ax25cmp();
    1)   kworker-29   |   0.139 us    |      ax25_digi_invert();
    1)   kworker-29   |               |      ax25_find_cb() {
    1)   kworker-29   |   0.229 us    |        ax25cmp();
    1)   kworker-29   |   0.243 us    |        ax25cmp();
    1)   kworker-29   |   0.088 us    |        ax25cmp();
    1)   kworker-29   |   1.076 us    |      }
    1)   kworker-29   |               |      ax25_find_listener() {
    1)   kworker-29   |   0.092 us    |        ax25cmp();
    1)   kworker-29   |   0.085 us    |        ax25cmp();
    1)   kworker-29   |   0.632 us    |      }
    1)   kworker-29   |               |      ax25_make_new() {
    1)   kworker-29   |               |        ax25_create_cb() {
    1)   kworker-29   |   0.132 us    |          ax25_setup_timers();
    1)   kworker-29   |   0.547 us    |        }
    1)   kworker-29   |   1.965 us    |      }
    1)   kworker-29   |               |      ax25_send_control() {
    1)   kworker-29   |               |        ax25_transmit_buffer() {
    1)   kworker-29   |   0.090 us    |          ax25_addr_size();
    1)   kworker-29   |   0.089 us    |          ax25_addr_build();
    1)   kworker-29   |   0.084 us    |          ax25_fwd_dev();
    1)   kworker-29   |   5.682 us    |        }
    1)   kworker-29   |   6.037 us    |      }
    1)   kworker-29   |   0.100 us    |      ax25_cb_add();
    1)   kworker-29   |   0.449 us    |      ax25_start_heartbeat();
    1)   kworker-29   |   0.200 us    |      ax25_start_t3timer();
    1)   kworker-29   |   0.104 us    |      ax25_start_idletimer();
    1)   kworker-29   | + 16.757 us   |    }
    1)   kworker-29   | + 18.240 us   |  }
    ------------------------------------------
    0)   ax25ipd-63   =>    ax25d-77   
    ------------------------------------------

    0)    ax25d-77    |               |  ax25_accept() {
    0)    ax25d-77    |               |  /* netdev_hold: dev=ax0 */
    0)    ax25d-77    |   2.067 us    |  }
    0)    ax25d-77    |   0.140 us    |  ax25_getname();
    0)    ax25d-77    |   0.189 us    |  ax25_getname();
    ------------------------------------------
    1)   kworker-29   =>   axwrapp-82  
    ------------------------------------------

    1)   axwrapp-82   |               |  ax25_sendmsg() {
    1)   axwrapp-82   |               |    ax25_output() {
    1)   axwrapp-82   |               |      ax25_kick.part.0() {
    1)   axwrapp-82   |               |        ax25_send_iframe() {
    1)   axwrapp-82   |   0.343 us    |          ax25_start_idletimer();
    1)   axwrapp-82   |               |          ax25_transmit_buffer() {
    1)   axwrapp-82   |   0.119 us    |            ax25_addr_size();
    1)   axwrapp-82   |   0.118 us    |            ax25_addr_build();
    1)   axwrapp-82   |   0.156 us    |            ax25_fwd_dev();
    1)   axwrapp-82   |   9.965 us    |          }
    1)   axwrapp-82   | + 10.844 us   |        }
    1)   axwrapp-82   |   0.107 us    |        ax25_t1timer_running();
    1)   axwrapp-82   |   0.345 us    |        ax25_stop_t3timer();
    1)   axwrapp-82   |   0.112 us    |        ax25_calculate_t1();
    1)   axwrapp-82   |   0.499 us    |        ax25_start_t1timer();
    1)   axwrapp-82   | + 21.187 us   |      }
    1)   axwrapp-82   | + 21.620 us   |    }
    1)   axwrapp-82   | + 25.665 us   |  }
    ------------------------------------------
    1)   ax25ipd-63   =>   axwrapp-82  
    ------------------------------------------

    1)   axwrapp-82   |               |  ax25_sendmsg() {
    1)   axwrapp-82   |               |    ax25_output() {
    1)   axwrapp-82   |               |      ax25_kick.part.0() {
    1)   axwrapp-82   |               |        ax25_send_iframe() {
    1)   axwrapp-82   |   1.027 us    |          ax25_start_idletimer();
    1)   axwrapp-82   |               |          ax25_transmit_buffer() {
    1)   axwrapp-82   |   0.505 us    |            ax25_addr_size();
    1)   axwrapp-82   |   0.522 us    |            ax25_addr_build();
    1)   axwrapp-82   |   0.550 us    |            ax25_fwd_dev();
    1)   axwrapp-82   | + 32.110 us   |          }
    1)   axwrapp-82   | + 35.118 us   |        }
    1)   axwrapp-82   |   0.526 us    |        ax25_t1timer_running();
    1)   axwrapp-82   | + 38.770 us   |      }
    1)   axwrapp-82   | + 40.359 us   |    }
    1)   axwrapp-82   | + 50.019 us   |  }
    ------------------------------------------
    0)    ax25d-77    =>   kworker-10  
    ------------------------------------------

    0)   kworker-10   |               |  ax25_kiss_rcv() {
    0)   kworker-10   |               |    ax25_rcv.isra.0() {
    0)   kworker-10   |   0.632 us    |      ax25_addr_parse();
    0)   kworker-10   |   0.467 us    |      ax25_addr_size();
    0)   kworker-10   |   0.600 us    |      ax25cmp();
    0)   kworker-10   |   0.506 us    |      ax25_digi_invert();
    0)   kworker-10   |               |      ax25_find_cb() {
    0)   kworker-10   |   0.572 us    |        ax25cmp();
    0)   kworker-10   |   0.484 us    |        ax25cmp();
    0)   kworker-10   |   2.737 us    |      }
    0)   kworker-10   |               |      ax25_std_frame_in() {
    0)   kworker-10   |   0.569 us    |        ax25_decode();
    0)   kworker-10   |   0.607 us    |        ax25_validate_nr();
    0)   kworker-10   |               |        ax25_check_iframes_acked() {
    0)   kworker-10   |   4.171 us    |          ax25_frames_acked();
    0)   kworker-10   |               |          ax25_calculate_rtt() {
    0)   kworker-10   |   0.442 us    |            ax25_t1timer_running();
    0)   kworker-10   |   0.500 us    |            ax25_display_timer();
    0)   kworker-10   |   2.463 us    |          }
    0)   kworker-10   |   0.889 us    |          ax25_stop_t1timer();
    0)   kworker-10   |   1.559 us    |          ax25_start_t3timer();
    0)   kworker-10   | + 11.534 us   |        }
    0)   kworker-10   |               |        ax25_kick() {
    0)   kworker-10   |   0.480 us    |          ax25_kick.part.0();
    0)   kworker-10   |   1.467 us    |        }
    0)   kworker-10   | + 16.917 us   |      }
    0)   kworker-10   | + 27.062 us   |    }
    0)   kworker-10   | + 30.583 us   |  }
    1)   axwrapp-82   |               |  ax25_release() {
    1)   axwrapp-82   |   0.932 us    |    ax25_clear_queues();
    1)   axwrapp-82   |               |    ax25_send_control() {
    1)   axwrapp-82   |               |      ax25_transmit_buffer() {
    1)   axwrapp-82   |   0.516 us    |        ax25_addr_size();
    1)   axwrapp-82   |   0.484 us    |        ax25_addr_build();
    1)   axwrapp-82   |   0.451 us    |        ax25_fwd_dev();
    1)   axwrapp-82   | + 91.128 us   |      }
    1)   axwrapp-82   | + 94.596 us   |    }
    1)   axwrapp-82   |   1.554 us    |    ax25_stop_t2timer();
    1)   axwrapp-82   |   1.151 us    |    ax25_stop_t3timer();
    1)   axwrapp-82   |   0.758 us    |    ax25_stop_idletimer();
    1)   axwrapp-82   |   0.676 us    |    ax25_calculate_t1();
    1)   axwrapp-82   |   1.642 us    |    ax25_start_t1timer();
    1)   axwrapp-82   |               |  /* netdev_put: dev=ax0 */
    1)   axwrapp-82   | ! 123.696 us  |  }
    ------------------------------------------
    1)   axwrapp-82   =>   kworker-10  
    ------------------------------------------

    1)   kworker-10   |               |  ax25_kiss_rcv() {
    1)   kworker-10   |               |    ax25_rcv.isra.0() {
    1)   kworker-10   |   0.568 us    |      ax25_addr_parse();
    1)   kworker-10   |   0.428 us    |      ax25_addr_size();
    1)   kworker-10   |   0.504 us    |      ax25cmp();
    1)   kworker-10   |   0.453 us    |      ax25_digi_invert();
    1)   kworker-10   |               |      ax25_find_cb() {
    1)   kworker-10   |   0.425 us    |        ax25cmp();
    1)   kworker-10   |   0.429 us    |        ax25cmp();
    1)   kworker-10   |   2.511 us    |      }
    1)   kworker-10   |               |      ax25_std_frame_in() {
    1)   kworker-10   |   0.518 us    |        ax25_decode();
    1)   kworker-10   |               |        ax25_disconnect() {
    1)   kworker-10   |   0.840 us    |          ax25_stop_t1timer();
    1)   kworker-10   |   0.459 us    |          ax25_stop_t2timer();
    1)   kworker-10   |   0.444 us    |          ax25_stop_t3timer();
    1)   kworker-10   |   0.413 us    |          ax25_stop_idletimer();
    1)   kworker-10   |   1.123 us    |          ax25_link_failed();
    1)   kworker-10   |   6.202 us    |        }
    1)   kworker-10   |   0.428 us    |        ax25_kick();
    1)   kworker-10   |   9.214 us    |      }
    1)   kworker-10   | + 18.606 us   |    }
    1)   kworker-10   | + 21.675 us   |  }
    ------------------------------------------
    0)   kworker-10   =>    <idle>-0   
    ------------------------------------------

    0)    <idle>-0    |               |  ax25_heartbeat_expiry() {
    0)    <idle>-0    |               |    ax25_std_heartbeat_expiry() {
    0)    <idle>-0    |               |      ax25_destroy_socket() {
    0)    <idle>-0    |   0.202 us    |        ax25_cb_del();
    0)    <idle>-0    |   0.116 us    |        ax25_stop_heartbeat();
    0)    <idle>-0    |   0.100 us    |        ax25_stop_t1timer();
    0)    <idle>-0    |   0.101 us    |        ax25_stop_t2timer();
    0)    <idle>-0    |   0.158 us    |        ax25_stop_t3timer();
    0)    <idle>-0    |   0.101 us    |        ax25_stop_idletimer();
    0)    <idle>-0    |   0.321 us    |        ax25_clear_queues();
    0)    <idle>-0    |   2.483 us    |      }
    0)    <idle>-0    |   0.531 us    |      ax25_free_sock();
    0)    <idle>-0    |   4.912 us    |    }
    0)    <idle>-0    |   6.545 us    |  }

[1]: https://gist.github.com/larsks/b658c0bd766648b16c31c8ed0fc1dc1f#file-0001-trace-netdev_hold-and-netdev_put-patch
[2]: https://gist.github.com/larsks/b658c0bd766648b16c31c8ed0fc1dc1f#file-trace1-inbound-txt
[3]: https://github.com/torvalds/linux/blob/7367539ad4b0f8f9b396baf02110962333719a48/net/ax25/af_ax25.c#L1430
[4]: https://gist.github.com/larsks/b658c0bd766648b16c31c8ed0fc1dc1f#file-trace2-inbound-txt
Lars Kellogg-Stedman May 7, 2024, 3:18 a.m. UTC | #12
On Sat, May 04, 2024 at 06:16:14PM GMT, Lars Kellogg-Stedman wrote:
> My original patch corrected this by adding the call to netdev_hold()
> right next to the ax25_cb_add() in ax25_rcv(), which solves this
> problem. If it seems weird to have this login in ax25_rcv, we could move
> it to ax25_accept, right around line 1430 [3]; that would look
> something like:

The same patch applies cleanly against the Raspberry Pi 6.6.30 kernel,
and clears up the frequeny crashes I was experiencing in that
environment as well.
Dan Carpenter May 7, 2024, 6:38 a.m. UTC | #13
On Fri, May 03, 2024 at 11:36:37PM +0300, Dan Carpenter wrote:
> Could you test this diff?
> 
> regards,
> dan carpenter
> 
> diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
> index 558e158c98d0..a7f96a4ceff4 100644
> --- a/net/ax25/af_ax25.c
> +++ b/net/ax25/af_ax25.c
> @@ -1129,8 +1129,10 @@ static int ax25_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
>  	/*
>  	 * User already set interface with SO_BINDTODEVICE
>  	 */
> -	if (ax25->ax25_dev != NULL)
> +	if (ax25->ax25_dev != NULL) {
> +		ax25_dev_hold(ax25->ax25_dev);
>  		goto done;
> +	}
>  
>  	if (addr_len > sizeof(struct sockaddr_ax25) && addr->fsa_ax25.sax25_ndigis == 1) {
>  		if (ax25cmp(&addr->fsa_digipeater[0], &null_ax25_address) != 0 &&

This commit is wrong.

regards,
dan carpenter
Dan Carpenter May 7, 2024, 8:08 a.m. UTC | #14
On Mon, May 06, 2024 at 11:18:06PM -0400, Lars Kellogg-Stedman wrote:
> On Sat, May 04, 2024 at 06:16:14PM GMT, Lars Kellogg-Stedman wrote:
> > My original patch corrected this by adding the call to netdev_hold()
> > right next to the ax25_cb_add() in ax25_rcv(), which solves this
> > problem. If it seems weird to have this login in ax25_rcv, we could move
> > it to ax25_accept, right around line 1430 [3]; that would look
> > something like:
> 
> The same patch applies cleanly against the Raspberry Pi 6.6.30 kernel,
> and clears up the frequeny crashes I was experiencing in that
> environment as well.

I have reviewed this code some more.  My theory is:

ax25_dev_device_up() <- sets refcount to 1
ax25_dev_device_down() <- set refcount to 0 and frees it

If the refcount is not 1 at ax25_dev_device_down() then something is
screwed up.  So why do we even need more refcounting than that?  But
apparently we do.  I don't really understand networking that well so
maybe we can have lingering connections after the device is down.

So the next rule is if we set ax25->ax25_dev from NULL to non-NULL then
bump the refcount and decrement it if we set it back to NULL or we free
ax25. Right now that's happening in ax25_bind() and ax25_release().  And
also in ax25_kill_by_device() but not consistently.

But it needs to happen *everywhere* we set ax25->ax25_dev and we need to
decrement it when ax25 is freed in ax25_cb_put().  My patch is similar
to yours in that every ax25_rcv() will now bump the reference through
calling ax25_fillin_cb() or ax25_make_new().  The send path also bumps
the reference.

There are a few questions I don't know how to answer.  I added two
-EBUSY paths to this patch.  I'm not sure if this is correct.
Second, I don't understand the netdev_put(ax25_dev->dev, &s->dev_tracker);
stuff.  Maybe that should be done in ax25_dev_hold/put().

This patch might not work because of the netdev_hold/put() thing...

I used the Smatch cross function database to show where ax25->ax25_dev
is set to NULL/non-NULL.

$ smdb.py where ax25_cb ax25_dev | grep -v "min-max"
net/ax25/ax25_route.c          | ax25_rt_autobind               | (struct ax25_cb)->ax25_dev | 0-u64max
net/ax25/af_ax25.c             | ax25_kill_by_device            | (struct ax25_cb)->ax25_dev | 0-u64max
net/ax25/af_ax25.c             | ax25_fillin_cb                 | (struct ax25_cb)->ax25_dev | 0-u64max
net/ax25/af_ax25.c             | ax25_setsockopt                | (struct ax25_cb)->ax25_dev | 0-u64max
net/ax25/af_ax25.c             | ax25_make_new                  | (struct ax25_cb)->ax25_dev | 0-u64max
net/ax25/af_ax25.c             | ax25_bind                      | (struct ax25_cb)->ax25_dev | 4096-ptr_max
net/ax25/ax25_in.c             | ax25_rcv                       | (struct ax25_cb)->ax25_dev | 0-u64max
net/ax25/ax25_out.c            | ax25_send_frame                | (struct ax25_cb)->ax25_dev | 0-u64max

regards,
dan carpenter

diff --git a/include/net/ax25.h b/include/net/ax25.h
index eb9cee8252c8..2cc94352b13d 100644
--- a/include/net/ax25.h
+++ b/include/net/ax25.h
@@ -275,25 +275,30 @@ static inline struct ax25_cb *sk_to_ax25(const struct sock *sk)
 #define ax25_cb_hold(__ax25) \
 	refcount_inc(&((__ax25)->refcount))
 
-static __inline__ void ax25_cb_put(ax25_cb *ax25)
+static inline ax25_dev *ax25_dev_hold(ax25_dev *ax25_dev)
 {
-	if (refcount_dec_and_test(&ax25->refcount)) {
-		kfree(ax25->digipeat);
-		kfree(ax25);
-	}
+	if (ax25_dev)
+		refcount_inc(&ax25_dev->refcount);
+	return ax25_dev;
 }
 
-static inline void ax25_dev_hold(ax25_dev *ax25_dev)
+static inline void ax25_dev_put(ax25_dev *ax25_dev)
 {
-	refcount_inc(&ax25_dev->refcount);
+	if (ax25_dev && refcount_dec_and_test(&ax25_dev->refcount)) {
+		kfree(ax25_dev);
+	}
 }
 
-static inline void ax25_dev_put(ax25_dev *ax25_dev)
+static __inline__ void ax25_cb_put(ax25_cb *ax25)
 {
-	if (refcount_dec_and_test(&ax25_dev->refcount)) {
-		kfree(ax25_dev);
+	if (refcount_dec_and_test(&ax25->refcount)) {
+		if (ax25->ax25_dev)
+			ax25_dev_put(ax25->ax25_dev);
+		kfree(ax25->digipeat);
+		kfree(ax25);
 	}
 }
+
 static inline __be16 ax25_type_trans(struct sk_buff *skb, struct net_device *dev)
 {
 	skb->dev      = dev;
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 9169efb2f43a..4d1ab296d52c 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -92,6 +92,7 @@ static void ax25_kill_by_device(struct net_device *dev)
 				spin_unlock_bh(&ax25_list_lock);
 				ax25_disconnect(s, ENETUNREACH);
 				s->ax25_dev = NULL;
+				ax25_dev_put(ax25_dev);
 				ax25_cb_del(s);
 				spin_lock_bh(&ax25_list_lock);
 				goto again;
@@ -101,11 +102,8 @@ static void ax25_kill_by_device(struct net_device *dev)
 			lock_sock(sk);
 			ax25_disconnect(s, ENETUNREACH);
 			s->ax25_dev = NULL;
-			if (sk->sk_socket) {
-				netdev_put(ax25_dev->dev,
-					   &s->dev_tracker);
-				ax25_dev_put(ax25_dev);
-			}
+			netdev_put(ax25_dev->dev, &s->dev_tracker);
+			ax25_dev_put(ax25_dev);
 			ax25_cb_del(s);
 			release_sock(sk);
 			spin_lock_bh(&ax25_list_lock);
@@ -496,6 +494,7 @@ void ax25_fillin_cb(ax25_cb *ax25, ax25_dev *ax25_dev)
 	ax25->ax25_dev = ax25_dev;
 
 	if (ax25->ax25_dev != NULL) {
+		ax25_dev_hold(ax25->ax25_dev);
 		ax25_fillin_cb_from_dev(ax25, ax25_dev);
 		return;
 	}
@@ -685,6 +684,11 @@ static int ax25_setsockopt(struct socket *sock, int level, int optname,
 			break;
 		}
 
+		if (ax25->ax25_dev) {
+			rtnl_unlock();
+			res = -EBUSY;
+			break;
+		}
 		ax25->ax25_dev = ax25_dev_ax25dev(dev);
 		if (!ax25->ax25_dev) {
 			rtnl_unlock();
@@ -961,7 +965,7 @@ struct sock *ax25_make_new(struct sock *osk, struct ax25_dev *ax25_dev)
 	ax25->paclen  = oax25->paclen;
 	ax25->window  = oax25->window;
 
-	ax25->ax25_dev    = ax25_dev;
+	ax25->ax25_dev    = ax25_dev_hold(ax25_dev);
 	ax25->source_addr = oax25->source_addr;
 
 	if (oax25->digipeat != NULL) {
@@ -995,6 +999,11 @@ static int ax25_release(struct socket *sock)
 	sock_orphan(sk);
 	ax25 = sk_to_ax25(sk);
 	ax25_dev = ax25->ax25_dev;
+	/*
+	 * The ax25_destroy_socket() function decrements the reference but we
+	 * need to keep a reference until the end of the function.
+	 */
+	ax25_dev_hold(ax25_dev);
 
 	if (sk->sk_type == SOCK_SEQPACKET) {
 		switch (ax25->state) {
@@ -1147,6 +1156,12 @@ static int ax25_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 
 	if (ax25_dev) {
 		ax25_fillin_cb(ax25, ax25_dev);
+		/*
+		 * both ax25_addr_ax25dev() and ax25_fillin_cb() take a
+		 * reference but we only want to take one reference so drop
+		 * the extra reference.
+		 */
+		ax25_dev_put(ax25_dev);
 		netdev_hold(ax25_dev->dev, &ax25->dev_tracker, GFP_ATOMIC);
 	}
 
diff --git a/net/ax25/ax25_route.c b/net/ax25/ax25_route.c
index b7c4d656a94b..d7f6d9f4f20c 100644
--- a/net/ax25/ax25_route.c
+++ b/net/ax25/ax25_route.c
@@ -406,6 +406,10 @@ int ax25_rt_autobind(ax25_cb *ax25, ax25_address *addr)
 		ax25_route_lock_unuse();
 		return -EHOSTUNREACH;
 	}
+	if (ax25->ax25_dev) {
+		err = -EBUSY;
+		goto put;
+	}
 	if ((ax25->ax25_dev = ax25_dev_ax25dev(ax25_rt->dev)) == NULL) {
 		err = -EHOSTUNREACH;
 		goto put;
Duoming Zhou May 7, 2024, 9:04 a.m. UTC | #15
On Tue, 7 May 2024 11:08:14 +0300 Dan Carpenter wrote:
> I have reviewed this code some more.  My theory is:
> 
> ax25_dev_device_up() <- sets refcount to 1
> ax25_dev_device_down() <- set refcount to 0 and frees it
> 
> If the refcount is not 1 at ax25_dev_device_down() then something is
> screwed up.  So why do we even need more refcounting than that?  But
> apparently we do.  I don't really understand networking that well so
> maybe we can have lingering connections after the device is down.

We do need more reference count. Because there is a race condition 
between ax25_bind() and the cleanup routine.

The cleanup routine is consisted of three parts: ax25_kill_by_device(),
ax25_rt_device_down() and ax25_dev_device_down(). The ax25_kill_by_device()
is used to cleanup the connections and the ax25_dev_device_down() is used
to cleanup the device. If we call ax25_bind() and ax25_connect() between
the window of ax25_kill_by_device() and ax25_dev_device_down(), the ax25_dev
is freed in ax25_dev_device_down(). When we call ax25_release() to release
the connections, the UAF bugs will happen. 

Best regards,
Duoming Zhou
Duoming Zhou May 9, 2024, 1:40 a.m. UTC | #16
On Tue, 7 May 2024 11:08:14 +0300 Dan Carpenter wrote:
> diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
> index 9169efb2f43a..4d1ab296d52c 100644
> --- a/net/ax25/af_ax25.c
> +++ b/net/ax25/af_ax25.c
> @@ -92,6 +92,7 @@ static void ax25_kill_by_device(struct net_device *dev)
>  				spin_unlock_bh(&ax25_list_lock);
>  				ax25_disconnect(s, ENETUNREACH);
>  				s->ax25_dev = NULL;
> +				ax25_dev_put(ax25_dev);
>  				ax25_cb_del(s);
>  				spin_lock_bh(&ax25_list_lock);
>  				goto again;
> @@ -101,11 +102,8 @@ static void ax25_kill_by_device(struct net_device *dev)
>  			lock_sock(sk);
>  			ax25_disconnect(s, ENETUNREACH);
>  			s->ax25_dev = NULL;
> -			if (sk->sk_socket) {
> -				netdev_put(ax25_dev->dev,
> -					   &s->dev_tracker);
> -				ax25_dev_put(ax25_dev);
> -			}
> +			netdev_put(ax25_dev->dev, &s->dev_tracker);
> +			ax25_dev_put(ax25_dev);

We should not decrease the refcount without checking "if (sk->sk_socket)", because 
there is a race condition between ax25_kill_by_device() and ax25_release(), if we
decrease the refcount in ax25_release(), we should not decrease it here, otherwise 
the refcount underflow will happen.

Best regards,
Duoming Zhou
Duoming Zhou May 15, 2024, 9:52 a.m. UTC | #17
On Wed, 1 May 2024 21:29:16 -0400 Lars Kellogg-Stedman wrote:
> Assume we have the following two interfaces configured on a system:
> 
>     $ cat /etc/ax25/axports
>     udp0 test0-0 9600 255 2 axudp0
>     udp1 test0-1 9600 255 2 axudp1
> 
> And we have ax25d listening on both interfaces:
> 
>     [udp0]
>     default  * * * * * *  - root  /usr/sbin/axwrapper axwrapper -- /bin/sh sh /etc/ax25/example-output.sh
>     [udp1]
>     default  * * * * * *  - root  /usr/sbin/axwrapper axwrapper -- /bin/sh sh /etc/ax25/example-output.sh
> 
> Using the 'ax-devs' and 'ax-sockets' gdb commands shown at the end of
> this message, we start with:
> 
>     (gdb) ax-devs
>     ax1 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
>     ax0 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
>     (gdb) ax-sockets
>     0xffff8881002b6800 if:ax1 state:0 refcnt:2 dev_tracker:0xffff888100ded200
>     0xffff888101ac4e00 if:ax0 state:0 refcnt:2 dev_tracker:0xffff888100dec4c0
> 
> We initiate a connection from ax0 to ax1:
> 
>     call -r udp0 test0-1
> 
> When we first enter ax25_rcv, we have:
> 
>     (gdb) ax-devs
>     ax1 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
>     ax0 ax_refcnt:3 dev_refcnt:10 dev_untracked:1 dev_notrack:1
>     (gdb) ax-sockets
>     0xffff888101ac8000 if:ax0 state:1 refcnt:2 dev_tracker:0xffff888100dedb80
>     0xffff8881002b6800 if:ax1 state:0 refcnt:2 dev_tracker:0xffff888100ded200
>     0xffff888101ac4e00 if:ax0 state:0 refcnt:2 dev_tracker:0xffff888100dec4c0
> 
> After we reach line 413 (in net/ax25/ax25_in.c) and add a new control
> block:
> 
>     ax25_cb_add(ax25)
> 
> We have:
> 
>     (gdb) ax-devs
>     ax1 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
>     ax0 ax_refcnt:3 dev_refcnt:10 dev_untracked:1 dev_notrack:1
>     (gdb) ax-sockets
>     0xffff88810245ac00 if:ax1 state:3 refcnt:2 dev_tracker:0x0 <fixed_percpu_data>
>     0xffff88810245ba00 if:ax0 state:1 refcnt:2 dev_tracker:0xffff88810136c800
>     0xffff888100c79e00 if:ax1 state:0 refcnt:2 dev_tracker:0xffff88810136c6e0
>     0xffff8881018e9800 if:ax0 state:0 refcnt:2 dev_tracker:0xffff88810170c860
> 
> Note that (a) ax25->dev_tracker is NULL, and (b) we have incremeted the
> refcount on ax0 (the source interface), but not on ax1 (the destination
> interface). When we call ax25_release for this control block, we get to:
> 
>     netdev_put(ax25_dev->dev, &ax25->dev_tracker);
>     ax25_dev_put(ax25_dev);
> 
> With:
> 
>     (gdb) ax-devs
>     ax1 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
>     ax0 ax_refcnt:3 dev_refcnt:10 dev_untracked:1 dev_notrack:1
> 
> After the calls to netdev_put() and ax25_dev_put(), we have:
> 
>     (gdb) ax-devs
>     ax1 ax_refcnt:1 dev_refcnt:8 dev_untracked:-1073741824 dev_notrack:1
>     ax0 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
> 
> You can see that (a) ax25_dev->dev->refcnt_tracker->untracked is now
> invalid, and ax25_dev->dev->dev_refcnt is in trouble: it decrements by
> one for each closed connection, even though it was never incremented
> when we accepted the connection. The underflow in
> ...refcnt_tracker->untracked yields the traceback with:
> 
>     refcount_t: decrement hit 0; leaking memory.
> 
> Additional connections will eventually trigger more problems; we will
> ultimately underflow ax25_dev->dev->dev_refcnt, but we may also run into
> memory corruption because of the invalid tracker data, resulting in:
> 
>     BUG: unable to handle page fault for address: 00000010000003b0

Do you know how to trigger this bug? Could you share the POC?

Best regards,
Duoming Zhou
diff mbox series

Patch

diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c
index 282ec581c07..0e6dd98d3fa 100644
--- a/net/ax25/ax25_dev.c
+++ b/net/ax25/ax25_dev.c
@@ -37,8 +37,9 @@  ax25_dev *ax25_addr_ax25dev(ax25_address *addr)
 	for (ax25_dev = ax25_dev_list; ax25_dev != NULL; ax25_dev = ax25_dev->next)
 		if (ax25cmp(addr, (const ax25_address *)ax25_dev->dev->dev_addr) == 0) {
 			res = ax25_dev;
-			ax25_dev_hold(ax25_dev);
 		}
+	if (res)
+		ax25_dev_hold(res);
 	spin_unlock_bh(&ax25_dev_lock);
 
 	return res;
@@ -58,7 +59,6 @@  void ax25_dev_device_up(struct net_device *dev)
 		return;
 	}
 
-	refcount_set(&ax25_dev->refcount, 1);
 	dev->ax25_ptr     = ax25_dev;
 	ax25_dev->dev     = dev;
 	netdev_hold(dev, &ax25_dev->dev_tracker, GFP_KERNEL);
@@ -88,7 +88,7 @@  void ax25_dev_device_up(struct net_device *dev)
 	ax25_dev->next = ax25_dev_list;
 	ax25_dev_list  = ax25_dev;
 	spin_unlock_bh(&ax25_dev_lock);
-	ax25_dev_hold(ax25_dev);
+	refcount_set(&ax25_dev->refcount, 1);
 
 	ax25_register_dev_sysctl(ax25_dev);
 }
@@ -135,7 +135,6 @@  void ax25_dev_device_down(struct net_device *dev)
 
 unlock_put:
 	spin_unlock_bh(&ax25_dev_lock);
-	ax25_dev_put(ax25_dev);
 	dev->ax25_ptr = NULL;
 	netdev_put(dev, &ax25_dev->dev_tracker);
 	ax25_dev_put(ax25_dev);
@@ -208,7 +207,7 @@  void __exit ax25_dev_free(void)
 		s        = ax25_dev;
 		netdev_put(ax25_dev->dev, &ax25_dev->dev_tracker);
 		ax25_dev = ax25_dev->next;
-		kfree(s);
+		ax25_dev_put(s);
 	}
 	ax25_dev_list = NULL;
 	spin_unlock_bh(&ax25_dev_lock);