get_subdir: do not drop new subdir if returning it
diff mbox series

Message ID 20200531112444.GA25164@noodle
State New
Headers show
Series
  • get_subdir: do not drop new subdir if returning it
Related show

Commit Message

Boris Sukholitko May 31, 2020, 11:24 a.m. UTC
In testing of our kernel (based on 4.19, tainted, sorry!) on our aarch64 based hardware
we've come upon the following oops (lightly edited to omit irrelevant details):

000:50:01.133 Unable to handle kernel paging request at virtual address 0000000000007a12
000:50:02.209 Process brctl (pid: 14467, stack limit = 0x00000000bcf7a578)
000:50:02.209 CPU: 1 PID: 14467 Comm: brctl Tainted: P                  4.19.122 #1
000:50:02.209 Hardware name: Broadcom-v8A (DT)
000:50:02.209 pstate: 60000005 (nZCv daif -PAN -UAO)
000:50:02.209 pc : unregister_sysctl_table+0x1c/0xa0
000:50:02.209 lr : unregister_net_sysctl_table+0xc/0x20
000:50:02.209 sp : ffffff800e5ab9e0
000:50:02.209 x29: ffffff800e5ab9e0 x28: ffffffc016439ec0
000:50:02.209 x27: 0000000000000000 x26: ffffff8008804078
000:50:02.209 x25: ffffff80087b4dd8 x24: ffffffc015d65000
000:50:02.209 x23: ffffffc01f0d6010 x22: ffffffc01f0d6000
000:50:02.209 x21: ffffffc0166c4eb0 x20: 00000000000000bd
000:50:02.209 x19: ffffffc01f0d6030 x18: 0000000000000400
000:50:02.256 x17: 0000000000000000 x16: 0000000000000000
000:50:02.256 x15: 0000000000000400 x14: 0000000000000129
000:50:02.256 x13: 0000000000000001 x12: 0000000000000030
000:50:02.256 x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
000:50:02.256 x9 : feff646663687161 x8 : ffffffffffffffff
000:50:02.256 x7 : fefefefefefefefe x6 : 0000000000008080
000:50:02.256 x5 : 00000000ffffffff x4 : ffffff8008905c38
000:50:02.256 x3 : ffffffc01f0d602c x2 : 00000000000000bd
000:50:02.256 x1 : ffffffc01f0d60c0 x0 : 0000000000007a12
000:50:02.256 Call trace:
000:50:02.256  unregister_sysctl_table+0x1c/0xa0
000:50:02.256  unregister_net_sysctl_table+0xc/0x20
000:50:02.256  __devinet_sysctl_unregister.isra.0+0x2c/0x60
000:50:02.256  inetdev_event+0x198/0x510
000:50:02.256  notifier_call_chain+0x58/0xa0
000:50:02.303  raw_notifier_call_chain+0x14/0x20
000:50:02.303  call_netdevice_notifiers_info+0x34/0x80
000:50:02.303  rollback_registered_many+0x384/0x600
000:50:02.303  unregister_netdevice_queue+0x8c/0x110
000:50:02.303  br_dev_delete+0x8c/0xa0
000:50:02.303  br_del_bridge+0x44/0x70
000:50:02.303  br_ioctl_deviceless_stub+0xcc/0x310
000:50:02.303  sock_ioctl+0x194/0x3f0
000:50:02.303  compat_sock_ioctl+0x678/0xc00
000:50:02.303  __arm64_compat_sys_ioctl+0xf0/0xcb0
000:50:02.303  el0_svc_common+0x70/0x170
000:50:02.303  el0_svc_compat_handler+0x1c/0x30
000:50:02.303  el0_svc_compat+0x8/0x18
000:50:02.303 Code: a90153f3 aa0003f3 f9401000 b40000c0 (f9400001)

The crash is in the call to count_subheaders(header->ctl_table_arg).

Although the header (being in x19 == 0xffffffc01f0d6030) looks like a
normal kernel pointer, ctl_table_arg (x0 == 0x0000000000007a12) looks
invalid.

Trying to find the issue, we've started tracing header allocation being
done by kzalloc in __register_sysctl_table and header freeing being done
in drop_sysctl_table.

Then we've noticed headers being freed which where not allocated before.
The faulty freeing was done on parent->header at the end of
drop_sysctl_table.

The invariant on __register_sysctl_table seems to be that non-empty
parent dir should have its header.nreg > 1. By failing this invariant
(see WARN_ON hunk in the patch) we've come upon the conclusion that
something is fishy with nreg counting.

The root cause seems to be dropping new subdir in get_subdir function.
Here are the new subdir adventures leading to the invariant failure
above:

1. new subdir comes to being with nreg == 1
2. the nreg is being incremented in the found clause, nreg == 2
3. nreg is being decremented by the if (new) drop, nreg == 1
4. coming out of get_subdir, insert_header increments nreg: nreg == 2
5. nreg is being decremented at the end of __register_sysctl_table

The fix seems to be avoiding step 3 if new subdir is the one being
returned. The patch does this and also adds warning for the nreg
invariant.
---
 fs/proc/proc_sysctl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Eric W. Biederman June 1, 2020, 3:01 p.m. UTC | #1
Boris Sukholitko <boris.sukholitko@broadcom.com> writes:

> In testing of our kernel (based on 4.19, tainted, sorry!) on our aarch64 based hardware
> we've come upon the following oops (lightly edited to omit irrelevant
> details):


I don't doubt you are seeing problems.

I need to refresh my knowledge of that code to know for certain but I am
not yet convinced this is the problem.

I don't see any reason that the assertion your code makes would
necessarily be a problem.

Why can't a directory only have a single entry?

I am not saying you are wrong.  Just that I have not looked enough to
verify there is an invariant being violated there.


A very confusiong part of this situation is the fact he code has been
stable for quite a while.  I would have suspected someone's fuzzer to
trigger problems on something other than aarch64.  Especially if
registering and unregistering a network device is enough to cause this.
As that can be performed as non-root.

Eric

> The crash is in the call to count_subheaders(header->ctl_table_arg).
>
> Although the header (being in x19 == 0xffffffc01f0d6030) looks like a
> normal kernel pointer, ctl_table_arg (x0 == 0x0000000000007a12) looks
> invalid.
>
> Trying to find the issue, we've started tracing header allocation being
> done by kzalloc in __register_sysctl_table and header freeing being done
> in drop_sysctl_table.
>
> Then we've noticed headers being freed which where not allocated before.
> The faulty freeing was done on parent->header at the end of
> drop_sysctl_table.
>
> The invariant on __register_sysctl_table seems to be that non-empty
> parent dir should have its header.nreg > 1. By failing this invariant
> (see WARN_ON hunk in the patch) we've come upon the conclusion that
> something is fishy with nreg counting.
>
> The root cause seems to be dropping new subdir in get_subdir function.
> Here are the new subdir adventures leading to the invariant failure
> above:
>
> 1. new subdir comes to being with nreg == 1
> 2. the nreg is being incremented in the found clause, nreg == 2
> 3. nreg is being decremented by the if (new) drop, nreg == 1
> 4. coming out of get_subdir, insert_header increments nreg: nreg == 2
> 5. nreg is being decremented at the end of __register_sysctl_table
>
> The fix seems to be avoiding step 3 if new subdir is the one being
> returned. The patch does this and also adds warning for the nreg
> invariant.
> ---
>  fs/proc/proc_sysctl.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index b6f5d459b087..12fa5803dcb3 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -1010,7 +1010,7 @@ static struct ctl_dir *get_subdir(struct ctl_dir *dir,
>  			namelen, namelen, name, PTR_ERR(subdir));
>  	}
>  	drop_sysctl_table(&dir->header);
> -	if (new)
> +	if (new && new != subdir)
>  		drop_sysctl_table(&new->header);
>  	spin_unlock(&sysctl_lock);
>  	return subdir;
> @@ -1334,6 +1334,7 @@ struct ctl_table_header *__register_sysctl_table(
>  		goto fail_put_dir_locked;
>  
>  	drop_sysctl_table(&dir->header);
> +	WARN_ON(dir->header.nreg < 2);
>  	spin_unlock(&sysctl_lock);
>  
>  	return header;

Patch
diff mbox series

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index b6f5d459b087..12fa5803dcb3 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -1010,7 +1010,7 @@  static struct ctl_dir *get_subdir(struct ctl_dir *dir,
 			namelen, namelen, name, PTR_ERR(subdir));
 	}
 	drop_sysctl_table(&dir->header);
-	if (new)
+	if (new && new != subdir)
 		drop_sysctl_table(&new->header);
 	spin_unlock(&sysctl_lock);
 	return subdir;
@@ -1334,6 +1334,7 @@  struct ctl_table_header *__register_sysctl_table(
 		goto fail_put_dir_locked;
 
 	drop_sysctl_table(&dir->header);
+	WARN_ON(dir->header.nreg < 2);
 	spin_unlock(&sysctl_lock);
 
 	return header;