diff mbox

slab corruption with current -git (was Re: [git pull] vfs pile 1 (splice))

Message ID CA+55aFzcweyWCqq-fRAnJkFB7neKwKsXH=TMcLtRQRs2Wn6XEQ@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Linus Torvalds Oct. 11, 2016, 5:39 a.m. UTC
On Sun, Oct 9, 2016 at 8:41 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> This COMPLETELY UNTESTED patch tries to fix the nf_hook_entry code to do this.
>
> I repeat: it's ENTIRELY UNTESTED.

Gaah.

That patch was subtle garbage.

The "add to list" thing did this:

        rcu_assign_pointer(entry->next, p);
        rcu_assign_pointer(*pp, p);

which is not so subtly broken - that second assignment just assigns
"p" to "*pp", but that was what *pp already contained. Too much
cut-and-paste.

That also explains why I then get the NOT FOUND case, because the add
never actually worked.

It *should* be

        rcu_assign_pointer(entry->next, p);
        rcu_assign_pointer(*pp, entry);

and then the warnings about "not found" are gone.

Duh.

I guess I will have to double-check that the slub corruption is gone
still with that fixed.

Anyway, new version of the patch (just that one line changed) attached.

                Linus
net/netfilter/core.c | 108 ++++++++++++++++-----------------------------------
 1 file changed, 33 insertions(+), 75 deletions(-)

Comments

Linus Torvalds Oct. 11, 2016, 5:47 a.m. UTC | #1
On Mon, Oct 10, 2016 at 10:39 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I guess I will have to double-check that the slub corruption is gone
> still with that fixed.

So I'm not getting any warnings now from SLUB debugging. So the
original bug seems to not have re-surfaced, and the registration bug
is gone, so now the unregistration doesn't warn about anything either.

But I only rebooted three times.

           Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 11, 2016, 8:57 a.m. UTC | #2
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 10 Oct 2016 22:47:50 -0700

> On Mon, Oct 10, 2016 at 10:39 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> I guess I will have to double-check that the slub corruption is gone
>> still with that fixed.
> 
> So I'm not getting any warnings now from SLUB debugging. So the
> original bug seems to not have re-surfaced, and the registration bug
> is gone, so now the unregistration doesn't warn about anything either.
> 
> But I only rebooted three times.

Looks good to me, I applied it to my tree with your signoff and will
send you a pull request right now.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Markus Trippelsdorf Oct. 13, 2016, 6:02 a.m. UTC | #3
On 2016.10.11 at 04:57 -0400, David Miller wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Mon, 10 Oct 2016 22:47:50 -0700
> 
> > On Mon, Oct 10, 2016 at 10:39 PM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >>
> >> I guess I will have to double-check that the slub corruption is gone
> >> still with that fixed.
> > 
> > So I'm not getting any warnings now from SLUB debugging. So the
> > original bug seems to not have re-surfaced, and the registration bug
> > is gone, so now the unregistration doesn't warn about anything either.
> > 
> > But I only rebooted three times.
> 
> Looks good to me, I applied it to my tree with your signoff and will
> send you a pull request right now.

I'm still seeing:

nf_conntrack version 0.5.0 (4096 buckets, 16384 max)
ctnetlink v0.93: registering with nfnetlink.
ip_tables: (C) 2000-2006 Netfilter Core Team
WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff88001e605480)
4055601e0088ffff000000000000000090686d81ffffffff0000000000000000
 u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u
 ^
RIP: 0010:[<ffffffff8166e561>]  [<ffffffff8166e561>] nf_register_net_hook+0x51/0x160
RSP: 0018:ffffc9000000bcc0  EFLAGS: 00010286
RAX: ffff88001e5af9c0 RBX: ffff88001e605480 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88001e5b0a20
RBP: ffffc9000000bcd8 R08: 000000001fd0e000 R09: 0000000000000000
R10: ffff88001e5b09c0 R11: 0000000000000067 R12: ffff88001e5af9c0
R13: ffffffff81c5c0c8 R14: 0000000000000003 R15: ffff88001e605480
FS:  0000000000000000(0000) GS:ffff88001fa00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88001f45ca18 CR3: 0000000001c07000 CR4: 00000000000006f0
 [<ffffffff8166e561>] nf_register_net_hook+0x51/0x160
 [<ffffffff8166eaaf>] nf_register_net_hooks+0x3f/0xa0
 [<ffffffff816d6715>] ipt_register_table+0xe5/0x110
 [<ffffffff816d6815>] iptable_filter_table_init.part.1+0x55/0x80
 [<ffffffff816d688b>] iptable_filter_net_init+0x2b/0x30
 [<ffffffff8163edd7>] ops_init+0x47/0x150
 [<ffffffff8163f0c6>] register_pernet_operations+0xd6/0x170
 [<ffffffff8163fb77>] register_pernet_subsys+0x27/0x40
 [<ffffffff81cb9de3>] iptable_filter_init+0x33/0x4b
 [<ffffffff81c8bef0>] do_one_initcall+0x8b/0x113
 [<ffffffff81c8c091>] kernel_init_freeable+0x119/0x1a1
 [<ffffffff816efd09>] kernel_init+0x9/0x100
 [<ffffffff816f4e12>] ret_from_fork+0x22/0x30
 [<ffffffffffffffff>] 0xffffffffffffffff
NET: Registered protocol family 17
9pnet: Installing 9P2000 support
Markus Trippelsdorf Oct. 13, 2016, 6:06 a.m. UTC | #4
On 2016.10.13 at 08:02 +0200, Markus Trippelsdorf wrote:
> On 2016.10.11 at 04:57 -0400, David Miller wrote:
> > From: Linus Torvalds <torvalds@linux-foundation.org>
> > Date: Mon, 10 Oct 2016 22:47:50 -0700
> > 
> > > On Mon, Oct 10, 2016 at 10:39 PM, Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
> > >>
> > >> I guess I will have to double-check that the slub corruption is gone
> > >> still with that fixed.
> > > 
> > > So I'm not getting any warnings now from SLUB debugging. So the
> > > original bug seems to not have re-surfaced, and the registration bug
> > > is gone, so now the unregistration doesn't warn about anything either.
> > > 
> > > But I only rebooted three times.
> > 
> > Looks good to me, I applied it to my tree with your signoff and will
> > send you a pull request right now.
> 
> I'm still seeing:
> 
> nf_conntrack version 0.5.0 (4096 buckets, 16384 max)
> ctnetlink v0.93: registering with nfnetlink.
> ip_tables: (C) 2000-2006 Netfilter Core Team
> WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff88001e605480)
> 4055601e0088ffff000000000000000090686d81ffffffff0000000000000000
>  u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u
>  ^
> RIP: 0010:[<ffffffff8166e561>]  [<ffffffff8166e561>] nf_register_net_hook+0x51/0x160

This is nf_register_net_hook at net/netfilter/core.c:106
Markus Trippelsdorf Oct. 13, 2016, 6:27 a.m. UTC | #5
On 2016.10.12 at 23:18 -0700, Linus Torvalds wrote:
> On Oct 12, 2016 23:07, "Markus Trippelsdorf" <markus@trippelsdorf.de> wrote:
> >
> > This is nf_register_net_hook at net/netfilter/core.c:106
>
> The "*regs" access?

Yeah.

105         entry->orig_ops = reg;
106         entry->ops      = *reg;
107         entry->next     = NULL;

--
Markus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Torvalds Oct. 13, 2016, 7:49 p.m. UTC | #6
On Wed, Oct 12, 2016 at 11:27 PM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
>
> Yeah.
>
> 105         entry->orig_ops = reg;
> 106         entry->ops      = *reg;
> 107         entry->next     = NULL;

So ipt_register_table() does:

        ret = nf_register_net_hooks(net, ops, hweight32(table->valid_hooks));

and then nf_register_net_hooks() just does

        for (i = 0; i < n; i++) {
                err = nf_register_net_hook(net, &reg[i]);

so if the *reg is uninitialized, it means that it's the 'ops[]' array
that isn't actually really valid in "valid_hooks". Odd. They should
all be initialized by xt_hook_ops_alloc(), no?

That said, xt_hook_ops_alloc() itself is odd. Lookie here, this is the
loop that initializes things:

        for (i = 0, hooknum = 0; i < num_hooks && hook_mask != 0;
             hook_mask >>= 1, ++hooknum) {

and it makes no sense to me how that tests *both* "i < num_hools" and
"hook_mask != 0".

Why? Because

    num_hooks = hweight32(hook_mask);

so it's entirely redundant. num_hooks is already how many bits are on
in hook_mask, so that test is just duplicating the same thing twice
("have we done less than that number of bits" and "do we have any bits
less").

I don't know. There's something odd going on. Regardless, thsi is a
different problem from the nf_register_net_hook() list handling, so
I'll leave it to the networking people. David?

               Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Al Viro Oct. 13, 2016, 9:32 p.m. UTC | #7
On Thu, Oct 13, 2016 at 12:49:33PM -0700, Linus Torvalds wrote:

> That said, xt_hook_ops_alloc() itself is odd. Lookie here, this is the
> loop that initializes things:
> 
>         for (i = 0, hooknum = 0; i < num_hooks && hook_mask != 0;
>              hook_mask >>= 1, ++hooknum) {
> 
> and it makes no sense to me how that tests *both* "i < num_hools" and
> "hook_mask != 0".
> 
> Why? Because
> 
>     num_hooks = hweight32(hook_mask);
> 
> so it's entirely redundant. num_hooks is already how many bits are on
> in hook_mask, so that test is just duplicating the same thing twice
> ("have we done less than that number of bits" and "do we have any bits
> less").
> 
> I don't know. There's something odd going on. Regardless, thsi is a
> different problem from the nf_register_net_hook() list handling, so
> I'll leave it to the networking people. David?

Hey, I remember looking through that stuff. <checks>  There it is, in
a thread started by Krause Randomness(tm)...  Short version: nf_hook_ops
is a mess - it's embedded into different objects, with different subsets
of fields used depending on the containing object and I would seriously
suggest moving some of those into those containing objects.

--------------------------------------------------------------------------
On Thu, Sep 01, 2016 at 08:10:44AM -0500, Eric Sandeen wrote:
> On 8/4/16 8:57 AM, Al Viro wrote:
> 
> > Don't feed the troll.  On all paths leading to that place we have
> >         result->name = kname;
> >         len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX);
> > or
> >                 result->name = kname;
> >                 len = strncpy_from_user(kname, filename, PATH_MAX);
> > with failure exits taken if strncpy_from_user() returns an error, which means
> > that the damn thing has already been copied into.
> > 
> > FWIW, it looks a lot like buggered kmemcheck; as usual, he can't be bothered
> > to mention which kernel version would it be (let alone how to reproduce it
> > on the kernel in question), but IIRC davej had run into some instrumentation
> > breakage lately.
> 
> The original report is in https://bugzilla.kernel.org/show_bug.cgi?id=120651
> if anyone is interested in it.

	What the hell does that one have to getname_flags(), other than having
attracted the same... something on the edge of failing the Turing Test?

	FWIW, looking at the netfilter one...  That's nf_register_net_hook()
hitting
        entry->ops      = *reg;
with reg pointing to something uninitialized (according to kmemcheck, that is,
and presuming that it's not an instrumentation bug).  With the callchain
in report, it came (all in the same assumptions) from
	nf_register_net_hooks(net, ops, hweight32(table->valid_hooks))
with hweight32(table->valid_hooks) being greater than the amount of
initialized entries in ops[] (call site in ipt_register_table()).

	This "ops" ought to be net/ipv4/netfilter/iptable_filter.c:filter_ops,
allocated by
        filter_ops = xt_hook_ops_alloc(&packet_filter, iptable_filter_hook);
in iptable_filter_init().  "table" is &packet_filter and its contents ought
to be unchanged, so ->valid_hooks in there is FILTER_VALID_HOOKS, i.e.
((1 << NF_INET_LOCAL_IN) | (1 << NF_INET_FORWARD) | (1 << NF_INET_LOCAL_OUT)).

	Which is to say, filter_ops[] had fewer than 3 initialized elements
when it got to the call of iptable_filter_table_init()...  Since filter_ops
hadn't been NULL, the xt_hook_ops_alloc() call above must've already been
done.  Said xt_hook_ops_alloc() should've allocated a 3-element array and
hooked through all of it, so it's not a wholesale uninitialized element, it's
uninitialized parts of one...

	What gets initialized is ->hook, ->pf, ->hooknum and ->priority.
Let's figure out the offsets:
	0: list (two pointers, i.e. 16 bytes)
	0x10: hook (8)
	0x18: dev (8)
	0x20: priv (8)
	0x28: pf (1)
	0x29: padding (3)
	0x2c: hooknum (4)
	0x30: priority (4)
	0x34: padding (8)
	
OK...  The address of the damn thing is apparently ffff880037b4bd80 and
we see complaint about the accesses at offsets 0, 0x18, 8, 0x20 and then
the same pattern with 0x38 and 0x70 added (i.e. the same fields in the next
two elements of the same array).  Then there are similar complaints, but
with a different call chain (iptable_mangle instead of iptable_filter).

These offsets are ->list, ->dev and ->priv, and those are exactly the ones
not initialized by xt_hook_ops_alloc().  Looking at the nf_register_net_hook(),
we have
        list_add_rcu(&entry->ops.list, elem->list.prev);
a bit further down the road.  ->dev and ->priv are left uninitialized (and
very likely - unused).

I would say it's a false positive.  struct nf_hook_ops is embedded into a
bunch of different objects, with different subsets of fields getting used.
IMO it's a bad idea (in particular, I really wonder if ->list would've
been better off moved into (some of) the containing suckers), but it's
not a bug per se, just a design choice asking for trouble.  One way of
getting kmemcheck off your back would be to switch xt_hook_ops_alloc() from
	ops = kmalloc(sizeof(*ops) * num_hooks, GFP_KERNEL);
to
	ops = kcalloc(num_hooks, sizeof(*ops), GFP_KERNEL);
which might have some merits beyond making kmemcheck STFU...
--------------------------------------------------------------------------
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index c9d90eb64046..fcb5d1df11e9 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -65,49 +65,24 @@  static DEFINE_MUTEX(nf_hook_mutex);
 #define nf_entry_dereference(e) \
 	rcu_dereference_protected(e, lockdep_is_held(&nf_hook_mutex))
 
-static struct nf_hook_entry *nf_hook_entry_head(struct net *net,
-						const struct nf_hook_ops *reg)
+static struct nf_hook_entry __rcu **nf_hook_entry_head(struct net *net, const struct nf_hook_ops *reg)
 {
-	struct nf_hook_entry *hook_head = NULL;
-
 	if (reg->pf != NFPROTO_NETDEV)
-		hook_head = nf_entry_dereference(net->nf.hooks[reg->pf]
-						 [reg->hooknum]);
-	else if (reg->hooknum == NF_NETDEV_INGRESS) {
+		return net->nf.hooks[reg->pf]+reg->hooknum;
+
 #ifdef CONFIG_NETFILTER_INGRESS
+	if (reg->hooknum == NF_NETDEV_INGRESS) {
 		if (reg->dev && dev_net(reg->dev) == net)
-			hook_head =
-				nf_entry_dereference(
-					reg->dev->nf_hooks_ingress);
-#endif
+			return &reg->dev->nf_hooks_ingress;
 	}
-	return hook_head;
-}
-
-/* must hold nf_hook_mutex */
-static void nf_set_hooks_head(struct net *net, const struct nf_hook_ops *reg,
-			      struct nf_hook_entry *entry)
-{
-	switch (reg->pf) {
-	case NFPROTO_NETDEV:
-#ifdef CONFIG_NETFILTER_INGRESS
-		/* We already checked in nf_register_net_hook() that this is
-		 * used from ingress.
-		 */
-		rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry);
 #endif
-		break;
-	default:
-		rcu_assign_pointer(net->nf.hooks[reg->pf][reg->hooknum],
-				   entry);
-		break;
-	}
+	return NULL;
 }
 
 int nf_register_net_hook(struct net *net, const struct nf_hook_ops *reg)
 {
-	struct nf_hook_entry *hooks_entry;
-	struct nf_hook_entry *entry;
+	struct nf_hook_entry __rcu **pp;
+	struct nf_hook_entry *entry, *p;
 
 	if (reg->pf == NFPROTO_NETDEV) {
 #ifndef CONFIG_NETFILTER_INGRESS
@@ -119,6 +94,10 @@  int nf_register_net_hook(struct net *net, const struct nf_hook_ops *reg)
 			return -EINVAL;
 	}
 
+	pp = nf_hook_entry_head(net, reg);
+	if (!pp)
+		return -EINVAL;
+
 	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
 	if (!entry)
 		return -ENOMEM;
@@ -128,26 +107,15 @@  int nf_register_net_hook(struct net *net, const struct nf_hook_ops *reg)
 	entry->next	= NULL;
 
 	mutex_lock(&nf_hook_mutex);
-	hooks_entry = nf_hook_entry_head(net, reg);
-
-	if (hooks_entry && hooks_entry->orig_ops->priority > reg->priority) {
-		/* This is the case where we need to insert at the head */
-		entry->next = hooks_entry;
-		hooks_entry = NULL;
-	}
-
-	while (hooks_entry &&
-		reg->priority >= hooks_entry->orig_ops->priority &&
-		nf_entry_dereference(hooks_entry->next)) {
-		hooks_entry = nf_entry_dereference(hooks_entry->next);
-	}
 
-	if (hooks_entry) {
-		entry->next = nf_entry_dereference(hooks_entry->next);
-		rcu_assign_pointer(hooks_entry->next, entry);
-	} else {
-		nf_set_hooks_head(net, reg, entry);
+	/* Find the spot in the list */
+	while ((p = nf_entry_dereference(*pp)) != NULL) {
+		if (reg->priority < p->orig_ops->priority)
+			break;
+		pp = &p->next;
 	}
+	rcu_assign_pointer(entry->next, p);
+	rcu_assign_pointer(*pp, entry);
 
 	mutex_unlock(&nf_hook_mutex);
 #ifdef CONFIG_NETFILTER_INGRESS
@@ -163,33 +131,23 @@  EXPORT_SYMBOL(nf_register_net_hook);
 
 void nf_unregister_net_hook(struct net *net, const struct nf_hook_ops *reg)
 {
-	struct nf_hook_entry *hooks_entry;
+	struct nf_hook_entry __rcu **pp;
+	struct nf_hook_entry *p;
 
-	mutex_lock(&nf_hook_mutex);
-	hooks_entry = nf_hook_entry_head(net, reg);
-	if (hooks_entry && hooks_entry->orig_ops == reg) {
-		nf_set_hooks_head(net, reg,
-				  nf_entry_dereference(hooks_entry->next));
-		goto unlock;
-	}
-	while (hooks_entry && nf_entry_dereference(hooks_entry->next)) {
-		struct nf_hook_entry *next =
-			nf_entry_dereference(hooks_entry->next);
-		struct nf_hook_entry *nnext;
+	pp = nf_hook_entry_head(net, reg);
+	if (WARN_ON_ONCE(!pp))
+		return;
 
-		if (next->orig_ops != reg) {
-			hooks_entry = next;
-			continue;
+	mutex_lock(&nf_hook_mutex);
+	while ((p = nf_entry_dereference(*pp)) != NULL) {
+		if (p->orig_ops == reg) {
+			rcu_assign_pointer(*pp, p->next);
+			break;
 		}
-		nnext = nf_entry_dereference(next->next);
-		rcu_assign_pointer(hooks_entry->next, nnext);
-		hooks_entry = next;
-		break;
+		pp = &p->next;
 	}
-
-unlock:
 	mutex_unlock(&nf_hook_mutex);
-	if (!hooks_entry) {
+	if (!p) {
 		WARN(1, "nf_unregister_net_hook: hook not found!\n");
 		return;
 	}
@@ -201,10 +159,10 @@  void nf_unregister_net_hook(struct net *net, const struct nf_hook_ops *reg)
 	static_key_slow_dec(&nf_hooks_needed[reg->pf][reg->hooknum]);
 #endif
 	synchronize_net();
-	nf_queue_nf_hook_drop(net, hooks_entry);
+	nf_queue_nf_hook_drop(net, p);
 	/* other cpu might still process nfqueue verdict that used reg */
 	synchronize_net();
-	kfree(hooks_entry);
+	kfree(p);
 }
 EXPORT_SYMBOL(nf_unregister_net_hook);