diff mbox series

[net,1/1] net: sched: cls_u32: Fix match key mis-addressing

Message ID 20230724151849.323497-1-jhs@mojatatu.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net,1/1] net: sched: cls_u32: Fix match key mis-addressing | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1342 this patch: 1342
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 1365 this patch: 1365
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1365 this patch: 1365
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Jamal Hadi Salim July 24, 2023, 3:18 p.m. UTC
A match entry is uniquely identified with an "address" or "path" in the
form of: hashtable ID(12b):bucketid(8b):nodeid(12b).
A table on which the entry is inserted assumes the address with nodeid of
zero.

When creating table match entries all of hash table id, bucket id and
node (match entry id) are needed to be either specified by the user or
via reasonable in-kernel defaults. The in-kernel default for a table
id is 0x800(omnipresent root table); for bucketid it is 0x0. Prior to
this fix there was none for a nodeid i.e. the code assumed that the
user passed the correct nodeid and if the user passes a nodeid of 0
(as Mingi Cho did) then that is what was assumed. But nodeid of 0
is reserved for identifying the table. This is not a problem until
we dump. The dump code notices that the nodeid is zero and assumes
it is referencing a table and therefore table struct tc_u_hnode instead
of what was created i.e match entry struct tc_u_knode.

Ming does an equivalent of:
tc filter add dev dummy0 parent 10: prio 1 handle 0x1000 \
protocol ip u32 match ip src 10.0.0.1/32 classid 10:1 action ok

Essentially specifying a table id 0, bucketid 1 and nodeid of zero
Tableid 0 is remapped to the default of 0x800.
Bucketid 1 is ignored and defaults to 0x00.
Nodeid was assumed to be what Ming passed - 0x000

dumping before fix shows:
~$ tc filter ls dev dummy0 parent 10:
filter protocol ip pref 1 u32 chain 0
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor -30591

Note that the last line reports a table instead of a match entry
(you can tell this because it says "ht divisor...").
As a result of reporting the wrong data type (misinterpretting of struct
tc_u_knode as being struct tc_u_hnode) the divisor is reported with value
of -30591. Ming identified this as part of the heap address
(physmap_base is 0xffff8880 (-30591 - 1)).

The fix is to ensure that when table entry matches are added and no
nodeid is specified (i.e nodeid == 0) then we get the next available
nodeid from the table's pool.

After the fix, this is what the dump shows:
$ tc filter ls dev dummy0 parent 10:
filter protocol ip pref 1 u32 chain 0
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1
filter protocol ip pref 1 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 10:1 not_in_hw
  match 0a000001/ffffffff at 12
	action order 1: gact action pass
	 random type none pass val 0
	 index 1 ref 1 bind 1

Reported-by: Mingi Cho <mgcho.minic@gmail.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 net/sched/cls_u32.c | 48 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 6 deletions(-)

Comments

Linus Torvalds July 24, 2023, 5:49 p.m. UTC | #1
On Mon, 24 Jul 2023 at 08:19, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> +               /* The tableid from handle and tableid from htid must match */
>                 if (TC_U32_HTID(handle) && TC_U32_HTID(handle ^ htid)) {

Well, I guess the comment at least talks about matching.

I still do think that most people aren't going to read

    "Oh, TC_U32_HTID(handle ^ htid) being non-zero means that they
they differ in the hash table bits".

because the whole trick depends on bit op details, and depends on the
fact that TC_U32_HTID() is purely a bit masking operation.

But whatever.

I would also like to see some explanation of this pattern

                handle = htid | TC_U32_NODE(handle);

and why that "binary or" makes sense. Are the node bits in 'htid'
guaranteed to be zero?

Because if 'htid' can have node bits set, what's the logical reason
for or'ing things together?

And why is that variable called 'htid', when clearly it also has the
bucket ID, and the comment even says we have a valid bucket id?

So I do still find this code to be explicitly written to be confusing.
It's literally using variable names *designed* to not make sense, and
have other meanings.

Hmm?

I'm not hating the patch, but when I look at that code I just go "this
is confusing". And I don't think it's because I'm confused. I think
it's the code.

               Linus
Jamal Hadi Salim July 25, 2023, 5:04 p.m. UTC | #2
On Mon, Jul 24, 2023 at 1:49 PM Linus Torvalds
<torvalds@linuxfoundation.org> wrote:
>
> On Mon, 24 Jul 2023 at 08:19, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > +               /* The tableid from handle and tableid from htid must match */
> >                 if (TC_U32_HTID(handle) && TC_U32_HTID(handle ^ htid)) {
>
> Well, I guess the comment at least talks about matching.
>
> I still do think that most people aren't going to read
>
>     "Oh, TC_U32_HTID(handle ^ htid) being non-zero means that they
> they differ in the hash table bits".
>
> because the whole trick depends on bit op details, and depends on the
> fact that TC_U32_HTID() is purely a bit masking operation.
>
> But whatever.

I will improve the text leading to this to state that the goal of
building the new handle is to come up with an "address" or "path" for
the table entry. Then the table id mismatch is understood from that
context as part of the rules for building the table entry "address".

> I would also like to see some explanation of this pattern
>
>                 handle = htid | TC_U32_NODE(handle);
>
> and why that "binary or" makes sense. Are the node bits in 'htid'
> guaranteed to be zero?

Per existing user space tools, yes - they are guaranteed to be zero
(per my gitchelogy of both kernel +  iproute2 since inception this has
been the behavior);

> Because if 'htid' can have node bits set, what's the logical reason
> for or'ing things together?
>

Hrm. I am not sure if this is what you are getting to: but you caught
a different bug there.
While none of existing user space tooling(I know of) sends htid with
nodeid set (i.e it is always zero) syzkaller or someone else's poc
could send a non-zero nodeid in the htid. It is not catastrophic but
user expectation will be skewed - both the handle nodeid + htid node
id would be used to derive the nodeid. It will fail from the idr
allocation if the nodeid is in use.
It feels like a second patch though -  rejecting htid with a specified
nodeid (that way expected user space behavior is maintained).

> And why is that variable called 'htid', when clearly it also has the
> bucket ID, and the comment even says we have a valid bucket id?
>

The name could be changed. I am not sure what a good name is but the
semantics are: it carries a minimum of hash table id and at most hash
table id + bucket id. This sounds lame: ht_maybe_plus_bktid ? or make
them two separate variables htid and bktid.

> So I do still find this code to be explicitly written to be confusing.
> It's literally using variable names *designed* to not make sense, and
> have other meanings.
>
> Hmm?
>
> I'm not hating the patch, but when I look at that code I just go "this
> is confusing". And I don't think it's because I'm confused. I think
> it's the code.

I dont think it was intended that way - or maybe i have been taking it
for granted given on/off staring at the code for different reasons
over the years.
If it was to be written today we should certainly have made the 3 ids
independent and really should have allowed only one (instead of two
ways) to specify the different object ids from user space. It is at
least 2 decades old, in those days maybe saving a bit of space was a
good thing.

cheers,
jamal
>                Linus
Linus Torvalds July 25, 2023, 5:44 p.m. UTC | #3
On Tue, 25 Jul 2023 at 10:04, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> > I would also like to see some explanation of this pattern
> >
> >                 handle = htid | TC_U32_NODE(handle);
> >
> > and why that "binary or" makes sense. Are the node bits in 'htid'
> > guaranteed to be zero?
>
> Per existing user space tools, yes - they are guaranteed to be zero
> (per my gitchelogy of both kernel +  iproute2 since inception this has
> been the behavior);

Ok, if the htid bits are zero, it's all good.  It's fine to use a
binary 'or' as a way to 'insert bits in the word', but only if the old
bits were zero, and that wasn't obvious to me.

The *normal* pattern would be to explicitly mask off the bits you want
to use, so that you don't get some random mixing of bits of the
fields.

Of course, that's a bit inconvenient here, since you don't have the
obvious accessors.

And while using bitfields would make the source code look fine,
bitfields are *horrible* for any ABI, since they have very weakly
defined semantics (ie litte-endian vs big-endian, and the *bit* order
is not at all guaranteed to match the *byte* order).

> > Because if 'htid' can have node bits set, what's the logical reason
> > for or'ing things together?
>
> Hrm. I am not sure if this is what you are getting to: but you caught
> a different bug there.

So what I'm getting at was just that *if* you can have mixing of bits
of that NODE part of the variable, I think the end result doesn't end
up being very sensible.

It's not that 'binary or' isn't a valid operation. But it normally
isn't all that sane to randomly just mix bits in these kinds of
things. What would it mean to mix node bits from an old value with the
user-supplied one?

So that pattern just looked odd to me.

                     Linus
diff mbox series

Patch

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 5abf31e432ca..e0eabbcce9d4 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -1024,18 +1024,54 @@  static int u32_change(struct net *net, struct sk_buff *in_skb,
 		return -EINVAL;
 	}
 
+	/* If we got this far _we have the table(ht)_ with its matching htid.
+	 * Note that earlier the ht selection is a result of either a) the user
+	 * providing the htid specified via TCA_U32_HASH attribute or b) when
+	 * no such attribute is passed then a default ht, typically root at
+	 * 0x80000000, is chosen.
+	 * The passed htid will also hold the bucketid. 0 is fine. For example
+	 * of the root ht, 0x80000000 is indicating bucketid 0, whereas a user
+	 * passed htid may have 0x60001000 indicating hash bucketid 1.
+	 *
+	 * We may also have a handle, if the user passed one. The handle carries
+	 * annotation of htid(12b):bucketid(8b):node/entryid(12b).
+	 * The value of bucketid on the handle is ignored even if one was passed;
+	 * rather the value on the htid is always assumed to be the bucketid.
+	 */
 	if (handle) {
+		/* The tableid from handle and tableid from htid must match */
 		if (TC_U32_HTID(handle) && TC_U32_HTID(handle ^ htid)) {
 			NL_SET_ERR_MSG_MOD(extack, "Handle specified hash table address mismatch");
 			return -EINVAL;
 		}
-		handle = htid | TC_U32_NODE(handle);
-		err = idr_alloc_u32(&ht->handle_idr, NULL, &handle, handle,
-				    GFP_KERNEL);
-		if (err)
-			return err;
-	} else
+		/* Ok, so far we have a valid htid(12b):bucketid(8b) but we
+		 * need to finalize our handle to point to the entry as well
+		 * (with a proper node/entryid(12b)). Nodeid _cannot be 0_ for
+		 * entries since it is reserved only for tables(see earlier
+		 * code which processes TC_U32_DIVISOR attribute).
+		 * if the handle did not specify a non-zero nodeid (example
+		 * passed 0x60000000) then pick a new one from the pool of IDs
+		 * this hash table has been allocating from.
+		 * If OTOH it is specified (i.e for example the user passed a
+		 * handle such as 0x60000123), then we use it generate our final
+		 * handle which is used to uniquely identify the match entry.
+		 */
+		if (!TC_U32_NODE(handle)) {
+			handle = gen_new_kid(ht, htid);
+		} else {
+			handle = htid | TC_U32_NODE(handle);
+			err = idr_alloc_u32(&ht->handle_idr, NULL, &handle,
+					    handle, GFP_KERNEL);
+			if (err)
+				return err;
+		}
+	} else {
+		/* we dont have a handle lets just generate one based on htid
+		 * recall that htid has both the table and bucket ids already
+		 * encoded, only missing piece is the nodeid.
+		 */
 		handle = gen_new_kid(ht, htid);
+	}
 
 	if (tb[TCA_U32_SEL] == NULL) {
 		NL_SET_ERR_MSG_MOD(extack, "Selector not specified");