diff mbox series

[iproute] tc: u32: Fix key folding in sample option

Message ID 20210202183051.21022-1-phil@nwl.cc (mailing list archive)
State New, archived
Delegated to: Stephen Hemminger
Headers show
Series [iproute] tc: u32: Fix key folding in sample option | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Phil Sutter Feb. 2, 2021, 6:30 p.m. UTC
In between Linux kernel 2.4 and 2.6, key folding for hash tables changed
in kernel space. When iproute2 dropped support for the older algorithm,
the wrong code was removed and kernel 2.4 folding method remained in
place. To get things functional for recent kernels again, restoring the
old code alone was not sufficient - additional byteorder fixes were
needed.

While being at it, make use of ffs() and thereby align the code with how
kernel determines the shift width.

Fixes: 267480f55383c ("Backout the 2.4 utsname hash patch.")
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
Initially I considered changing the kernel's key folding instead as the
old method didn't just ignore key bits beyond the first byte. Yet I am
not sure if this would cause problems with hardware offloading. And
given the fact that this simplified key folding is in place since the
dawn of 2.6, it is probably not such a big problem anyway.
---
 tc/f_u32.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Comments

Jamal Hadi Salim Feb. 4, 2021, 1:19 p.m. UTC | #1
Hi Phil,

I couldnt tell by inspection if what used to work before continues to.
In particular the kernel version does consider the divisor when folding.

Two examples that currently work, if you can try them:

Most used scheme:
---
tc filter add dev $DEV parent 999:0  protocol ip prio 10 u32 \
ht 2:: \
sample ip protocol 1 0xff match ip src 1.2.3.4/32 flowid 1:10 \
action ok
----

and this i also found in one of my scripts:
----
tc filter add dev $DEV parent 999:0  protocol ip prio 10 u32 \
ht 2:: \
sample u32 0x00000806 0x0000ffff at 12 \
match u32 0x00000800 0x0000ff00 at 12 flowid 1:10 \
action ok
----

Probably a simple meaning of "working" is:
the values before and after (your changes) are consistent.

If also you will do us a kindness and add maybe a testcase in tdc?
This way next person wanting to fix it can run the tests first before
posting a patch.

cheers,
jamal

On 2021-02-02 1:30 p.m., Phil Sutter wrote:
> In between Linux kernel 2.4 and 2.6, key folding for hash tables changed
> in kernel space. When iproute2 dropped support for the older algorithm,
> the wrong code was removed and kernel 2.4 folding method remained in
> place. To get things functional for recent kernels again, restoring the
> old code alone was not sufficient - additional byteorder fixes were
> needed.
> 
> While being at it, make use of ffs() and thereby align the code with how
> kernel determines the shift width.
> 
> Fixes: 267480f55383c ("Backout the 2.4 utsname hash patch.")
> Signed-off-by: Phil Sutter <phil@nwl.cc>
> ---
> Initially I considered changing the kernel's key folding instead as the
> old method didn't just ignore key bits beyond the first byte. Yet I am
> not sure if this would cause problems with hardware offloading. And
> given the fact that this simplified key folding is in place since the
> dawn of 2.6, it is probably not such a big problem anyway.
> ---
>   tc/f_u32.c | 11 ++++++++---
>   1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/tc/f_u32.c b/tc/f_u32.c
> index 2ed5254a40d5f..a5747f671e1ea 100644
> --- a/tc/f_u32.c
> +++ b/tc/f_u32.c
> @@ -978,6 +978,13 @@ show_k:
>   	goto show_k;
>   }
>   
> +static __u32 u32_hash_fold(struct tc_u32_key *key)
> +{
> +	__u8 fshift = key->mask ? ffs(ntohl(key->mask)) - 1 : 0;
> +
> +	return ntohl(key->val & key->mask) >> fshift;
> +}
> +
>   static int u32_parse_opt(struct filter_util *qu, char *handle,
>   			 int argc, char **argv, struct nlmsghdr *n)
>   {
> @@ -1110,9 +1117,7 @@ static int u32_parse_opt(struct filter_util *qu, char *handle,
>   				}
>   				NEXT_ARG();
>   			}
> -			hash = sel2.keys[0].val & sel2.keys[0].mask;
> -			hash ^= hash >> 16;
> -			hash ^= hash >> 8;
> +			hash = u32_hash_fold(&sel2.keys[0]);
>   			htid = ((hash % divisor) << 12) | (htid & 0xFFF00000);
>   			sample_ok = 1;
>   			continue;
>
Phil Sutter Feb. 4, 2021, 2:04 p.m. UTC | #2
Jamal,

On Thu, Feb 04, 2021 at 08:19:55AM -0500, Jamal Hadi Salim wrote:
> I couldnt tell by inspection if what used to work before continues to.
> In particular the kernel version does consider the divisor when folding.

That's correct. And so does tc. What's the matter?

> Two examples that currently work, if you can try them:

Both lack information about the used hashkey and divisor.

> Most used scheme:
> ---
> tc filter add dev $DEV parent 999:0  protocol ip prio 10 u32 \
> ht 2:: \
> sample ip protocol 1 0xff match ip src 1.2.3.4/32 flowid 1:10 \
> action ok
> ----

htid before: 0x201000
htid after: 0x201000

> 
> and this i also found in one of my scripts:
> ----
> tc filter add dev $DEV parent 999:0  protocol ip prio 10 u32 \
> ht 2:: \
> sample u32 0x00000806 0x0000ffff at 12 \
> match u32 0x00000800 0x0000ff00 at 12 flowid 1:10 \
> action ok
> ----

htid before: 0x20e000 (0x8 ^ 0x6 = 0xe)
htid after: 0x206000

Are you sure this still works with current kernel and iproute2
(excluding my patch)? What divisor and hashkey is used?

> Probably a simple meaning of "working" is:
> the values before and after (your changes) are consistent.
> 
> If also you will do us a kindness and add maybe a testcase in tdc?
> This way next person wanting to fix it can run the tests first before
> posting a patch.

What is "tdc"?

Cheers, Phil
Jamal Hadi Salim Feb. 4, 2021, 2:34 p.m. UTC | #3
On 2021-02-04 9:04 a.m., Phil Sutter wrote:
> Jamal,
> 
> On Thu, Feb 04, 2021 at 08:19:55AM -0500, Jamal Hadi Salim wrote:
>> I couldnt tell by inspection if what used to work before continues to.
>> In particular the kernel version does consider the divisor when folding.
> 
> That's correct. And so does tc. What's the matter?
> 

tc assumes 256 when undefined. Maybe man page needs to be
updated to state we need divisor specified otherwise default
is 256.

>> Two examples that currently work, if you can try them:
> 
> Both lack information about the used hashkey and divisor.
> 
>> Most used scheme:
>> ---
>> tc filter add dev $DEV parent 999:0  protocol ip prio 10 u32 \
>> ht 2:: \
>> sample ip protocol 1 0xff match ip src 1.2.3.4/32 flowid 1:10 \
>> action ok
>> ----
> 
> htid before: 0x201000
> htid after: 0x201000
> 

Ok, this is the most common use-case. So we are good.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>


cheers,
jamal
Jamal Hadi Salim Feb. 4, 2021, 2:36 p.m. UTC | #4
sorry - meant to say, tdc is in kernel tree:
tools/testing/selftests/tc-testing

cheers,
jamal
Phil Sutter Feb. 4, 2021, 2:50 p.m. UTC | #5
On Thu, Feb 04, 2021 at 09:34:01AM -0500, Jamal Hadi Salim wrote:
> On 2021-02-04 9:04 a.m., Phil Sutter wrote:
> > Jamal,
> > 
> > On Thu, Feb 04, 2021 at 08:19:55AM -0500, Jamal Hadi Salim wrote:
> >> I couldnt tell by inspection if what used to work before continues to.
> >> In particular the kernel version does consider the divisor when folding.
> > 
> > That's correct. And so does tc. What's the matter?
> > 
> 
> tc assumes 256 when undefined. Maybe man page needs to be
> updated to state we need divisor specified otherwise default
> is 256.

tc-u32.8 mentions the default in 'sample' option description. Specifying
divisor is mandatory when creating a hash table, so that path is
covered, too. I still don't get how this is related to my patch, though.

> >> Two examples that currently work, if you can try them:
> > 
> > Both lack information about the used hashkey and divisor.
> > 
> >> Most used scheme:
> >> ---
> >> tc filter add dev $DEV parent 999:0  protocol ip prio 10 u32 \
> >> ht 2:: \
> >> sample ip protocol 1 0xff match ip src 1.2.3.4/32 flowid 1:10 \
> >> action ok
> >> ----
> > 
> > htid before: 0x201000
> > htid after: 0x201000
> > 
> 
> Ok, this is the most common use-case. So we are good.

Whatever.

Thanks, Phil
Jamal Hadi Salim Feb. 4, 2021, 3:28 p.m. UTC | #6
On 2021-02-04 9:50 a.m., Phil Sutter wrote:
> On Thu, Feb 04, 2021 at 09:34:01AM -0500, Jamal Hadi Salim wrote:
>> On 2021-02-04 9:04 a.m., Phil Sutter wrote:
>>> Jamal,
>>>
>>> On Thu, Feb 04, 2021 at 08:19:55AM -0500, Jamal Hadi Salim wrote:
>>>> I couldnt tell by inspection if what used to work before continues to.
>>>> In particular the kernel version does consider the divisor when folding.
>>>
>>> That's correct. And so does tc. What's the matter?
>>>
>>
>> tc assumes 256 when undefined. Maybe man page needs to be
>> updated to state we need divisor specified otherwise default
>> is 256.
> 
> tc-u32.8 mentions the default in 'sample' option description. Specifying
> divisor is mandatory when creating a hash table, so that path is
> covered, too. I still don't get how this is related to my patch, though.
> 

It is a general comment related to this code (that you are modifying).
You mentioned divisor in your earlier email as part of the syntax for
sample. So it made me wonder:
Does the bucket placement assume a specific number of buckets in a
table? Example if i had a hash table with 4 buckets, would the sample
then pick the correct bucket? Would it be also correct for 32 buckets,
etc. Or it didnt matter before and it doesnt matter now.

>>>> Two examples that currently work, if you can try them:
>>>
>>> Both lack information about the used hashkey and divisor.
>>>
>>>> Most used scheme:
>>>> ---
>>>> tc filter add dev $DEV parent 999:0  protocol ip prio 10 u32 \
>>>> ht 2:: \
>>>> sample ip protocol 1 0xff match ip src 1.2.3.4/32 flowid 1:10 \
>>>> action ok
>>>> ----
>>>
>>> htid before: 0x201000
>>> htid after: 0x201000
>>>
>>
>> Ok, this is the most common use-case. So we are good.
> 
> Whatever.

Meaning?


cheers,
jamal
Phil Sutter Feb. 4, 2021, 4:50 p.m. UTC | #7
On Thu, Feb 04, 2021 at 10:28:26AM -0500, Jamal Hadi Salim wrote:
> On 2021-02-04 9:50 a.m., Phil Sutter wrote:
> > On Thu, Feb 04, 2021 at 09:34:01AM -0500, Jamal Hadi Salim wrote:
> >> On 2021-02-04 9:04 a.m., Phil Sutter wrote:
> >>> Jamal,
> >>>
> >>> On Thu, Feb 04, 2021 at 08:19:55AM -0500, Jamal Hadi Salim wrote:
> >>>> I couldnt tell by inspection if what used to work before continues to.
> >>>> In particular the kernel version does consider the divisor when folding.
> >>>
> >>> That's correct. And so does tc. What's the matter?
> >>>
> >>
> >> tc assumes 256 when undefined. Maybe man page needs to be
> >> updated to state we need divisor specified otherwise default
> >> is 256.
> > 
> > tc-u32.8 mentions the default in 'sample' option description. Specifying
> > divisor is mandatory when creating a hash table, so that path is
> > covered, too. I still don't get how this is related to my patch, though.
> > 
> 
> It is a general comment related to this code (that you are modifying).
> You mentioned divisor in your earlier email as part of the syntax for
> sample. So it made me wonder:
> Does the bucket placement assume a specific number of buckets in a
> table? Example if i had a hash table with 4 buckets, would the sample
> then pick the correct bucket? Would it be also correct for 32 buckets,
> etc. Or it didnt matter before and it doesnt matter now.

My patch doesn't change how divisor is applied. And yes, with a smaller
than 256 buckets hash table, specifying the divisor along with sample is
necessary.

> >>>> Two examples that currently work, if you can try them:
> >>>
> >>> Both lack information about the used hashkey and divisor.
> >>>
> >>>> Most used scheme:
> >>>> ---
> >>>> tc filter add dev $DEV parent 999:0  protocol ip prio 10 u32 \
> >>>> ht 2:: \
> >>>> sample ip protocol 1 0xff match ip src 1.2.3.4/32 flowid 1:10 \
> >>>> action ok
> >>>> ----
> >>>
> >>> htid before: 0x201000
> >>> htid after: 0x201000
> >>>
> >>
> >> Ok, this is the most common use-case. So we are good.
> > 
> > Whatever.
> 
> Meaning?

Things are always fine if we only consider the positive results.

Cheers, Phil
Jamal Hadi Salim Feb. 4, 2021, 6:08 p.m. UTC | #8
On 2021-02-04 11:50 a.m., Phil Sutter wrote:
> On Thu, Feb 04, 2021 at 10:28:26AM -0500, Jamal Hadi Salim wrote:
>> On 2021-02-04 9:50 a.m., Phil Sutter wrote:
>>> On Thu, Feb 04, 2021 at 09:34:01AM -0500, Jamal Hadi Salim wrote:
>>>> On 2021-02-04 9:04 a.m., Phil Sutter wrote:
>>>>> Jamal,
>>>>>
>>>>> On Thu, Feb 04, 2021 at 08:19:55AM -0500, Jamal Hadi Salim wrote:
>>>>>> I couldnt tell by inspection if what used to work before continues to.
>>>>>> In particular the kernel version does consider the divisor when folding.
>>>>>
>>>>> That's correct. And so does tc. What's the matter?
>>>>>
>>>>
>>>> tc assumes 256 when undefined. Maybe man page needs to be
>>>> updated to state we need divisor specified otherwise default
>>>> is 256.
>>>
>>> tc-u32.8 mentions the default in 'sample' option description. Specifying
>>> divisor is mandatory when creating a hash table, so that path is
>>> covered, too. I still don't get how this is related to my patch, though.
>>>
>>
>> It is a general comment related to this code (that you are modifying).
>> You mentioned divisor in your earlier email as part of the syntax for
>> sample. So it made me wonder:
>> Does the bucket placement assume a specific number of buckets in a
>> table? Example if i had a hash table with 4 buckets, would the sample
>> then pick the correct bucket? Would it be also correct for 32 buckets,
>> etc. Or it didnt matter before and it doesnt matter now.
> 
> My patch doesn't change how divisor is applied. And yes, with a smaller
> than 256 buckets hash table, specifying the divisor along with sample is
> necessary.
> 

Sorry - hadnt looked at the man page earlier. I think the text added is
sufficient. I am wondering why we made the divisor optional to begin
with.

cheers,
jamal
Phil Sutter July 5, 2021, 2:17 p.m. UTC | #9
Hi,

On Tue, Feb 02, 2021 at 07:30:51PM +0100, Phil Sutter wrote:
> In between Linux kernel 2.4 and 2.6, key folding for hash tables changed
> in kernel space. When iproute2 dropped support for the older algorithm,
> the wrong code was removed and kernel 2.4 folding method remained in
> place. To get things functional for recent kernels again, restoring the
> old code alone was not sufficient - additional byteorder fixes were
> needed.
> 
> While being at it, make use of ffs() and thereby align the code with how
> kernel determines the shift width.
> 
> Fixes: 267480f55383c ("Backout the 2.4 utsname hash patch.")
> Signed-off-by: Phil Sutter <phil@nwl.cc>

Seems this patch fell off the table? Or was there an objection I missed?

FWIW, the related kernel selftests patch[1] which asserts this patch's
change is upstream already.

Cheers, Phil

[1] 373e13bc63639 ("selftests: tc-testing: u32: Add tests covering
sample option")
Jamal Hadi Salim July 5, 2021, 5:25 p.m. UTC | #10
On 2021-07-05 10:17 a.m., Phil Sutter wrote:
> Hi,
> 
> On Tue, Feb 02, 2021 at 07:30:51PM +0100, Phil Sutter wrote:
>> In between Linux kernel 2.4 and 2.6, key folding for hash tables changed
>> in kernel space. When iproute2 dropped support for the older algorithm,
>> the wrong code was removed and kernel 2.4 folding method remained in
>> place. To get things functional for recent kernels again, restoring the
>> old code alone was not sufficient - additional byteorder fixes were
>> needed.
>>
>> While being at it, make use of ffs() and thereby align the code with how
>> kernel determines the shift width.
>>
>> Fixes: 267480f55383c ("Backout the 2.4 utsname hash patch.")
>> Signed-off-by: Phil Sutter <phil@nwl.cc>
> 
> Seems this patch fell off the table? Or was there an objection I missed?
> 

None i am aware of. I did ACK the patch.

cheers,
jamal
Phil Sutter Aug. 3, 2021, 4:42 p.m. UTC | #11
David,

On Mon, Jul 05, 2021 at 01:25:21PM -0400, Jamal Hadi Salim wrote:
> On 2021-07-05 10:17 a.m., Phil Sutter wrote:
[...]
> > Seems this patch fell off the table? Or was there an objection I missed?
> > 
> 
> None i am aware of. I did ACK the patch.

Could you please consider applying this patch? I am aware this is a fix
and therefore shouldn't have to go via iproute2-next but it's almost a
month since I pinged Stephen and didn't receive a reply.

Thanks, Phil
diff mbox series

Patch

diff --git a/tc/f_u32.c b/tc/f_u32.c
index 2ed5254a40d5f..a5747f671e1ea 100644
--- a/tc/f_u32.c
+++ b/tc/f_u32.c
@@ -978,6 +978,13 @@  show_k:
 	goto show_k;
 }
 
+static __u32 u32_hash_fold(struct tc_u32_key *key)
+{
+	__u8 fshift = key->mask ? ffs(ntohl(key->mask)) - 1 : 0;
+
+	return ntohl(key->val & key->mask) >> fshift;
+}
+
 static int u32_parse_opt(struct filter_util *qu, char *handle,
 			 int argc, char **argv, struct nlmsghdr *n)
 {
@@ -1110,9 +1117,7 @@  static int u32_parse_opt(struct filter_util *qu, char *handle,
 				}
 				NEXT_ARG();
 			}
-			hash = sel2.keys[0].val & sel2.keys[0].mask;
-			hash ^= hash >> 16;
-			hash ^= hash >> 8;
+			hash = u32_hash_fold(&sel2.keys[0]);
 			htid = ((hash % divisor) << 12) | (htid & 0xFFF00000);
 			sample_ok = 1;
 			continue;