diff mbox series

net: sched: Fix an endian bug in tcf_proto_create

Message ID 20231117093110.1842011-1-chentao@kylinos.cn (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series net: sched: Fix an endian bug in tcf_proto_create | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1136 this patch: 1135
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 1155 this patch: 1155
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes fail Problems with Fixes tag: 1
netdev/build_allmodconfig_warn success Errors and warnings before: 1163 this patch: 1162
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Kunwu Chan Nov. 17, 2023, 9:31 a.m. UTC
net/sched/cls_api.c:390:22: warning: incorrect type in assignment (different base types)
net/sched/cls_api.c:390:22:    expected restricted __be16 [usertype] protocol
net/sched/cls_api.c:390:22:    got unsigned int [usertype] protocol

Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function")

Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
---
 net/sched/cls_api.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Pedro Tammela Nov. 17, 2023, 12:06 p.m. UTC | #1
On 17/11/2023 06:31, Kunwu Chan wrote:
> net/sched/cls_api.c:390:22: warning: incorrect type in assignment (different base types)
> net/sched/cls_api.c:390:22:    expected restricted __be16 [usertype] protocol
> net/sched/cls_api.c:390:22:    got unsigned int [usertype] protocol
> 
> Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function")
> 
> Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
> ---
>   net/sched/cls_api.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 1976bd163986..f73f39f61f66 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -387,7 +387,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
>   		goto errout;
>   	}
>   	tp->classify = tp->ops->classify;
> -	tp->protocol = protocol;
> +	tp->protocol = cpu_to_be16(protocol);
>   	tp->prio = prio;
>   	tp->chain = chain;
>   	spin_lock_init(&tp->lock);
I don't believe there's something to fix here either
Simon Horman Nov. 20, 2023, 10:04 a.m. UTC | #2
On Fri, Nov 17, 2023 at 09:06:45AM -0300, Pedro Tammela wrote:
> On 17/11/2023 06:31, Kunwu Chan wrote:
> > net/sched/cls_api.c:390:22: warning: incorrect type in assignment (different base types)
> > net/sched/cls_api.c:390:22:    expected restricted __be16 [usertype] protocol
> > net/sched/cls_api.c:390:22:    got unsigned int [usertype] protocol
> > 
> > Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function")
> > 
> > Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
> > ---
> >   net/sched/cls_api.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> > index 1976bd163986..f73f39f61f66 100644
> > --- a/net/sched/cls_api.c
> > +++ b/net/sched/cls_api.c
> > @@ -387,7 +387,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
> >   		goto errout;
> >   	}
> >   	tp->classify = tp->ops->classify;
> > -	tp->protocol = protocol;
> > +	tp->protocol = cpu_to_be16(protocol);
> >   	tp->prio = prio;
> >   	tp->chain = chain;
> >   	spin_lock_init(&tp->lock);
> I don't believe there's something to fix here either

Hi Pedro and Kunwu,

I suspect that updating the byte order of protocol isn't correct
here - else I'd assume we would have seen a user-visible bug on
little-endian systems buy now.

But nonetheless I think there is a problem, which is that the appropriate
types aren't being used, which means the tooling isn't helping us wrt any
bugs that might subsequently be added or already lurking. So I think an
appropriate question is, what is the endien and width of protocol, and how
can we use an appropriate type throughout the call-path?
Kunwu Chan Nov. 20, 2023, 11:15 a.m. UTC | #3
Hi Simon,

Thanks for your reply.
For a lot of newcomers who aren't proficient in this part of the code, 
like me, it might be confusing what is the  correct endien and width of 
a protocol.

In response to your question, I wonder if it is necessary to implement a 
unified checking mechanism with a strict parameter validation for all 
invocation parameters?

For example, add an input parameter to the 'tcf_proto_create' to 
represent the endien and width of the protocol, and check the validity 
of the input parameter at the beginning of the function.

I don't have a good idea of how to make sure that the right type is used 
in the call path.
This is just my personal opinion, welcome to discuss.

On 2023/11/20 18:04, Simon Horman wrote:
> On Fri, Nov 17, 2023 at 09:06:45AM -0300, Pedro Tammela wrote:
>> On 17/11/2023 06:31, Kunwu Chan wrote:
>>> net/sched/cls_api.c:390:22: warning: incorrect type in assignment (different base types)
>>> net/sched/cls_api.c:390:22:    expected restricted __be16 [usertype] protocol
>>> net/sched/cls_api.c:390:22:    got unsigned int [usertype] protocol
>>>
>>> Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function")
>>>
>>> Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
>>> ---
>>>    net/sched/cls_api.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>> index 1976bd163986..f73f39f61f66 100644
>>> --- a/net/sched/cls_api.c
>>> +++ b/net/sched/cls_api.c
>>> @@ -387,7 +387,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
>>>    		goto errout;
>>>    	}
>>>    	tp->classify = tp->ops->classify;
>>> -	tp->protocol = protocol;
>>> +	tp->protocol = cpu_to_be16(protocol);
>>>    	tp->prio = prio;
>>>    	tp->chain = chain;
>>>    	spin_lock_init(&tp->lock);
>> I don't believe there's something to fix here either
> 
> Hi Pedro and Kunwu,
> 
> I suspect that updating the byte order of protocol isn't correct
> here - else I'd assume we would have seen a user-visible bug on
> little-endian systems buy now.
> 
> But nonetheless I think there is a problem, which is that the appropriate
> types aren't being used, which means the tooling isn't helping us wrt any
> bugs that might subsequently be added or already lurking. So I think an
> appropriate question is, what is the endien and width of protocol, and how
> can we use an appropriate type throughout the call-path?
Pedro Tammela Nov. 20, 2023, 3:33 p.m. UTC | #4
On 20/11/2023 07:04, Simon Horman wrote:
> On Fri, Nov 17, 2023 at 09:06:45AM -0300, Pedro Tammela wrote:
>> On 17/11/2023 06:31, Kunwu Chan wrote:
>>> net/sched/cls_api.c:390:22: warning: incorrect type in assignment (different base types)
>>> net/sched/cls_api.c:390:22:    expected restricted __be16 [usertype] protocol
>>> net/sched/cls_api.c:390:22:    got unsigned int [usertype] protocol
>>>
>>> Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function")
>>>
>>> Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
>>> ---
>>>    net/sched/cls_api.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>> index 1976bd163986..f73f39f61f66 100644
>>> --- a/net/sched/cls_api.c
>>> +++ b/net/sched/cls_api.c
>>> @@ -387,7 +387,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
>>>    		goto errout;
>>>    	}
>>>    	tp->classify = tp->ops->classify;
>>> -	tp->protocol = protocol;
>>> +	tp->protocol = cpu_to_be16(protocol);
>>>    	tp->prio = prio;
>>>    	tp->chain = chain;
>>>    	spin_lock_init(&tp->lock);
>> I don't believe there's something to fix here either
> 
> Hi Pedro and Kunwu,
> 
> I suspect that updating the byte order of protocol isn't correct
> here - else I'd assume we would have seen a user-visible bug on
> little-endian systems buy now.
> 
> But nonetheless I think there is a problem, which is that the appropriate
> types aren't being used, which means the tooling isn't helping us wrt any
> bugs that might subsequently be added or already lurking. So I think an
> appropriate question is, what is the endien and width of protocol, and how
> can we use an appropriate type throughout the call-path?

Agreed and I'm all in for improving any tooling integration.
I believe a better patch would be to have protocol as a be16 since it's 
creation everywhere. I looked quickly and it will be a "viral" change, 
meaning a couple of places will require a one line change.
kernel test robot Nov. 27, 2023, 1:59 p.m. UTC | #5
Hello,

kernel test robot noticed "kernel-selftests.net.fib_tests.sh.IPv4_rp_filter_tests.rp_filter_passes_loopback_packets.fail" on:

commit: e706cea62528a99bf2a6ea5e420b9726540dde39 ("[PATCH] net: sched: Fix an endian bug in tcf_proto_create")
url: https://github.com/intel-lab-lkp/linux/commits/Kunwu-Chan/net-sched-Fix-an-endian-bug-in-tcf_proto_create/20231117-173323
base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git 18de1e517ed37ebaf33e771e46faf052e966e163
patch link: https://lore.kernel.org/all/20231117093110.1842011-1-chentao@kylinos.cn/
patch subject: [PATCH] net: sched: Fix an endian bug in tcf_proto_create

in testcase: kernel-selftests
version: kernel-selftests-x86_64-60acb023-1_20230329
with following parameters:

	group: net



compiler: gcc-12
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)


besides, we also noticed below tests failed upon this patch.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/group:
  lkp-csl-d01/kernel-selftests/debian-12-x86_64-20220629.cgz/x86_64-rhel-8.3-bpf/gcc-12/net

commit:
  18de1e517ed37 ("gve: add gve_features_check()")
  e706cea62528a ("net: sched: Fix an endian bug in tcf_proto_create")

18de1e517ed37eba e706cea62528a99bf2a6ea5e420
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :10         100%          10:10    kernel-selftests.net.bareudp.sh.IPv4_packets_over_UDPv4.fail
           :10         100%          10:10    kernel-selftests.net.bareudp.sh.IPv4_packets_over_UDPv4_multiproto_mode.fail
           :10         100%          10:10    kernel-selftests.net.bareudp.sh.IPv4_packets_over_UDPv6.fail
           :10         100%          10:10    kernel-selftests.net.bareudp.sh.IPv4_packets_over_UDPv6_multiproto_mode.fail
           :10         100%          10:10    kernel-selftests.net.bareudp.sh.IPv6_packets_over_UDPv4.fail
           :10         100%          10:10    kernel-selftests.net.bareudp.sh.IPv6_packets_over_UDPv4_multiproto_mode.fail
           :10         100%          10:10    kernel-selftests.net.bareudp.sh.IPv6_packets_over_UDPv6.fail
           :10         100%          10:10    kernel-selftests.net.bareudp.sh.IPv6_packets_over_UDPv6_multiproto_mode.fail
           :10         100%          10:10    kernel-selftests.net.bareudp.sh.Unicast_MPLS_packets_over_UDPv4.fail
           :10         100%          10:10    kernel-selftests.net.bareudp.sh.Unicast_MPLS_packets_over_UDPv6.fail
           :10         100%          10:10    kernel-selftests.net.bareudp.sh.fail
           :10         100%          10:10    kernel-selftests.net.drop_monitor_tests.sh..Capturing_active_software_drops.fail
           :10         100%          10:10    kernel-selftests.net.drop_monitor_tests.sh.fail
           :10         100%          10:10    kernel-selftests.net.fib_tests.sh.IPv4_rp_filter_tests.rp_filter_passes_local_packets.fail
           :10         100%          10:10    kernel-selftests.net.fib_tests.sh.IPv4_rp_filter_tests.rp_filter_passes_loopback_packets.fail
           :10         100%          10:10    kernel-selftests.net.rtnetlink.sh.tc_htb_hierarchy.fail
           :10         100%          10:10    kernel-selftests.net.test_ingress_egress_chaining.sh.fail



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202311271643.12f713c2-oliver.sang@intel.com



# timeout set to 1500
# selftests: net: bareudp.sh
# TEST: IPv4 packets over UDPv4                                       [FAIL]
# TEST: IPv4 packets over UDPv6                                       [FAIL]
# TEST: IPv6 packets over UDPv4                                       [FAIL]
# TEST: IPv6 packets over UDPv6                                       [FAIL]
# TEST: IPv4 packets over UDPv4 (multiproto mode)                     [FAIL]
# TEST: IPv6 packets over UDPv4 (multiproto mode)                     [FAIL]
# TEST: IPv4 packets over UDPv6 (multiproto mode)                     [FAIL]
# TEST: IPv6 packets over UDPv6 (multiproto mode)                     [FAIL]
# TEST: Unicast MPLS packets over UDPv4                               [FAIL]
# TEST: Unicast MPLS packets over UDPv6                               [FAIL]
# Some tests failed.
not ok 49 selftests: net: bareudp.sh # exit=1



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231127/202311271643.12f713c2-oliver.sang@intel.com
diff mbox series

Patch

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 1976bd163986..f73f39f61f66 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -387,7 +387,7 @@  static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 		goto errout;
 	}
 	tp->classify = tp->ops->classify;
-	tp->protocol = protocol;
+	tp->protocol = cpu_to_be16(protocol);
 	tp->prio = prio;
 	tp->chain = chain;
 	spin_lock_init(&tp->lock);