Message ID | 20231117093110.1842011-1-chentao@kylinos.cn (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: sched: Fix an endian bug in tcf_proto_create | expand |
On 17/11/2023 06:31, Kunwu Chan wrote: > net/sched/cls_api.c:390:22: warning: incorrect type in assignment (different base types) > net/sched/cls_api.c:390:22: expected restricted __be16 [usertype] protocol > net/sched/cls_api.c:390:22: got unsigned int [usertype] protocol > > Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function") > > Signed-off-by: Kunwu Chan <chentao@kylinos.cn> > --- > net/sched/cls_api.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c > index 1976bd163986..f73f39f61f66 100644 > --- a/net/sched/cls_api.c > +++ b/net/sched/cls_api.c > @@ -387,7 +387,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol, > goto errout; > } > tp->classify = tp->ops->classify; > - tp->protocol = protocol; > + tp->protocol = cpu_to_be16(protocol); > tp->prio = prio; > tp->chain = chain; > spin_lock_init(&tp->lock); I don't believe there's something to fix here either
On Fri, Nov 17, 2023 at 09:06:45AM -0300, Pedro Tammela wrote: > On 17/11/2023 06:31, Kunwu Chan wrote: > > net/sched/cls_api.c:390:22: warning: incorrect type in assignment (different base types) > > net/sched/cls_api.c:390:22: expected restricted __be16 [usertype] protocol > > net/sched/cls_api.c:390:22: got unsigned int [usertype] protocol > > > > Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function") > > > > Signed-off-by: Kunwu Chan <chentao@kylinos.cn> > > --- > > net/sched/cls_api.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c > > index 1976bd163986..f73f39f61f66 100644 > > --- a/net/sched/cls_api.c > > +++ b/net/sched/cls_api.c > > @@ -387,7 +387,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol, > > goto errout; > > } > > tp->classify = tp->ops->classify; > > - tp->protocol = protocol; > > + tp->protocol = cpu_to_be16(protocol); > > tp->prio = prio; > > tp->chain = chain; > > spin_lock_init(&tp->lock); > I don't believe there's something to fix here either Hi Pedro and Kunwu, I suspect that updating the byte order of protocol isn't correct here - else I'd assume we would have seen a user-visible bug on little-endian systems buy now. But nonetheless I think there is a problem, which is that the appropriate types aren't being used, which means the tooling isn't helping us wrt any bugs that might subsequently be added or already lurking. So I think an appropriate question is, what is the endien and width of protocol, and how can we use an appropriate type throughout the call-path?
Hi Simon, Thanks for your reply. For a lot of newcomers who aren't proficient in this part of the code, like me, it might be confusing what is the correct endien and width of a protocol. In response to your question, I wonder if it is necessary to implement a unified checking mechanism with a strict parameter validation for all invocation parameters? For example, add an input parameter to the 'tcf_proto_create' to represent the endien and width of the protocol, and check the validity of the input parameter at the beginning of the function. I don't have a good idea of how to make sure that the right type is used in the call path. This is just my personal opinion, welcome to discuss. On 2023/11/20 18:04, Simon Horman wrote: > On Fri, Nov 17, 2023 at 09:06:45AM -0300, Pedro Tammela wrote: >> On 17/11/2023 06:31, Kunwu Chan wrote: >>> net/sched/cls_api.c:390:22: warning: incorrect type in assignment (different base types) >>> net/sched/cls_api.c:390:22: expected restricted __be16 [usertype] protocol >>> net/sched/cls_api.c:390:22: got unsigned int [usertype] protocol >>> >>> Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function") >>> >>> Signed-off-by: Kunwu Chan <chentao@kylinos.cn> >>> --- >>> net/sched/cls_api.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c >>> index 1976bd163986..f73f39f61f66 100644 >>> --- a/net/sched/cls_api.c >>> +++ b/net/sched/cls_api.c >>> @@ -387,7 +387,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol, >>> goto errout; >>> } >>> tp->classify = tp->ops->classify; >>> - tp->protocol = protocol; >>> + tp->protocol = cpu_to_be16(protocol); >>> tp->prio = prio; >>> tp->chain = chain; >>> spin_lock_init(&tp->lock); >> I don't believe there's something to fix here either > > Hi Pedro and Kunwu, > > I suspect that updating the byte order of protocol isn't correct > here - else I'd assume we would have seen a user-visible bug on > little-endian systems buy now. > > But nonetheless I think there is a problem, which is that the appropriate > types aren't being used, which means the tooling isn't helping us wrt any > bugs that might subsequently be added or already lurking. So I think an > appropriate question is, what is the endien and width of protocol, and how > can we use an appropriate type throughout the call-path?
On 20/11/2023 07:04, Simon Horman wrote: > On Fri, Nov 17, 2023 at 09:06:45AM -0300, Pedro Tammela wrote: >> On 17/11/2023 06:31, Kunwu Chan wrote: >>> net/sched/cls_api.c:390:22: warning: incorrect type in assignment (different base types) >>> net/sched/cls_api.c:390:22: expected restricted __be16 [usertype] protocol >>> net/sched/cls_api.c:390:22: got unsigned int [usertype] protocol >>> >>> Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function") >>> >>> Signed-off-by: Kunwu Chan <chentao@kylinos.cn> >>> --- >>> net/sched/cls_api.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c >>> index 1976bd163986..f73f39f61f66 100644 >>> --- a/net/sched/cls_api.c >>> +++ b/net/sched/cls_api.c >>> @@ -387,7 +387,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol, >>> goto errout; >>> } >>> tp->classify = tp->ops->classify; >>> - tp->protocol = protocol; >>> + tp->protocol = cpu_to_be16(protocol); >>> tp->prio = prio; >>> tp->chain = chain; >>> spin_lock_init(&tp->lock); >> I don't believe there's something to fix here either > > Hi Pedro and Kunwu, > > I suspect that updating the byte order of protocol isn't correct > here - else I'd assume we would have seen a user-visible bug on > little-endian systems buy now. > > But nonetheless I think there is a problem, which is that the appropriate > types aren't being used, which means the tooling isn't helping us wrt any > bugs that might subsequently be added or already lurking. So I think an > appropriate question is, what is the endien and width of protocol, and how > can we use an appropriate type throughout the call-path? Agreed and I'm all in for improving any tooling integration. I believe a better patch would be to have protocol as a be16 since it's creation everywhere. I looked quickly and it will be a "viral" change, meaning a couple of places will require a one line change.
Hello, kernel test robot noticed "kernel-selftests.net.fib_tests.sh.IPv4_rp_filter_tests.rp_filter_passes_loopback_packets.fail" on: commit: e706cea62528a99bf2a6ea5e420b9726540dde39 ("[PATCH] net: sched: Fix an endian bug in tcf_proto_create") url: https://github.com/intel-lab-lkp/linux/commits/Kunwu-Chan/net-sched-Fix-an-endian-bug-in-tcf_proto_create/20231117-173323 base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git 18de1e517ed37ebaf33e771e46faf052e966e163 patch link: https://lore.kernel.org/all/20231117093110.1842011-1-chentao@kylinos.cn/ patch subject: [PATCH] net: sched: Fix an endian bug in tcf_proto_create in testcase: kernel-selftests version: kernel-selftests-x86_64-60acb023-1_20230329 with following parameters: group: net compiler: gcc-12 test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 32G memory (please refer to attached dmesg/kmsg for entire log/backtrace) besides, we also noticed below tests failed upon this patch. ========================================================================================= tbox_group/testcase/rootfs/kconfig/compiler/group: lkp-csl-d01/kernel-selftests/debian-12-x86_64-20220629.cgz/x86_64-rhel-8.3-bpf/gcc-12/net commit: 18de1e517ed37 ("gve: add gve_features_check()") e706cea62528a ("net: sched: Fix an endian bug in tcf_proto_create") 18de1e517ed37eba e706cea62528a99bf2a6ea5e420 ---------------- --------------------------- fail:runs %reproduction fail:runs | | | :10 100% 10:10 kernel-selftests.net.bareudp.sh.IPv4_packets_over_UDPv4.fail :10 100% 10:10 kernel-selftests.net.bareudp.sh.IPv4_packets_over_UDPv4_multiproto_mode.fail :10 100% 10:10 kernel-selftests.net.bareudp.sh.IPv4_packets_over_UDPv6.fail :10 100% 10:10 kernel-selftests.net.bareudp.sh.IPv4_packets_over_UDPv6_multiproto_mode.fail :10 100% 10:10 kernel-selftests.net.bareudp.sh.IPv6_packets_over_UDPv4.fail :10 100% 10:10 kernel-selftests.net.bareudp.sh.IPv6_packets_over_UDPv4_multiproto_mode.fail :10 100% 10:10 kernel-selftests.net.bareudp.sh.IPv6_packets_over_UDPv6.fail :10 100% 10:10 kernel-selftests.net.bareudp.sh.IPv6_packets_over_UDPv6_multiproto_mode.fail :10 100% 10:10 kernel-selftests.net.bareudp.sh.Unicast_MPLS_packets_over_UDPv4.fail :10 100% 10:10 kernel-selftests.net.bareudp.sh.Unicast_MPLS_packets_over_UDPv6.fail :10 100% 10:10 kernel-selftests.net.bareudp.sh.fail :10 100% 10:10 kernel-selftests.net.drop_monitor_tests.sh..Capturing_active_software_drops.fail :10 100% 10:10 kernel-selftests.net.drop_monitor_tests.sh.fail :10 100% 10:10 kernel-selftests.net.fib_tests.sh.IPv4_rp_filter_tests.rp_filter_passes_local_packets.fail :10 100% 10:10 kernel-selftests.net.fib_tests.sh.IPv4_rp_filter_tests.rp_filter_passes_loopback_packets.fail :10 100% 10:10 kernel-selftests.net.rtnetlink.sh.tc_htb_hierarchy.fail :10 100% 10:10 kernel-selftests.net.test_ingress_egress_chaining.sh.fail If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202311271643.12f713c2-oliver.sang@intel.com # timeout set to 1500 # selftests: net: bareudp.sh # TEST: IPv4 packets over UDPv4 [FAIL] # TEST: IPv4 packets over UDPv6 [FAIL] # TEST: IPv6 packets over UDPv4 [FAIL] # TEST: IPv6 packets over UDPv6 [FAIL] # TEST: IPv4 packets over UDPv4 (multiproto mode) [FAIL] # TEST: IPv6 packets over UDPv4 (multiproto mode) [FAIL] # TEST: IPv4 packets over UDPv6 (multiproto mode) [FAIL] # TEST: IPv6 packets over UDPv6 (multiproto mode) [FAIL] # TEST: Unicast MPLS packets over UDPv4 [FAIL] # TEST: Unicast MPLS packets over UDPv6 [FAIL] # Some tests failed. not ok 49 selftests: net: bareudp.sh # exit=1 The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20231127/202311271643.12f713c2-oliver.sang@intel.com
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 1976bd163986..f73f39f61f66 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -387,7 +387,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol, goto errout; } tp->classify = tp->ops->classify; - tp->protocol = protocol; + tp->protocol = cpu_to_be16(protocol); tp->prio = prio; tp->chain = chain; spin_lock_init(&tp->lock);
net/sched/cls_api.c:390:22: warning: incorrect type in assignment (different base types) net/sched/cls_api.c:390:22: expected restricted __be16 [usertype] protocol net/sched/cls_api.c:390:22: got unsigned int [usertype] protocol Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function") Signed-off-by: Kunwu Chan <chentao@kylinos.cn> --- net/sched/cls_api.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)