diff mbox series

[net-next] net, sched: Fix SKB_NOT_DROPPED_YET splat under debug config

Message ID 20231027135142.11555-1-daniel@iogearbox.net (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net-next] net, sched: Fix SKB_NOT_DROPPED_YET splat under debug config | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1417 this patch: 1417
netdev/cc_maintainers warning 15 maintainers not CCed: song@kernel.org ast@kernel.org pabeni@redhat.com edumazet@google.com andrii@kernel.org yonghong.song@linux.dev xiyou.wangcong@gmail.com jolsa@kernel.org kpsingh@kernel.org john.fastabend@gmail.com davem@davemloft.net sdf@google.com haoluo@google.com martin.lau@linux.dev jiri@resnulli.us
netdev/build_clang success Errors and warnings before: 1388 this patch: 1388
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1470 this patch: 1470
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 82 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Daniel Borkmann Oct. 27, 2023, 1:51 p.m. UTC
Ido reported:

  [...] getting the following splat [1] with CONFIG_DEBUG_NET=y and this
  reproducer [2]. Problem seems to be that classifiers clear 'struct
  tcf_result::drop_reason', thereby triggering the warning in
  __kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0). [...]

  [1]
  WARNING: CPU: 0 PID: 181 at net/core/skbuff.c:1082 kfree_skb_reason+0x38/0x130
  Modules linked in:
  CPU: 0 PID: 181 Comm: mausezahn Not tainted 6.6.0-rc6-custom-ge43e6d9582e0 #682
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
  RIP: 0010:kfree_skb_reason+0x38/0x130
  [...]
  Call Trace:
   <IRQ>
   __netif_receive_skb_core.constprop.0+0x837/0xdb0
   __netif_receive_skb_one_core+0x3c/0x70
   process_backlog+0x95/0x130
   __napi_poll+0x25/0x1b0
   net_rx_action+0x29b/0x310
   __do_softirq+0xc0/0x29b
   do_softirq+0x43/0x60
   </IRQ>

  [2]
  #!/bin/bash

  ip link add name veth0 type veth peer name veth1
  ip link set dev veth0 up
  ip link set dev veth1 up
  tc qdisc add dev veth1 clsact
  tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
  mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1

What happens is that inside most classifiers the tcf_result is copied over
from a filter template e.g. *res = f->res which then implicitly overrides
the prior SKB_DROP_REASON_TC_{INGRESS,EGRESS} default drop code which was
set via sch_handle_{ingress,egress}() for kfree_skb_reason().

Add a small helper tcf_set_result() and convert classifiers over to it.
The latter leaves the drop code intact and classifiers, actions as well
as the action engine in tcf_exts_exec() can then in future make use of
tcf_set_drop_reason(), too.

Tested that the splat is fixed under CONFIG_DEBUG_NET=y with the repro.

Fixes: 54a59aed395c ("net, sched: Make tc-related drop reason more flexible")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/ZTjY959R+AFXf3Xy@shredder
---
 include/net/pkt_cls.h    | 12 ++++++++++++
 net/sched/cls_basic.c    |  2 +-
 net/sched/cls_bpf.c      |  2 +-
 net/sched/cls_flower.c   |  2 +-
 net/sched/cls_fw.c       |  2 +-
 net/sched/cls_matchall.c |  2 +-
 net/sched/cls_route.c    |  4 ++--
 net/sched/cls_u32.c      |  2 +-
 8 files changed, 20 insertions(+), 8 deletions(-)

Comments

Jamal Hadi Salim Oct. 27, 2023, 5:24 p.m. UTC | #1
On Fri, Oct 27, 2023 at 9:51 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Ido reported:
>
>   [...] getting the following splat [1] with CONFIG_DEBUG_NET=y and this
>   reproducer [2]. Problem seems to be that classifiers clear 'struct
>   tcf_result::drop_reason', thereby triggering the warning in
>   __kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0). [...]
>
>   [1]
>   WARNING: CPU: 0 PID: 181 at net/core/skbuff.c:1082 kfree_skb_reason+0x38/0x130
>   Modules linked in:
>   CPU: 0 PID: 181 Comm: mausezahn Not tainted 6.6.0-rc6-custom-ge43e6d9582e0 #682
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
>   RIP: 0010:kfree_skb_reason+0x38/0x130
>   [...]
>   Call Trace:
>    <IRQ>
>    __netif_receive_skb_core.constprop.0+0x837/0xdb0
>    __netif_receive_skb_one_core+0x3c/0x70
>    process_backlog+0x95/0x130
>    __napi_poll+0x25/0x1b0
>    net_rx_action+0x29b/0x310
>    __do_softirq+0xc0/0x29b
>    do_softirq+0x43/0x60
>    </IRQ>
>
>   [2]
>   #!/bin/bash
>
>   ip link add name veth0 type veth peer name veth1
>   ip link set dev veth0 up
>   ip link set dev veth1 up
>   tc qdisc add dev veth1 clsact
>   tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
>   mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1
>
> What happens is that inside most classifiers the tcf_result is copied over
> from a filter template e.g. *res = f->res which then implicitly overrides
> the prior SKB_DROP_REASON_TC_{INGRESS,EGRESS} default drop code which was
> set via sch_handle_{ingress,egress}() for kfree_skb_reason().
>
> Add a small helper tcf_set_result() and convert classifiers over to it.
> The latter leaves the drop code intact and classifiers, actions as well
> as the action engine in tcf_exts_exec() can then in future make use of
> tcf_set_drop_reason(), too.
>
> Tested that the splat is fixed under CONFIG_DEBUG_NET=y with the repro.
>
> Fixes: 54a59aed395c ("net, sched: Make tc-related drop reason more flexible")
> Reported-by: Ido Schimmel <idosch@idosch.org>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Link: https://lore.kernel.org/netdev/ZTjY959R+AFXf3Xy@shredder
> ---
>  include/net/pkt_cls.h    | 12 ++++++++++++
>  net/sched/cls_basic.c    |  2 +-
>  net/sched/cls_bpf.c      |  2 +-
>  net/sched/cls_flower.c   |  2 +-
>  net/sched/cls_fw.c       |  2 +-
>  net/sched/cls_matchall.c |  2 +-
>  net/sched/cls_route.c    |  4 ++--
>  net/sched/cls_u32.c      |  2 +-
>  8 files changed, 20 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index a76c9171db0e..31d8e8587824 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -160,6 +160,18 @@ static inline void tcf_set_drop_reason(struct tcf_result *res,
>         res->drop_reason = reason;
>  }
>
> +static inline void tcf_set_result(struct tcf_result *to,
> +                                 const struct tcf_result *from)
> +{
> +       /* tcf_result's drop_reason which is the last member must be
> +        * preserved and cannot be copied from the cls'es tcf_result
> +        * template given this is carried all the way and potentially
> +        * set to a concrete tc drop reason upon error or intentional
> +        * drop. See tcf_set_drop_reason() locations.
> +        */
> +       memcpy(to, from, offsetof(typeof(*to), drop_reason));
> +}
>

I believe our bigger issue here is we are using this struct now for
both policy set by the control plane and for runtime decisions
(drop_reason) - whereas the original assumption was this struct only
held set policy. In retrospect we should have put the verdict(which is
policy) here and return the error code (as was in the first patch). I
am also not sure humans would not make a mistake on "this field must
be at the end of the struct". Can we put some assert (or big comment
on the struct) to make sure someone does not overwrite this field?
Also what happens if "from" above has a set drop_reason - is that
lost? Do you need an assert there as well?
BTW: The simple patch i posted fixes the problem as well (i actually
tested it minus the typo i sent).

cheers,
jamal


>  static inline void
>  __tcf_bind_filter(struct Qdisc *q, struct tcf_result *r, unsigned long base)
>  {
> diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
> index 1b92c33b5f81..d7ead3fc3c45 100644
> --- a/net/sched/cls_basic.c
> +++ b/net/sched/cls_basic.c
> @@ -50,7 +50,7 @@ TC_INDIRECT_SCOPE int basic_classify(struct sk_buff *skb,
>                 if (!tcf_em_tree_match(skb, &f->ematches, NULL))
>                         continue;
>                 __this_cpu_inc(f->pf->rhit);
> -               *res = f->res;
> +               tcf_set_result(res, &f->res);
>                 r = tcf_exts_exec(skb, &f->exts, res);
>                 if (r < 0)
>                         continue;
> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
> index 382c7a71f81f..e4620a462bc3 100644
> --- a/net/sched/cls_bpf.c
> +++ b/net/sched/cls_bpf.c
> @@ -124,7 +124,7 @@ TC_INDIRECT_SCOPE int cls_bpf_classify(struct sk_buff *skb,
>                         res->class   = 0;
>                         res->classid = filter_res;
>                 } else {
> -                       *res = prog->res;
> +                       tcf_set_result(res, &prog->res);
>                 }
>
>                 ret = tcf_exts_exec(skb, &prog->exts, res);
> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
> index e5314a31f75a..eb94090fb26c 100644
> --- a/net/sched/cls_flower.c
> +++ b/net/sched/cls_flower.c
> @@ -341,7 +341,7 @@ TC_INDIRECT_SCOPE int fl_classify(struct sk_buff *skb,
>
>                 f = fl_mask_lookup(mask, &skb_key);
>                 if (f && !tc_skip_sw(f->flags)) {
> -                       *res = f->res;
> +                       tcf_set_result(res, &f->res);
>                         return tcf_exts_exec(skb, &f->exts, res);
>                 }
>         }
> diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
> index c49d6af0e048..70b873f8771f 100644
> --- a/net/sched/cls_fw.c
> +++ b/net/sched/cls_fw.c
> @@ -63,7 +63,7 @@ TC_INDIRECT_SCOPE int fw_classify(struct sk_buff *skb,
>                 for (f = rcu_dereference_bh(head->ht[fw_hash(id)]); f;
>                      f = rcu_dereference_bh(f->next)) {
>                         if (f->id == id) {
> -                               *res = f->res;
> +                               tcf_set_result(res, &f->res);
>                                 if (!tcf_match_indev(skb, f->ifindex))
>                                         continue;
>                                 r = tcf_exts_exec(skb, &f->exts, res);
> diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
> index c4ed11df6254..a4018db80a60 100644
> --- a/net/sched/cls_matchall.c
> +++ b/net/sched/cls_matchall.c
> @@ -37,7 +37,7 @@ TC_INDIRECT_SCOPE int mall_classify(struct sk_buff *skb,
>         if (tc_skip_sw(head->flags))
>                 return -1;
>
> -       *res = head->res;
> +       tcf_set_result(res, &head->res);
>         __this_cpu_inc(head->pf->rhit);
>         return tcf_exts_exec(skb, &head->exts, res);
>  }
> diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
> index 1424bfeaca73..cbfaa1d1820f 100644
> --- a/net/sched/cls_route.c
> +++ b/net/sched/cls_route.c
> @@ -109,7 +109,7 @@ static inline int route4_hash_wild(void)
>
>  #define ROUTE4_APPLY_RESULT()                                  \
>  {                                                              \
> -       *res = f->res;                                          \
> +       tcf_set_result(res, &f->res);                           \
>         if (tcf_exts_has_actions(&f->exts)) {                   \
>                 int r = tcf_exts_exec(skb, &f->exts, res);      \
>                 if (r < 0) {                                    \
> @@ -152,7 +152,7 @@ TC_INDIRECT_SCOPE int route4_classify(struct sk_buff *skb,
>                         goto failure;
>                 }
>
> -               *res = f->res;
> +               tcf_set_result(res, &f->res);
>                 spin_unlock(&fastmap_lock);
>                 return 0;
>         }
> diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
> index 6663e971a13e..f50ae40a29d5 100644
> --- a/net/sched/cls_u32.c
> +++ b/net/sched/cls_u32.c
> @@ -172,7 +172,7 @@ TC_INDIRECT_SCOPE int u32_classify(struct sk_buff *skb,
>  check_terminal:
>                         if (n->sel.flags & TC_U32_TERMINAL) {
>
> -                               *res = n->res;
> +                               tcf_set_result(res, &n->res);
>                                 if (!tcf_match_indev(skb, n->ifindex)) {
>                                         n = rcu_dereference_bh(n->next);
>                                         goto next_knode;
> --
> 2.34.1
>
Daniel Borkmann Oct. 27, 2023, 6:21 p.m. UTC | #2
On 10/27/23 7:24 PM, Jamal Hadi Salim wrote:
> On Fri, Oct 27, 2023 at 9:51 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> Ido reported:
>>
>>    [...] getting the following splat [1] with CONFIG_DEBUG_NET=y and this
>>    reproducer [2]. Problem seems to be that classifiers clear 'struct
>>    tcf_result::drop_reason', thereby triggering the warning in
>>    __kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0). [...]
>>
>>    [1]
>>    WARNING: CPU: 0 PID: 181 at net/core/skbuff.c:1082 kfree_skb_reason+0x38/0x130
>>    Modules linked in:
>>    CPU: 0 PID: 181 Comm: mausezahn Not tainted 6.6.0-rc6-custom-ge43e6d9582e0 #682
>>    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
>>    RIP: 0010:kfree_skb_reason+0x38/0x130
>>    [...]
>>    Call Trace:
>>     <IRQ>
>>     __netif_receive_skb_core.constprop.0+0x837/0xdb0
>>     __netif_receive_skb_one_core+0x3c/0x70
>>     process_backlog+0x95/0x130
>>     __napi_poll+0x25/0x1b0
>>     net_rx_action+0x29b/0x310
>>     __do_softirq+0xc0/0x29b
>>     do_softirq+0x43/0x60
>>     </IRQ>
>>
>>    [2]
>>    #!/bin/bash
>>
>>    ip link add name veth0 type veth peer name veth1
>>    ip link set dev veth0 up
>>    ip link set dev veth1 up
>>    tc qdisc add dev veth1 clsact
>>    tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
>>    mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1
>>
>> What happens is that inside most classifiers the tcf_result is copied over
>> from a filter template e.g. *res = f->res which then implicitly overrides
>> the prior SKB_DROP_REASON_TC_{INGRESS,EGRESS} default drop code which was
>> set via sch_handle_{ingress,egress}() for kfree_skb_reason().
>>
>> Add a small helper tcf_set_result() and convert classifiers over to it.
>> The latter leaves the drop code intact and classifiers, actions as well
>> as the action engine in tcf_exts_exec() can then in future make use of
>> tcf_set_drop_reason(), too.
>>
>> Tested that the splat is fixed under CONFIG_DEBUG_NET=y with the repro.
>>
>> Fixes: 54a59aed395c ("net, sched: Make tc-related drop reason more flexible")
>> Reported-by: Ido Schimmel <idosch@idosch.org>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
>> Cc: Jakub Kicinski <kuba@kernel.org>
>> Link: https://lore.kernel.org/netdev/ZTjY959R+AFXf3Xy@shredder
>> ---
>>   include/net/pkt_cls.h    | 12 ++++++++++++
>>   net/sched/cls_basic.c    |  2 +-
>>   net/sched/cls_bpf.c      |  2 +-
>>   net/sched/cls_flower.c   |  2 +-
>>   net/sched/cls_fw.c       |  2 +-
>>   net/sched/cls_matchall.c |  2 +-
>>   net/sched/cls_route.c    |  4 ++--
>>   net/sched/cls_u32.c      |  2 +-
>>   8 files changed, 20 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>> index a76c9171db0e..31d8e8587824 100644
>> --- a/include/net/pkt_cls.h
>> +++ b/include/net/pkt_cls.h
>> @@ -160,6 +160,18 @@ static inline void tcf_set_drop_reason(struct tcf_result *res,
>>          res->drop_reason = reason;
>>   }
>>
>> +static inline void tcf_set_result(struct tcf_result *to,
>> +                                 const struct tcf_result *from)
>> +{
>> +       /* tcf_result's drop_reason which is the last member must be
>> +        * preserved and cannot be copied from the cls'es tcf_result
>> +        * template given this is carried all the way and potentially
>> +        * set to a concrete tc drop reason upon error or intentional
>> +        * drop. See tcf_set_drop_reason() locations.
>> +        */
>> +       memcpy(to, from, offsetof(typeof(*to), drop_reason));
>> +}
> 
> I believe our bigger issue here is we are using this struct now for
> both policy set by the control plane and for runtime decisions

Hm, but that was also either way in the original rfc.

> (drop_reason) - whereas the original assumption was this struct only
> held set policy. In retrospect we should have put the verdict(which is
> policy) here and return the error code (as was in the first patch). I
> am also not sure humans would not make a mistake on "this field must
> be at the end of the struct". Can we put some assert (or big comment
> on the struct) to make sure someone does not overwrite this field?

Yeah that can be done.

> Also what happens if "from" above has a set drop_reason - is that
> lost? Do you need an assert there as well?

Why it's needed, do you have a use case for it?

> BTW: The simple patch i posted fixes the problem as well (i actually
> tested it minus the typo i sent).

It didn't compile for me, but if you think it's a better approach, yes,
feel free to post it as a proper patch then.

What I'm not quite following though is, I thought your original use case
was that you want to be able to troubleshoot drops from unexpected
locations (aka not policy) in the tc engine so won't this miss cases when
you would then want to use tcf_set_drop_reason() e.g. from tcf_action_exec()
upon 'exception' cases (like the one for example I pointed out)? With the
diff you proposed it will basically fallback to SKB_DROP_REASON_TC_{INGRESS,
EGRESS}, so override anything that would have been set from there.

Thanks,
Daniel
Jamal Hadi Salim Oct. 28, 2023, 4:50 p.m. UTC | #3
On Fri, Oct 27, 2023 at 2:21 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 10/27/23 7:24 PM, Jamal Hadi Salim wrote:
> > On Fri, Oct 27, 2023 at 9:51 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >>
> >> Ido reported:
> >>
> >>    [...] getting the following splat [1] with CONFIG_DEBUG_NET=y and this
> >>    reproducer [2]. Problem seems to be that classifiers clear 'struct
> >>    tcf_result::drop_reason', thereby triggering the warning in
> >>    __kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0). [...]
> >>
> >>    [1]
> >>    WARNING: CPU: 0 PID: 181 at net/core/skbuff.c:1082 kfree_skb_reason+0x38/0x130
> >>    Modules linked in:
> >>    CPU: 0 PID: 181 Comm: mausezahn Not tainted 6.6.0-rc6-custom-ge43e6d9582e0 #682
> >>    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
> >>    RIP: 0010:kfree_skb_reason+0x38/0x130
> >>    [...]
> >>    Call Trace:
> >>     <IRQ>
> >>     __netif_receive_skb_core.constprop.0+0x837/0xdb0
> >>     __netif_receive_skb_one_core+0x3c/0x70
> >>     process_backlog+0x95/0x130
> >>     __napi_poll+0x25/0x1b0
> >>     net_rx_action+0x29b/0x310
> >>     __do_softirq+0xc0/0x29b
> >>     do_softirq+0x43/0x60
> >>     </IRQ>
> >>
> >>    [2]
> >>    #!/bin/bash
> >>
> >>    ip link add name veth0 type veth peer name veth1
> >>    ip link set dev veth0 up
> >>    ip link set dev veth1 up
> >>    tc qdisc add dev veth1 clsact
> >>    tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
> >>    mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1
> >>
> >> What happens is that inside most classifiers the tcf_result is copied over
> >> from a filter template e.g. *res = f->res which then implicitly overrides
> >> the prior SKB_DROP_REASON_TC_{INGRESS,EGRESS} default drop code which was
> >> set via sch_handle_{ingress,egress}() for kfree_skb_reason().
> >>
> >> Add a small helper tcf_set_result() and convert classifiers over to it.
> >> The latter leaves the drop code intact and classifiers, actions as well
> >> as the action engine in tcf_exts_exec() can then in future make use of
> >> tcf_set_drop_reason(), too.
> >>
> >> Tested that the splat is fixed under CONFIG_DEBUG_NET=y with the repro.
> >>
> >> Fixes: 54a59aed395c ("net, sched: Make tc-related drop reason more flexible")
> >> Reported-by: Ido Schimmel <idosch@idosch.org>
> >> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> >> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> >> Cc: Jakub Kicinski <kuba@kernel.org>
> >> Link: https://lore.kernel.org/netdev/ZTjY959R+AFXf3Xy@shredder
> >> ---
> >>   include/net/pkt_cls.h    | 12 ++++++++++++
> >>   net/sched/cls_basic.c    |  2 +-
> >>   net/sched/cls_bpf.c      |  2 +-
> >>   net/sched/cls_flower.c   |  2 +-
> >>   net/sched/cls_fw.c       |  2 +-
> >>   net/sched/cls_matchall.c |  2 +-
> >>   net/sched/cls_route.c    |  4 ++--
> >>   net/sched/cls_u32.c      |  2 +-
> >>   8 files changed, 20 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> >> index a76c9171db0e..31d8e8587824 100644
> >> --- a/include/net/pkt_cls.h
> >> +++ b/include/net/pkt_cls.h
> >> @@ -160,6 +160,18 @@ static inline void tcf_set_drop_reason(struct tcf_result *res,
> >>          res->drop_reason = reason;
> >>   }
> >>
> >> +static inline void tcf_set_result(struct tcf_result *to,
> >> +                                 const struct tcf_result *from)
> >> +{
> >> +       /* tcf_result's drop_reason which is the last member must be
> >> +        * preserved and cannot be copied from the cls'es tcf_result
> >> +        * template given this is carried all the way and potentially
> >> +        * set to a concrete tc drop reason upon error or intentional
> >> +        * drop. See tcf_set_drop_reason() locations.
> >> +        */
> >> +       memcpy(to, from, offsetof(typeof(*to), drop_reason));
> >> +}
> >
> > I believe our bigger issue here is we are using this struct now for
> > both policy set by the control plane and for runtime decisions
>
> Hm, but that was also either way in the original rfc.
>

I lost track of the different versions, but one of the earlier ones did.

> > (drop_reason) - whereas the original assumption was this struct only
> > held set policy. In retrospect we should have put the verdict(which is
> > policy) here and return the error code (as was in the first patch). I
> > am also not sure humans would not make a mistake on "this field must
> > be at the end of the struct". Can we put some assert (or big comment
> > on the struct) to make sure someone does not overwrite this field?
>
> Yeah that can be done.
>
> > Also what happens if "from" above has a set drop_reason - is that
> > lost? Do you need an assert there as well?
>
> Why it's needed, do you have a use case for it?
>

I dont have an exact use case - more like making the API more future proof.

> > BTW: The simple patch i posted fixes the problem as well (i actually
> > tested it minus the typo i sent).
>
> It didn't compile for me, but if you think it's a better approach, yes,
> feel free to post it as a proper patch then.
>

Sorry, yes - what i posted wouldnt have compiled, it was just to
illustrate the idea (but i did test it after). In regards to the other
point you made on the action code the idea was Victor was going to
post a patch afterwards to cover other cases that are not covered in
the current case. But i can add the one check and will let Victor take
care of the rest.

> What I'm not quite following though is, I thought your original use case
> was that you want to be able to troubleshoot drops from unexpected
> locations (aka not policy) in the tc engine so won't this miss cases when
> you would then want to use tcf_set_drop_reason() e.g. from tcf_action_exec()
> upon 'exception' cases (like the one for example I pointed out)? With the
> diff you proposed it will basically fallback to SKB_DROP_REASON_TC_{INGRESS,
> EGRESS}, so override anything that would have been set from there.

The initial motivation was to deal with syzkaller injecting trickery
where the distinction between a verdict and a failure was in the grey
zone. CBQ for example was so susceptible we ended just deleting it
(well, it was also not really being used that much otherwise we would
have seen bug reports over the years).
This evolved to us wanting to track more where the errors were
happening in the code path (and tracing the return codes) to
eventually deciding what we needed was drop reason(and now drop
monitor).
The initialization you have is fine - if we go the tc_cb field setup
then we could initialize to SKB_DROP_REASON_TC_EGRESS for the other
qdiscs.

cheers,
jamal
> Thanks,
> Daniel
Paolo Abeni Nov. 2, 2023, 10:17 a.m. UTC | #4
On Fri, 2023-10-27 at 20:21 +0200, Daniel Borkmann wrote:
> On 10/27/23 7:24 PM, Jamal Hadi Salim wrote:
> > On Fri, Oct 27, 2023 at 9:51 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > > 
> > > Ido reported:
> > > 
> > >    [...] getting the following splat [1] with CONFIG_DEBUG_NET=y and this
> > >    reproducer [2]. Problem seems to be that classifiers clear 'struct
> > >    tcf_result::drop_reason', thereby triggering the warning in
> > >    __kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0). [...]
> > > 
> > >    [1]
> > >    WARNING: CPU: 0 PID: 181 at net/core/skbuff.c:1082 kfree_skb_reason+0x38/0x130
> > >    Modules linked in:
> > >    CPU: 0 PID: 181 Comm: mausezahn Not tainted 6.6.0-rc6-custom-ge43e6d9582e0 #682
> > >    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
> > >    RIP: 0010:kfree_skb_reason+0x38/0x130
> > >    [...]
> > >    Call Trace:
> > >     <IRQ>
> > >     __netif_receive_skb_core.constprop.0+0x837/0xdb0
> > >     __netif_receive_skb_one_core+0x3c/0x70
> > >     process_backlog+0x95/0x130
> > >     __napi_poll+0x25/0x1b0
> > >     net_rx_action+0x29b/0x310
> > >     __do_softirq+0xc0/0x29b
> > >     do_softirq+0x43/0x60
> > >     </IRQ>
> > > 
> > >    [2]
> > >    #!/bin/bash
> > > 
> > >    ip link add name veth0 type veth peer name veth1
> > >    ip link set dev veth0 up
> > >    ip link set dev veth1 up
> > >    tc qdisc add dev veth1 clsact
> > >    tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
> > >    mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1
> > > 
> > > What happens is that inside most classifiers the tcf_result is copied over
> > > from a filter template e.g. *res = f->res which then implicitly overrides
> > > the prior SKB_DROP_REASON_TC_{INGRESS,EGRESS} default drop code which was
> > > set via sch_handle_{ingress,egress}() for kfree_skb_reason().
> > > 
> > > Add a small helper tcf_set_result() and convert classifiers over to it.
> > > The latter leaves the drop code intact and classifiers, actions as well
> > > as the action engine in tcf_exts_exec() can then in future make use of
> > > tcf_set_drop_reason(), too.
> > > 
> > > Tested that the splat is fixed under CONFIG_DEBUG_NET=y with the repro.
> > > 
> > > Fixes: 54a59aed395c ("net, sched: Make tc-related drop reason more flexible")
> > > Reported-by: Ido Schimmel <idosch@idosch.org>
> > > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> > > Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> > > Cc: Jakub Kicinski <kuba@kernel.org>
> > > Link: https://lore.kernel.org/netdev/ZTjY959R+AFXf3Xy@shredder
> > > ---
> > >   include/net/pkt_cls.h    | 12 ++++++++++++
> > >   net/sched/cls_basic.c    |  2 +-
> > >   net/sched/cls_bpf.c      |  2 +-
> > >   net/sched/cls_flower.c   |  2 +-
> > >   net/sched/cls_fw.c       |  2 +-
> > >   net/sched/cls_matchall.c |  2 +-
> > >   net/sched/cls_route.c    |  4 ++--
> > >   net/sched/cls_u32.c      |  2 +-
> > >   8 files changed, 20 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> > > index a76c9171db0e..31d8e8587824 100644
> > > --- a/include/net/pkt_cls.h
> > > +++ b/include/net/pkt_cls.h
> > > @@ -160,6 +160,18 @@ static inline void tcf_set_drop_reason(struct tcf_result *res,
> > >          res->drop_reason = reason;
> > >   }
> > > 
> > > +static inline void tcf_set_result(struct tcf_result *to,
> > > +                                 const struct tcf_result *from)
> > > +{
> > > +       /* tcf_result's drop_reason which is the last member must be
> > > +        * preserved and cannot be copied from the cls'es tcf_result
> > > +        * template given this is carried all the way and potentially
> > > +        * set to a concrete tc drop reason upon error or intentional
> > > +        * drop. See tcf_set_drop_reason() locations.
> > > +        */
> > > +       memcpy(to, from, offsetof(typeof(*to), drop_reason));
> > > +}
> > 
> > I believe our bigger issue here is we are using this struct now for
> > both policy set by the control plane and for runtime decisions
> 
> Hm, but that was also either way in the original rfc.
> 
> > (drop_reason) - whereas the original assumption was this struct only
> > held set policy. In retrospect we should have put the verdict(which is
> > policy) here and return the error code (as was in the first patch). I
> > am also not sure humans would not make a mistake on "this field must
> > be at the end of the struct". Can we put some assert (or big comment
> > on the struct) to make sure someone does not overwrite this field?
> 
> Yeah that can be done.

FTR, I agree the comment or even better a build_bug_on() somewhere
should be better.

Thanks!

Paolo
Jamal Hadi Salim Nov. 2, 2023, 12:47 p.m. UTC | #5
On Thu, Nov 2, 2023 at 6:17 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Fri, 2023-10-27 at 20:21 +0200, Daniel Borkmann wrote:
> > On 10/27/23 7:24 PM, Jamal Hadi Salim wrote:
> > > On Fri, Oct 27, 2023 at 9:51 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > > >
> > > > Ido reported:
> > > >
> > > >    [...] getting the following splat [1] with CONFIG_DEBUG_NET=y and this
> > > >    reproducer [2]. Problem seems to be that classifiers clear 'struct
> > > >    tcf_result::drop_reason', thereby triggering the warning in
> > > >    __kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0). [...]
> > > >
> > > >    [1]
> > > >    WARNING: CPU: 0 PID: 181 at net/core/skbuff.c:1082 kfree_skb_reason+0x38/0x130
> > > >    Modules linked in:
> > > >    CPU: 0 PID: 181 Comm: mausezahn Not tainted 6.6.0-rc6-custom-ge43e6d9582e0 #682
> > > >    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
> > > >    RIP: 0010:kfree_skb_reason+0x38/0x130
> > > >    [...]
> > > >    Call Trace:
> > > >     <IRQ>
> > > >     __netif_receive_skb_core.constprop.0+0x837/0xdb0
> > > >     __netif_receive_skb_one_core+0x3c/0x70
> > > >     process_backlog+0x95/0x130
> > > >     __napi_poll+0x25/0x1b0
> > > >     net_rx_action+0x29b/0x310
> > > >     __do_softirq+0xc0/0x29b
> > > >     do_softirq+0x43/0x60
> > > >     </IRQ>
> > > >
> > > >    [2]
> > > >    #!/bin/bash
> > > >
> > > >    ip link add name veth0 type veth peer name veth1
> > > >    ip link set dev veth0 up
> > > >    ip link set dev veth1 up
> > > >    tc qdisc add dev veth1 clsact
> > > >    tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
> > > >    mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1
> > > >
> > > > What happens is that inside most classifiers the tcf_result is copied over
> > > > from a filter template e.g. *res = f->res which then implicitly overrides
> > > > the prior SKB_DROP_REASON_TC_{INGRESS,EGRESS} default drop code which was
> > > > set via sch_handle_{ingress,egress}() for kfree_skb_reason().
> > > >
> > > > Add a small helper tcf_set_result() and convert classifiers over to it.
> > > > The latter leaves the drop code intact and classifiers, actions as well
> > > > as the action engine in tcf_exts_exec() can then in future make use of
> > > > tcf_set_drop_reason(), too.
> > > >
> > > > Tested that the splat is fixed under CONFIG_DEBUG_NET=y with the repro.
> > > >
> > > > Fixes: 54a59aed395c ("net, sched: Make tc-related drop reason more flexible")
> > > > Reported-by: Ido Schimmel <idosch@idosch.org>
> > > > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> > > > Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> > > > Cc: Jakub Kicinski <kuba@kernel.org>
> > > > Link: https://lore.kernel.org/netdev/ZTjY959R+AFXf3Xy@shredder
> > > > ---
> > > >   include/net/pkt_cls.h    | 12 ++++++++++++
> > > >   net/sched/cls_basic.c    |  2 +-
> > > >   net/sched/cls_bpf.c      |  2 +-
> > > >   net/sched/cls_flower.c   |  2 +-
> > > >   net/sched/cls_fw.c       |  2 +-
> > > >   net/sched/cls_matchall.c |  2 +-
> > > >   net/sched/cls_route.c    |  4 ++--
> > > >   net/sched/cls_u32.c      |  2 +-
> > > >   8 files changed, 20 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> > > > index a76c9171db0e..31d8e8587824 100644
> > > > --- a/include/net/pkt_cls.h
> > > > +++ b/include/net/pkt_cls.h
> > > > @@ -160,6 +160,18 @@ static inline void tcf_set_drop_reason(struct tcf_result *res,
> > > >          res->drop_reason = reason;
> > > >   }
> > > >
> > > > +static inline void tcf_set_result(struct tcf_result *to,
> > > > +                                 const struct tcf_result *from)
> > > > +{
> > > > +       /* tcf_result's drop_reason which is the last member must be
> > > > +        * preserved and cannot be copied from the cls'es tcf_result
> > > > +        * template given this is carried all the way and potentially
> > > > +        * set to a concrete tc drop reason upon error or intentional
> > > > +        * drop. See tcf_set_drop_reason() locations.
> > > > +        */
> > > > +       memcpy(to, from, offsetof(typeof(*to), drop_reason));
> > > > +}
> > >
> > > I believe our bigger issue here is we are using this struct now for
> > > both policy set by the control plane and for runtime decisions
> >
> > Hm, but that was also either way in the original rfc.
> >
> > > (drop_reason) - whereas the original assumption was this struct only
> > > held set policy. In retrospect we should have put the verdict(which is
> > > policy) here and return the error code (as was in the first patch). I
> > > am also not sure humans would not make a mistake on "this field must
> > > be at the end of the struct". Can we put some assert (or big comment
> > > on the struct) to make sure someone does not overwrite this field?
> >
> > Yeah that can be done.
>
> FTR, I agree the comment or even better a build_bug_on() somewhere
> should be better.

Paolo - Did you see the patch i posted? Ido/Daniel?

cheers,
jamal
>
> Thanks!
>
> Paolo
>
Paolo Abeni Nov. 2, 2023, 2:45 p.m. UTC | #6
On Thu, 2023-11-02 at 08:47 -0400, Jamal Hadi Salim wrote:
> On Thu, Nov 2, 2023 at 6:17 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > FTR, I agree the comment or even better a build_bug_on() somewhere
> > should be better.
> 
> Paolo - Did you see the patch i posted? Ido/Daniel?

Nope, not in my inbox, lore nor PW. I guess a repost will be needed?!?

Thanks

Paolo
Jamal Hadi Salim Nov. 2, 2023, 2:51 p.m. UTC | #7
On Thu, Nov 2, 2023 at 10:45 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Thu, 2023-11-02 at 08:47 -0400, Jamal Hadi Salim wrote:
> > On Thu, Nov 2, 2023 at 6:17 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > FTR, I agree the comment or even better a build_bug_on() somewhere
> > > should be better.
> >
> > Paolo - Did you see the patch i posted? Ido/Daniel?
>
> Nope, not in my inbox, lore nor PW. I guess a repost will be needed?!?
>

Probably the mistake was to use exactly the same title and most of the
commit message.
Its here:
https://lore.kernel.org/netdev/20231028171610.28596-1-jhs@mojatatu.com/
and
https://patchwork.kernel.org/project/netdevbpf/patch/20231028171610.28596-1-jhs@mojatatu.com/

cheers,
jamal

> Thanks
>
> Paolo
>
diff mbox series

Patch

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index a76c9171db0e..31d8e8587824 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -160,6 +160,18 @@  static inline void tcf_set_drop_reason(struct tcf_result *res,
 	res->drop_reason = reason;
 }
 
+static inline void tcf_set_result(struct tcf_result *to,
+				  const struct tcf_result *from)
+{
+	/* tcf_result's drop_reason which is the last member must be
+	 * preserved and cannot be copied from the cls'es tcf_result
+	 * template given this is carried all the way and potentially
+	 * set to a concrete tc drop reason upon error or intentional
+	 * drop. See tcf_set_drop_reason() locations.
+	 */
+	memcpy(to, from, offsetof(typeof(*to), drop_reason));
+}
+
 static inline void
 __tcf_bind_filter(struct Qdisc *q, struct tcf_result *r, unsigned long base)
 {
diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index 1b92c33b5f81..d7ead3fc3c45 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -50,7 +50,7 @@  TC_INDIRECT_SCOPE int basic_classify(struct sk_buff *skb,
 		if (!tcf_em_tree_match(skb, &f->ematches, NULL))
 			continue;
 		__this_cpu_inc(f->pf->rhit);
-		*res = f->res;
+		tcf_set_result(res, &f->res);
 		r = tcf_exts_exec(skb, &f->exts, res);
 		if (r < 0)
 			continue;
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 382c7a71f81f..e4620a462bc3 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -124,7 +124,7 @@  TC_INDIRECT_SCOPE int cls_bpf_classify(struct sk_buff *skb,
 			res->class   = 0;
 			res->classid = filter_res;
 		} else {
-			*res = prog->res;
+			tcf_set_result(res, &prog->res);
 		}
 
 		ret = tcf_exts_exec(skb, &prog->exts, res);
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index e5314a31f75a..eb94090fb26c 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -341,7 +341,7 @@  TC_INDIRECT_SCOPE int fl_classify(struct sk_buff *skb,
 
 		f = fl_mask_lookup(mask, &skb_key);
 		if (f && !tc_skip_sw(f->flags)) {
-			*res = f->res;
+			tcf_set_result(res, &f->res);
 			return tcf_exts_exec(skb, &f->exts, res);
 		}
 	}
diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index c49d6af0e048..70b873f8771f 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -63,7 +63,7 @@  TC_INDIRECT_SCOPE int fw_classify(struct sk_buff *skb,
 		for (f = rcu_dereference_bh(head->ht[fw_hash(id)]); f;
 		     f = rcu_dereference_bh(f->next)) {
 			if (f->id == id) {
-				*res = f->res;
+				tcf_set_result(res, &f->res);
 				if (!tcf_match_indev(skb, f->ifindex))
 					continue;
 				r = tcf_exts_exec(skb, &f->exts, res);
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index c4ed11df6254..a4018db80a60 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -37,7 +37,7 @@  TC_INDIRECT_SCOPE int mall_classify(struct sk_buff *skb,
 	if (tc_skip_sw(head->flags))
 		return -1;
 
-	*res = head->res;
+	tcf_set_result(res, &head->res);
 	__this_cpu_inc(head->pf->rhit);
 	return tcf_exts_exec(skb, &head->exts, res);
 }
diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index 1424bfeaca73..cbfaa1d1820f 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -109,7 +109,7 @@  static inline int route4_hash_wild(void)
 
 #define ROUTE4_APPLY_RESULT()					\
 {								\
-	*res = f->res;						\
+	tcf_set_result(res, &f->res);				\
 	if (tcf_exts_has_actions(&f->exts)) {			\
 		int r = tcf_exts_exec(skb, &f->exts, res);	\
 		if (r < 0) {					\
@@ -152,7 +152,7 @@  TC_INDIRECT_SCOPE int route4_classify(struct sk_buff *skb,
 			goto failure;
 		}
 
-		*res = f->res;
+		tcf_set_result(res, &f->res);
 		spin_unlock(&fastmap_lock);
 		return 0;
 	}
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 6663e971a13e..f50ae40a29d5 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -172,7 +172,7 @@  TC_INDIRECT_SCOPE int u32_classify(struct sk_buff *skb,
 check_terminal:
 			if (n->sel.flags & TC_U32_TERMINAL) {
 
-				*res = n->res;
+				tcf_set_result(res, &n->res);
 				if (!tcf_match_indev(skb, n->ifindex)) {
 					n = rcu_dereference_bh(n->next);
 					goto next_knode;