Message ID | 20240705025056.12712-1-chengen.du@canonical.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [net,v2] net/sched: Fix UAF when resolving a clash |

On Fri, Jul 05, 2024 at 10:50:56AM +0800, Chengen Du wrote:
> KASAN reports the following UAF:
>
> BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct]
> Read of size 1 at addr ffff888c07603600 by task handler130/6469
>
> Call Trace:
>  <IRQ>
>  dump_stack_lvl+0x48/0x70
>  print_address_description.constprop.0+0x33/0x3d0
>  print_report+0xc0/0x2b0
>  kasan_report+0xd0/0x120
>  __asan_load1+0x6c/0x80
>  tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct]
>  tcf_ct_act+0x886/0x1350 [act_ct]
>  tcf_action_exec+0xf8/0x1f0
>  fl_classify+0x355/0x360 [cls_flower]
>  __tcf_classify+0x1fd/0x330
>  tcf_classify+0x21c/0x3c0
>  sch_handle_ingress.constprop.0+0x2c5/0x500
>  __netif_receive_skb_core.constprop.0+0xb25/0x1510
>  __netif_receive_skb_list_core+0x220/0x4c0
>  netif_receive_skb_list_internal+0x446/0x620
>  napi_complete_done+0x157/0x3d0
>  gro_cell_poll+0xcf/0x100
>  __napi_poll+0x65/0x310
>  net_rx_action+0x30c/0x5c0
>  __do_softirq+0x14f/0x491
>  __irq_exit_rcu+0x82/0xc0
>  irq_exit_rcu+0xe/0x20
>  common_interrupt+0xa1/0xb0
>  </IRQ>
>  <TASK>
>  asm_common_interrupt+0x27/0x40
>
> Allocated by task 6469:
>  kasan_save_stack+0x38/0x70
>  kasan_set_track+0x25/0x40
>  kasan_save_alloc_info+0x1e/0x40
>  __kasan_krealloc+0x133/0x190
>  krealloc+0xaa/0x130
>  nf_ct_ext_add+0xed/0x230 [nf_conntrack]
>  tcf_ct_act+0x1095/0x1350 [act_ct]
>  tcf_action_exec+0xf8/0x1f0
>  fl_classify+0x355/0x360 [cls_flower]
>  __tcf_classify+0x1fd/0x330
>  tcf_classify+0x21c/0x3c0
>  sch_handle_ingress.constprop.0+0x2c5/0x500
>  __netif_receive_skb_core.constprop.0+0xb25/0x1510
>  __netif_receive_skb_list_core+0x220/0x4c0
>  netif_receive_skb_list_internal+0x446/0x620
>  napi_complete_done+0x157/0x3d0
>  gro_cell_poll+0xcf/0x100
>  __napi_poll+0x65/0x310
>  net_rx_action+0x30c/0x5c0
>  __do_softirq+0x14f/0x491
>
> Freed by task 6469:
>  kasan_save_stack+0x38/0x70
>  kasan_set_track+0x25/0x40
>  kasan_save_free_info+0x2b/0x60
>  ____kasan_slab_free+0x180/0x1f0
>  __kasan_slab_free+0x12/0x30
>  slab_free_freelist_hook+0xd2/0x1a0
>  __kmem_cache_free+0x1a2/0x2f0
>  kfree+0x78/0x120
>  nf_conntrack_free+0x74/0x130 [nf_conntrack]
>  nf_ct_destroy+0xb2/0x140 [nf_conntrack]
>  __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack]
>  nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack]
>  __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack]
>  tcf_ct_act+0x12ad/0x1350 [act_ct]
>  tcf_action_exec+0xf8/0x1f0
>  fl_classify+0x355/0x360 [cls_flower]
>  __tcf_classify+0x1fd/0x330
>  tcf_classify+0x21c/0x3c0
>  sch_handle_ingress.constprop.0+0x2c5/0x500
>  __netif_receive_skb_core.constprop.0+0xb25/0x1510
>  __netif_receive_skb_list_core+0x220/0x4c0
>  netif_receive_skb_list_internal+0x446/0x620
>  napi_complete_done+0x157/0x3d0
>  gro_cell_poll+0xcf/0x100
>  __napi_poll+0x65/0x310
>  net_rx_action+0x30c/0x5c0
>  __do_softirq+0x14f/0x491
>
> The ct may be dropped if a clash has been resolved but is still passed to
> the tcf_ct_flow_table_process_conn function for further usage. This issue
> can be fixed by retrieving ct from skb again after confirming conntrack.
>
> Fixes: 0cc254e5aa37 ("net/sched: act_ct: Offload connections with commit action")
> Co-developed-by: Gerald Yang <gerald.yang@canonical.com>
> Signed-off-by: Gerald Yang <gerald.yang@canonical.com>
> Signed-off-by: Chengen Du <chengen.du@canonical.com>
> ---
>  net/sched/act_ct.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> index 2a96d9c1db65..6f41796115e3 100644
> --- a/net/sched/act_ct.c
> +++ b/net/sched/act_ct.c
> @@ -1077,6 +1077,14 @@ TC_INDIRECT_SCOPE int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
>  		 */
>  		if (nf_conntrack_confirm(skb) != NF_ACCEPT)
>  			goto drop;
> +
> +		/* The ct may be dropped if a clash has been resolved,
> +		 * so it's necessary to retrieve it from skb again to
> +		 * prevent UAF.
> +		 */
> +		ct = nf_ct_get(skb, &ctinfo);
> +		if (!ct)
> +			goto drop;

After taking a closer look at this change, I have a question: Why do we
need to change an action returned by "nf_conntrack_confirm()"
(NF_ACCEPT) and actually perform the flow for NF_DROP?

From the commit message I understand that you only want to prevent
calling "tcf_ct_flow_table_process_conn()". But for such reason we have
a bool variable: "skip_add".
Shouldn't we just set "skip_add" to true to prevent the UAF?
Would the following example code make sense in this case?

	ct = nf_ct_get(skb, &ctinfo);
	if (!ct)
		skip_add = true;

>  	}
>
>  	if (!skip_add)
> --
> 2.43.0
>

Thanks,
Michal

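To make the two options in the question above concrete, here is a minimal userspace sketch, not kernel code. The names `fake_ct`, `result`, `finish_goto_drop` and `finish_skip_add` are hypothetical stand-ins; the sketch only models the tail of the function, where the verdict is decided and the flow-table step is either run or skipped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel definitions. */
enum verdict { VERDICT_PASS, VERDICT_DROP };

struct fake_ct { int id; };

struct result {
	enum verdict verdict;	/* what happens to the packet */
	bool offloaded;		/* whether the flow-table step ran */
};

/* Variant A (as in the v2 patch): a missing ct turns an accepted
 * packet into a drop.
 */
static struct result finish_goto_drop(struct fake_ct *ct, bool skip_add)
{
	struct result r = { VERDICT_PASS, false };

	if (!ct) {
		r.verdict = VERDICT_DROP;
		return r;
	}
	if (!skip_add)
		r.offloaded = true;
	return r;
}

/* Variant B (the reviewer's suggestion): keep the verdict and only
 * skip the flow-table step when there is no ct to work with.
 */
static struct result finish_skip_add(struct fake_ct *ct, bool skip_add)
{
	struct result r = { VERDICT_PASS, false };

	if (!ct)
		skip_add = true;
	if (!skip_add)
		r.offloaded = true;
	return r;
}
```

With a NULL ct, variant A reports a drop while variant B still passes the packet and merely leaves the flow table untouched; with a valid ct both behave the same.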
Michal Kubiak <michal.kubiak@intel.com> wrote:
> On Fri, Jul 05, 2024 at 10:50:56AM +0800, Chengen Du wrote:
> > The ct may be dropped if a clash has been resolved but is still passed to
> > the tcf_ct_flow_table_process_conn function for further usage. This issue
> > can be fixed by retrieving ct from skb again after confirming conntrack.

Right, ct can be stale after confirm.

> > diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> > index 2a96d9c1db65..6f41796115e3 100644
> > --- a/net/sched/act_ct.c
> > +++ b/net/sched/act_ct.c
> > @@ -1077,6 +1077,14 @@ TC_INDIRECT_SCOPE int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
> >  		 */
> >  		if (nf_conntrack_confirm(skb) != NF_ACCEPT)
> >  			goto drop;
> > +
> > +		/* The ct may be dropped if a clash has been resolved,
> > +		 * so it's necessary to retrieve it from skb again to
> > +		 * prevent UAF.
> > +		 */
> > +		ct = nf_ct_get(skb, &ctinfo);
> > +		if (!ct)
> > +			goto drop;
>
> After taking a closer look at this change, I have a question: Why do we
> need to change an action returned by "nf_conntrack_confirm()"
> (NF_ACCEPT) and actually perform the flow for NF_DROP?
>
> From the commit message I understand that you only want to prevent
> calling "tcf_ct_flow_table_process_conn()". But for such reason we have
> a bool variable: "skip_add".
> Shouldn't we just set "skip_add" to true to prevent the UAF?
> Would the following example code make sense in this case?
>
> 	ct = nf_ct_get(skb, &ctinfo);
> 	if (!ct)
> 		skip_add = true;

It depends on what tc wants to do here.

For netfilter, the skb is not dropped and continues passing
through the stack. It's up to the user to decide what to do with it,
e.g. doing "ct state invalid drop".

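The "stale after confirm" point can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `fake_ct`, `fake_skb`, `fake_confirm` and `confirm_and_get_id` are hypothetical names, and `fake_confirm` merely imitates what the KASAN report suggests, namely that clash resolution frees the losing entry and leaves the skb pointing at the winning one.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel structures. */
struct fake_ct { int id; };
struct fake_skb { struct fake_ct *ct; };	/* models the ct attached to the skb */

/* Models confirmation with clash resolution: if another entry already
 * won, the skb's own (heap-allocated) entry is freed and the skb is
 * repointed to the winner.
 */
static void fake_confirm(struct fake_skb *skb, struct fake_ct *winner)
{
	if (winner && skb->ct != winner) {
		free(skb->ct);
		skb->ct = winner;
	}
}

static int confirm_and_get_id(struct fake_skb *skb, struct fake_ct *winner)
{
	struct fake_ct *ct = skb->ct;	/* cached before confirming */

	fake_confirm(skb, winner);

	/* The cached pointer may now dangle, so it is read again from
	 * the skb before any further use.
	 */
	ct = skb->ct;
	return ct ? ct->id : -1;
}
```

Dereferencing the pointer cached before `fake_confirm()` would be the userspace analogue of the reported use-after-free; re-reading it from the skb is the pattern the patch applies.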
On Fri, Jul 5, 2024 at 5:35 PM Florian Westphal <fw@strlen.de> wrote:
>
> Michal Kubiak <michal.kubiak@intel.com> wrote:
> > On Fri, Jul 05, 2024 at 10:50:56AM +0800, Chengen Du wrote:
> > > The ct may be dropped if a clash has been resolved but is still passed to
> > > the tcf_ct_flow_table_process_conn function for further usage. This issue
> > > can be fixed by retrieving ct from skb again after confirming conntrack.
>
> Right, ct can be stale after confirm.
>
> > > diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> > > index 2a96d9c1db65..6f41796115e3 100644
> > > --- a/net/sched/act_ct.c
> > > +++ b/net/sched/act_ct.c
> > > @@ -1077,6 +1077,14 @@ TC_INDIRECT_SCOPE int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
> > >  		 */
> > >  		if (nf_conntrack_confirm(skb) != NF_ACCEPT)
> > >  			goto drop;
> > > +
> > > +		/* The ct may be dropped if a clash has been resolved,
> > > +		 * so it's necessary to retrieve it from skb again to
> > > +		 * prevent UAF.
> > > +		 */
> > > +		ct = nf_ct_get(skb, &ctinfo);
> > > +		if (!ct)
> > > +			goto drop;
> >
> > After taking a closer look at this change, I have a question: Why do we
> > need to change an action returned by "nf_conntrack_confirm()"
> > (NF_ACCEPT) and actually perform the flow for NF_DROP?
> >
> > From the commit message I understand that you only want to prevent
> > calling "tcf_ct_flow_table_process_conn()". But for such reason we have
> > a bool variable: "skip_add".
> > Shouldn't we just set "skip_add" to true to prevent the UAF?
> > Would the following example code make sense in this case?
> >
> > 	ct = nf_ct_get(skb, &ctinfo);
> > 	if (!ct)
> > 		skip_add = true;

The fix follows from the KASAN analysis. The ct is freed while
resolving a clash in the __nf_ct_resolve_clash function, but it is
still accessed in the tcf_ct_flow_table_process_conn function. If I
understand correctly, the original logic still adds the ct to the flow
table after resolving a clash as long as skip_add is false. The chance
of encountering a drop case is rare because the skb's ct is already
replaced by the hashed one. However, if we still encounter a NULL
ct, the situation is unusual and might warrant dropping it as a
precaution. I am not an expert in this area and might have some
misunderstandings. Please share your opinions if you have any
concerns.

>
> It depends on what tc wants to do here.
>
> For netfilter, the skb is not dropped and continues passing
> through the stack. It's up to the user to decide what to do with it,
> e.g. doing "ct state invalid drop".

On Sat, Jul 06, 2024 at 09:42:00AM +0800, Chengen Du wrote:

[...]

> >
> > > > diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> > > > index 2a96d9c1db65..6f41796115e3 100644
> > > > --- a/net/sched/act_ct.c
> > > > +++ b/net/sched/act_ct.c
> > > > @@ -1077,6 +1077,14 @@ TC_INDIRECT_SCOPE int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
> > > >  		 */
> > > >  		if (nf_conntrack_confirm(skb) != NF_ACCEPT)
> > > >  			goto drop;
> > > > +
> > > > +		/* The ct may be dropped if a clash has been resolved,
> > > > +		 * so it's necessary to retrieve it from skb again to
> > > > +		 * prevent UAF.
> > > > +		 */
> > > > +		ct = nf_ct_get(skb, &ctinfo);
> > > > +		if (!ct)
> > > > +			goto drop;
> > >
> > > After taking a closer look at this change, I have a question: Why do we
> > > need to change an action returned by "nf_conntrack_confirm()"
> > > (NF_ACCEPT) and actually perform the flow for NF_DROP?
> > >
> > > From the commit message I understand that you only want to prevent
> > > calling "tcf_ct_flow_table_process_conn()". But for such reason we have
> > > a bool variable: "skip_add".
> > > Shouldn't we just set "skip_add" to true to prevent the UAF?
> > > Would the following example code make sense in this case?
> > >
> > > 	ct = nf_ct_get(skb, &ctinfo);
> > > 	if (!ct)
> > > 		skip_add = true;
>
> The fix follows from the KASAN analysis. The ct is freed while
> resolving a clash in the __nf_ct_resolve_clash function, but it is
> still accessed in the tcf_ct_flow_table_process_conn function. If I
> understand correctly, the original logic still adds the ct to the flow
> table after resolving a clash as long as skip_add is false. The chance
> of encountering a drop case is rare because the skb's ct is already
> replaced by the hashed one. However, if we still encounter a NULL
> ct, the situation is unusual and might warrant dropping it as a
> precaution. I am not an expert in this area and might have some
> misunderstandings. Please share your opinions if you have any
> concerns.
>

I'm also not an expert in this part of code. I understand the scenario
of UAF found by KASAN analysis.
My only concern is that the patch changes the flow of the function:
in case of NF_ACCEPT we will go to "drop" instead of performing a normal
flow.

For example, if "nf_conntrack_confirm()" returns NF_ACCEPT, (even after
the clash resolving), I would not expect calling "goto drop".
That is why I suggested a less invasive solution which is just blocking
calling "tcf_ct_flow_table_process_conn()" where there is a risk of UAF.
So, I asked if such solution would work in case of this function.

Thanks,
Michal

> >
> > It depends on what tc wants to do here.
> >
> > For netfilter, the skb is not dropped and continues passing
> > through the stack. It's up to the user to decide what to do with it,
> > e.g. doing "ct state invalid drop".

On Mon, Jul 8, 2024 at 4:33 PM Michal Kubiak <michal.kubiak@intel.com> wrote:
>
> On Sat, Jul 06, 2024 at 09:42:00AM +0800, Chengen Du wrote:
>
> [...]
>
> > > >
> > > > > diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> > > > > index 2a96d9c1db65..6f41796115e3 100644
> > > > > --- a/net/sched/act_ct.c
> > > > > +++ b/net/sched/act_ct.c
> > > > > @@ -1077,6 +1077,14 @@ TC_INDIRECT_SCOPE int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
> > > > >  		 */
> > > > >  		if (nf_conntrack_confirm(skb) != NF_ACCEPT)
> > > > >  			goto drop;
> > > > > +
> > > > > +		/* The ct may be dropped if a clash has been resolved,
> > > > > +		 * so it's necessary to retrieve it from skb again to
> > > > > +		 * prevent UAF.
> > > > > +		 */
> > > > > +		ct = nf_ct_get(skb, &ctinfo);
> > > > > +		if (!ct)
> > > > > +			goto drop;
> > > >
> > > > After taking a closer look at this change, I have a question: Why do we
> > > > need to change an action returned by "nf_conntrack_confirm()"
> > > > (NF_ACCEPT) and actually perform the flow for NF_DROP?
> > > >
> > > > From the commit message I understand that you only want to prevent
> > > > calling "tcf_ct_flow_table_process_conn()". But for such reason we have
> > > > a bool variable: "skip_add".
> > > > Shouldn't we just set "skip_add" to true to prevent the UAF?
> > > > Would the following example code make sense in this case?
> > > >
> > > > 	ct = nf_ct_get(skb, &ctinfo);
> > > > 	if (!ct)
> > > > 		skip_add = true;
> >
> > The fix follows from the KASAN analysis. The ct is freed while
> > resolving a clash in the __nf_ct_resolve_clash function, but it is
> > still accessed in the tcf_ct_flow_table_process_conn function. If I
> > understand correctly, the original logic still adds the ct to the flow
> > table after resolving a clash as long as skip_add is false. The chance
> > of encountering a drop case is rare because the skb's ct is already
> > replaced by the hashed one. However, if we still encounter a NULL
> > ct, the situation is unusual and might warrant dropping it as a
> > precaution. I am not an expert in this area and might have some
> > misunderstandings. Please share your opinions if you have any
> > concerns.
> >
>
> I'm also not an expert in this part of code. I understand the scenario
> of UAF found by KASAN analysis.
> My only concern is that the patch changes the flow of the function:
> in case of NF_ACCEPT we will go to "drop" instead of performing a normal
> flow.
>
> For example, if "nf_conntrack_confirm()" returns NF_ACCEPT, (even after
> the clash resolving), I would not expect calling "goto drop".
> That is why I suggested a less invasive solution which is just blocking
> calling "tcf_ct_flow_table_process_conn()" where there is a risk of UAF.
> So, I asked if such solution would work in case of this function.

Thank you for expressing your concerns in detail.

In my humble opinion, skipping the addition of an entry in the flow
table is controlled by other logic and may not be suitable to mix with
error handling. If nf_conntrack_confirm returns NF_ACCEPT, I believe
there is no reason for nf_ct_get to fail. The nf_ct_get function
simply converts skb->_nfct into a struct nf_conn type. The only
case in which it might fail is when CONFIG_NF_CONNTRACK is disabled. The
CONFIG_NET_ACT_CT depends on this configuration and determines whether
act_ct.c needs to be compiled. Actually, the "goto drop" logic is
included for completeness and might only be relevant if the memory is
corrupted. Perhaps we could wrap the check in "unlikely" to
emphasize this point?

>
> Thanks,
> Michal
>
> > >
> > > It depends on what tc wants to do here.
> > >
> > > For netfilter, the skb is not dropped and continues passing
> > > through the stack. It's up to the user to decide what to do with it,
> > > e.g. doing "ct state invalid drop".

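The remark that nf_ct_get "simply converts skb->_nfct" can be pictured with a userspace model. This is a sketch, not the kernel helper: it assumes the attached word packs a pointer together with a small state value in its low three bits, and every name below (`fake_ct`, `fake_skb`, `fake_ct_get`, `fake_ct_set`, the `FAKE_*` masks) is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for illustration: low three bits hold the state,
 * the remaining bits hold the pointer.
 */
#define FAKE_INFOMASK	((uintptr_t)7)
#define FAKE_PTRMASK	(~FAKE_INFOMASK)

/* Aligned to 8 bytes so the low three pointer bits are always zero. */
struct fake_ct { _Alignas(8) int id; };
struct fake_skb { uintptr_t nfct; };	/* models the word stored in the skb */

/* Pack a pointer and a state value into the skb's word. */
static void fake_ct_set(struct fake_skb *skb, struct fake_ct *ct,
			unsigned int info)
{
	skb->nfct = (uintptr_t)ct | ((uintptr_t)info & FAKE_INFOMASK);
}

/* Unpack: pure bit masking, no lookup and no allocation. The result
 * is NULL only when no pointer was stored in the word.
 */
static struct fake_ct *fake_ct_get(const struct fake_skb *skb,
				   unsigned int *info)
{
	*info = (unsigned int)(skb->nfct & FAKE_INFOMASK);
	return (struct fake_ct *)(skb->nfct & FAKE_PTRMASK);
}
```

In this model a NULL result after a successful confirm would mean nothing is attached to the skb at all, which is why the check is described above as a completeness measure rather than an expected path.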
On Mon, 2024-07-08 at 17:39 +0800, Chengen Du wrote:
> On Mon, Jul 8, 2024 at 4:33 PM Michal Kubiak <michal.kubiak@intel.com> wrote:
> > For example, if "nf_conntrack_confirm()" returns NF_ACCEPT, (even after
> > the clash resolving), I would not expect calling "goto drop".
> > That is why I suggested a less invasive solution which is just blocking
> > calling "tcf_ct_flow_table_process_conn()" where there is a risk of UAF.
> > So, I asked if such solution would work in case of this function.
>
> Thank you for expressing your concerns in detail.
>
> In my humble opinion, skipping the addition of an entry in the flow
> table is controlled by other logic and may not be suitable to mix with
> error handling. If nf_conntrack_confirm returns NF_ACCEPT, I believe
> there is no reason for nf_ct_get to fail. The nf_ct_get function
> simply converts skb->_nfct into a struct nf_conn type. The only
> case in which it might fail is when CONFIG_NF_CONNTRACK is disabled. The
> CONFIG_NET_ACT_CT depends on this configuration and determines whether
> act_ct.c needs to be compiled. Actually, the "goto drop" logic is
> included for completeness and might only be relevant if the memory is
> corrupted. Perhaps we could wrap the check in "unlikely" to
> emphasize this point?

I agree with Michal, I think it should be better to just skip
tcf_ct_flow_table_process_conn() in case of clash to avoid potential
behavior changes.

Thanks,

Paolo

On Tue, Jul 9, 2024 at 6:40 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2024-07-08 at 17:39 +0800, Chengen Du wrote:
> > On Mon, Jul 8, 2024 at 4:33 PM Michal Kubiak <michal.kubiak@intel.com> wrote:
> > > For example, if "nf_conntrack_confirm()" returns NF_ACCEPT, (even after
> > > the clash resolving), I would not expect calling "goto drop".
> > > That is why I suggested a less invasive solution which is just blocking
> > > calling "tcf_ct_flow_table_process_conn()" where there is a risk of UAF.
> > > So, I asked if such solution would work in case of this function.
> >
> > Thank you for expressing your concerns in detail.
> >
> > In my humble opinion, skipping the addition of an entry in the flow
> > table is controlled by other logic and may not be suitable to mix with
> > error handling. If nf_conntrack_confirm returns NF_ACCEPT, I believe
> > there is no reason for nf_ct_get to fail. The nf_ct_get function
> > simply converts skb->_nfct into a struct nf_conn type. The only
> > case in which it might fail is when CONFIG_NF_CONNTRACK is disabled. The
> > CONFIG_NET_ACT_CT depends on this configuration and determines whether
> > act_ct.c needs to be compiled. Actually, the "goto drop" logic is
> > included for completeness and might only be relevant if the memory is
> > corrupted. Perhaps we could wrap the check in "unlikely" to
> > emphasize this point?
>
> I agree with Michal, I think it should be better to just skip
> tcf_ct_flow_table_process_conn() in case of clash to avoid potential
> behavior changes.

Based on your suggestions, I took a deeper look at the code and found
that the drop logic simply adds a count to qstats->drops. It did not
work as I expected in terms of dropping the packet. I apologize for
any confusion this may have caused in our discussion.

I will send a v3 to modify the error handling.
Thank you for your advice.

>
> Thanks,
>
> Paolo
>

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 2a96d9c1db65..6f41796115e3 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -1077,6 +1077,14 @@ TC_INDIRECT_SCOPE int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
 		 */
 		if (nf_conntrack_confirm(skb) != NF_ACCEPT)
 			goto drop;
+
+		/* The ct may be dropped if a clash has been resolved,
+		 * so it's necessary to retrieve it from skb again to
+		 * prevent UAF.
+		 */
+		ct = nf_ct_get(skb, &ctinfo);
+		if (!ct)
+			goto drop;
 	}
 
 	if (!skip_add)