Message ID | 20230927130309.30891-1-xiaolinkui@126.com (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | netfilter: ipset: add ip_set lock to ip_set_test | expand |
On Wed, Sep 27, 2023 at 09:03:09PM +0800, xiaolinkui wrote: > From: Linkui Xiao <xiaolinkui@kylinos.cn> > > If the ip_set is not locked during ip_set_test, the following situations > may occur: > > CPU0 CPU1 > ip_rcv-> > ip_rcv_finish-> > ip_local_deliver-> > nf_hook_slow-> > iptable_filter_hook-> > ipt_do_table-> > set_match_v4-> > ip_set_test-> list_set_destroy-> > hash_net4_kadt-> set->data = NULL Hi, I'm having a bit of trouble analysing this. In particular, I'm concerned that in such a scenario set itself will be also freed, which seems likely to lead to problems. Can you provide a more complete call stack for CPU1 ? > h = set->data > .cidr = INIT_CIDR(h->nets[0].cidr[0], HOST_MASK) > > The set->data is empty, continuing to access set->data will result in a > kernel NULL pointer. The call trace is as follows: > > [2350616.024418] Call trace: > [2350616.024670] hash_net4_kadt+0x38/0x148 [ip_set_hash_net] > [2350616.025147] ip_set_test+0xbc/0x230 [ip_set] > [2350616.025549] set_match_v4+0xac/0xd0 [xt_set] > [2350616.025951] ipt_do_table+0x32c/0x678 [ip_tables] > [2350616.026391] iptable_filter_hook+0x30/0x40 [iptable_filter] > [2350616.026905] nf_hook_slow+0x50/0x100 > [2350616.027256] ip_local_deliver+0xd4/0xe8 > [2350616.027616] ip_rcv_finish+0x90/0xb0 > [2350616.027961] ip_rcv+0x50/0xb0 > [2350616.028261] __netif_receive_skb_one_core+0x58/0x68 > [2350616.028716] __netif_receive_skb+0x28/0x80 > [2350616.029098] netif_receive_skb_internal+0x3c/0xa8 > [2350616.029533] napi_gro_receive+0xf8/0x170 > [2350616.029898] receive_buf+0xec/0xa08 [virtio_net] > [2350616.030323] virtnet_poll+0x144/0x310 [virtio_net] > [2350616.030761] net_rx_action+0x158/0x3a0 > [2350616.031124] __do_softirq+0x11c/0x33c > [2350616.031470] irq_exit+0x11c/0x128 > [2350616.031793] __handle_domain_irq+0x6c/0xc0 > [2350616.032172] gic_handle_irq+0x6c/0x170 > [2350616.032528] el1_irq+0xb8/0x140 > [2350616.032835] arch_cpu_idle+0x38/0x1c0 > [2350616.033183] default_idle_call+0x24/0x58 > [2350616.033549] do_idle+0x1a4/0x268 > [2350616.033859] cpu_startup_entry+0x28/0x78 > [2350616.034234] secondary_start_kernel+0x17c/0x1c8 > > Signed-off-by: Linkui Xiao <xiaolinkui@kylinos.cn> > --- > net/netfilter/ipset/ip_set_core.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c > index 35d2f9c9ada0..46f4f47e29e4 100644 > --- a/net/netfilter/ipset/ip_set_core.c > +++ b/net/netfilter/ipset/ip_set_core.c > @@ -747,7 +747,9 @@ ip_set_test(ip_set_id_t index, const struct sk_buff *skb, > !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) > return 0; > > + ip_set_lock(set); > ret = set->variant->kadt(set, skb, par, IPSET_TEST, opt); > + ip_set_unlock(set); > > if (ret == -EAGAIN) { > /* Type requests element to be completed */ > -- > 2.17.1 > >
Hi, On Mon, 2 Oct 2023, Simon Horman wrote: > On Wed, Sep 27, 2023 at 09:03:09PM +0800, xiaolinkui wrote: > > From: Linkui Xiao <xiaolinkui@kylinos.cn> > > > > If the ip_set is not locked during ip_set_test, the following situations > > may occur: > > > > CPU0 CPU1 > > ip_rcv-> > > ip_rcv_finish-> > > ip_local_deliver-> > > nf_hook_slow-> > > iptable_filter_hook-> > > ipt_do_table-> > > set_match_v4-> > > ip_set_test-> list_set_destroy-> > > hash_net4_kadt-> set->data = NULL > > I'm having a bit of trouble analysing this. > In particular, I'm concerned that in such a scenario set > itself will be also freed, which seems likely to lead to problems. > > Can you provide a more complete call stack for CPU1 ? ip_set_test() runs intentionally without holding a spinlock, it uses RCU. But I don't understand the scenario at all: CPU0: CPU1: hash_net4_kadt list_set_destroy so it's a hash:net type which works on a list of set type of sets only The list type of set can freely be destroyed (when not referenced), the destroy operation has no effect whatsoever on its possible hash:net type of member set. Moreover, kernel side add/del/test can only be performed when the set in question is referenced. Referenced sets cannot be deleted. So what is the scenario really in this case? Best regards, Jozsef > > h = set->data > > .cidr = INIT_CIDR(h->nets[0].cidr[0], HOST_MASK) > > > > The set->data is empty, continuing to access set->data will result in a > > kernel NULL pointer. The call trace is as follows: > > > > [2350616.024418] Call trace: > > [2350616.024670] hash_net4_kadt+0x38/0x148 [ip_set_hash_net] > > [2350616.025147] ip_set_test+0xbc/0x230 [ip_set] > > [2350616.025549] set_match_v4+0xac/0xd0 [xt_set] > > [2350616.025951] ipt_do_table+0x32c/0x678 [ip_tables] > > [2350616.026391] iptable_filter_hook+0x30/0x40 [iptable_filter] > > [2350616.026905] nf_hook_slow+0x50/0x100 > > [2350616.027256] ip_local_deliver+0xd4/0xe8 > > [2350616.027616] ip_rcv_finish+0x90/0xb0 > > [2350616.027961] ip_rcv+0x50/0xb0 > > [2350616.028261] __netif_receive_skb_one_core+0x58/0x68 > > [2350616.028716] __netif_receive_skb+0x28/0x80 > > [2350616.029098] netif_receive_skb_internal+0x3c/0xa8 > > [2350616.029533] napi_gro_receive+0xf8/0x170 > > [2350616.029898] receive_buf+0xec/0xa08 [virtio_net] > > [2350616.030323] virtnet_poll+0x144/0x310 [virtio_net] > > [2350616.030761] net_rx_action+0x158/0x3a0 > > [2350616.031124] __do_softirq+0x11c/0x33c > > [2350616.031470] irq_exit+0x11c/0x128 > > [2350616.031793] __handle_domain_irq+0x6c/0xc0 > > [2350616.032172] gic_handle_irq+0x6c/0x170 > > [2350616.032528] el1_irq+0xb8/0x140 > > [2350616.032835] arch_cpu_idle+0x38/0x1c0 > > [2350616.033183] default_idle_call+0x24/0x58 > > [2350616.033549] do_idle+0x1a4/0x268 > > [2350616.033859] cpu_startup_entry+0x28/0x78 > > [2350616.034234] secondary_start_kernel+0x17c/0x1c8 > > > > Signed-off-by: Linkui Xiao <xiaolinkui@kylinos.cn> > > --- > > net/netfilter/ipset/ip_set_core.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c > > index 35d2f9c9ada0..46f4f47e29e4 100644 > > --- a/net/netfilter/ipset/ip_set_core.c > > +++ b/net/netfilter/ipset/ip_set_core.c > > @@ -747,7 +747,9 @@ ip_set_test(ip_set_id_t index, const struct sk_buff *skb, > > !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) > > return 0; > > > > + ip_set_lock(set); > > ret = set->variant->kadt(set, skb, par, IPSET_TEST, opt); > > + ip_set_unlock(set); > > > > if (ret == -EAGAIN) { > > /* Type requests element to be completed */ > > -- > > 2.17.1 > > > > > - E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt Address : Wigner Research Centre for Physics H-1525 Budapest 114, POB. 49, Hungary
Hi, On 10/3/23 03:06, Jozsef Kadlecsik wrote: > Hi, > > On Mon, 2 Oct 2023, Simon Horman wrote: > >> On Wed, Sep 27, 2023 at 09:03:09PM +0800, xiaolinkui wrote: >>> From: Linkui Xiao <xiaolinkui@kylinos.cn> >>> >>> If the ip_set is not locked during ip_set_test, the following situations >>> may occur: >>> >>> CPU0 CPU1 >>> ip_rcv-> >>> ip_rcv_finish-> >>> ip_local_deliver-> >>> nf_hook_slow-> >>> iptable_filter_hook-> >>> ipt_do_table-> >>> set_match_v4-> >>> ip_set_test-> list_set_destroy-> >>> hash_net4_kadt-> set->data = NULL >> I'm having a bit of trouble analysing this. >> In particular, I'm concerned that in such a scenario set >> itself will be also freed, which seems likely to lead to problems. >> >> Can you provide a more complete call stack for CPU1 ? > ip_set_test() runs intentionally without holding a spinlock, it uses RCU. > > But I don't understand the scenario at all: > > CPU0: CPU1: > hash_net4_kadt list_set_destroy > > so it's a hash:net type which works on a list > of set type of sets only > > The list type of set can freely be destroyed (when not referenced), the > destroy operation has no effect whatsoever on its possible hash:net type > of member set. > > Moreover, kernel side add/del/test can only be performed when the set in > question is referenced. Referenced sets cannot be deleted. > > So what is the scenario really in this case? The case I want to express should be like this: CPU0 CPU1 ip_set_test | (1) iptables -D -> set->ref -- | (2) ipset destroy -> set->data=NULL | hash_net4_kadt | hash_net4_test For the convenience of description, the definition is as follows: cmd(1): iptables -D cmd(2): ipset destroy When the ip_set test has already started in CPU0, but before it ends. For example, when CPU0 runs between ip_set_test and hash_net4_kadt, CPU1 executes cmd (1) and cmd (2). In addition, if CPU 0 runs between hash_net4_kadt and hash_net4_test, CPU1 executes cmd (1) and cmd (2). The following call trace will be triggered: crash> bt PID: 0 TASK: ffff8003a1cd2680 CPU: 7 COMMAND: "swapper/7" #0 [ffff8003fff3f460] crash_kexec at ffff0000081af828 #1 [ffff8003fff3f490] die at ffff00000808f754 #2 [ffff8003fff3f4d0] die_kernel_fault at ffff0000080aa9ac #3 [ffff8003fff3f500] __do_kernel_fault at ffff0000080aa67c #4 [ffff8003fff3f530] do_page_fault at ffff000008bfa66c #5 [ffff8003fff3f620] do_translation_fault at ffff000008bfab64 #6 [ffff8003fff3f650] do_mem_abort at ffff000008081284 #7 [ffff8003fff3f830] el1_ia at ffff00000808310c PC: ffff00000342225c [hash_net4_test+68] LR: ffff000003420200 [hash_net4_kadt+208] SP: ffff8003fff3f840 PSTATE: 60400005 X29: ffff8003fff3f840 X28: ffff8003a78ca600 X27: ffff8003fff3f908 X26: 0000000000000000 X25: ffff000000c70600 X24: ffff8003b8232400 X23: ffff000002f90fcc X22: 0000000000000000 X21: ffff8003fff3f9d0 X20: ffff8003fff3f910 X19: ffff8003fff3f9c8 X18: 0000000000000000 X17: 0000000000000000 X16: 0000000000000000 X15: 0000000000000000 X14: 970000002d494600 X13: 0000000000000000 X12: a40d15d8df825036 X11: ffff000000c70600 X10: ffff8003b2deb000 X9: 0000000000000001 X8: 0000000000000000 X7: 00000000637c7464 X6: ffff000003422218 X5: 00000000637c7464 X4: 0000000000000000 X3: ffff8003fff3f9d0 X2: ffff8003fff3f910 X1: ffff8003fff3f908 X0: 0000000000000020 #8 [ffff8003fff3f840] hash_net4_test at ffff000003422258 [ip_set_hash_net] #9 [ffff8003fff3f8d0] hash_net4_kadt at ffff0000034201fc [ip_set_hash_net] #10 [ffff8003fff3f940] ip_set_test at ffff000002c011b8 [ip_set] #11 [ffff8003fff3f990] set_match_v4 at ffff000002f90fc8 [xt_set] #12 [ffff8003fff3fa20] ipt_do_table at ffff000000c504e0 [ip_tables] #13 [ffff8003fff3fb60] iptable_filter_hook at ffff0000026e006c [iptable_filter] #14 [ffff8003fff3fb80] nf_hook_slow at ffff000008ac7a84 #15 [ffff8003fff3fbc0] ip_local_deliver at ffff000008ad5d88 #16 [ffff8003fff3fc10] ip_rcv_finish at ffff000008ad59b4 #17 [ffff8003fff3fc40] ip_rcv at ffff000008ad5dec #18 [ffff8003fff3fca0] __netif_receive_skb_one_core at ffff000008a6c344 #19 [ffff8003fff3fce0] __netif_receive_skb at ffff000008a6c3ac #20 [ffff8003fff3fd00] netif_receive_skb_internal at ffff000008a6c440 #21 [ffff8003fff3fd30] napi_gro_receive at ffff000008a6d3ec #22 [ffff8003fff3fd60] receive_buf at ffff000001c734d8 [virtio_net] #23 [ffff8003fff3fe20] virtnet_poll at ffff000001c753e8 [virtio_net] #24 [ffff8003fff3fec0] net_rx_action at ffff000008a6c9ec #25 [ffff8003fff3ff60] __softirqentry_text_start at ffff0000080819f0 #26 [ffff8003fff3fff0] irq_exit at ffff0000080f1228 #27 [ffff8003fff40010] __handle_domain_irq at ffff000008162a10 Of course, the ip_set_test execution cycle is very short. During this period, another CPU needs to complete the cmd1 and cmd2 operations on the ip_set, the probability of triggering this problem will be very low. This problem has also occurred in Red Hat: https://access.redhat.com/solutions/6839381 But I think the solution mentioned in the link is not applicable. Commit c120959387ef(netfilter: fix a use-after-free in mtype_destroy()) applies to ip_set_bitmap instead of ip_set_hash. Best regards, Linkui > > Best regards, > Jozsef > >>> h = set->data >>> .cidr = INIT_CIDR(h->nets[0].cidr[0], HOST_MASK) >>> >>> The set->data is empty, continuing to access set->data will result in a >>> kernel NULL pointer. The call trace is as follows: >>> >>> [2350616.024418] Call trace: >>> [2350616.024670] hash_net4_kadt+0x38/0x148 [ip_set_hash_net] >>> [2350616.025147] ip_set_test+0xbc/0x230 [ip_set] >>> [2350616.025549] set_match_v4+0xac/0xd0 [xt_set] >>> [2350616.025951] ipt_do_table+0x32c/0x678 [ip_tables] >>> [2350616.026391] iptable_filter_hook+0x30/0x40 [iptable_filter] >>> [2350616.026905] nf_hook_slow+0x50/0x100 >>> [2350616.027256] ip_local_deliver+0xd4/0xe8 >>> [2350616.027616] ip_rcv_finish+0x90/0xb0 >>> [2350616.027961] ip_rcv+0x50/0xb0 >>> [2350616.028261] __netif_receive_skb_one_core+0x58/0x68 >>> [2350616.028716] __netif_receive_skb+0x28/0x80 >>> [2350616.029098] netif_receive_skb_internal+0x3c/0xa8 >>> [2350616.029533] napi_gro_receive+0xf8/0x170 >>> [2350616.029898] receive_buf+0xec/0xa08 [virtio_net] >>> [2350616.030323] virtnet_poll+0x144/0x310 [virtio_net] >>> [2350616.030761] net_rx_action+0x158/0x3a0 >>> [2350616.031124] __do_softirq+0x11c/0x33c >>> [2350616.031470] irq_exit+0x11c/0x128 >>> [2350616.031793] __handle_domain_irq+0x6c/0xc0 >>> [2350616.032172] gic_handle_irq+0x6c/0x170 >>> [2350616.032528] el1_irq+0xb8/0x140 >>> [2350616.032835] arch_cpu_idle+0x38/0x1c0 >>> [2350616.033183] default_idle_call+0x24/0x58 >>> [2350616.033549] do_idle+0x1a4/0x268 >>> [2350616.033859] cpu_startup_entry+0x28/0x78 >>> [2350616.034234] secondary_start_kernel+0x17c/0x1c8 >>> >>> Signed-off-by: Linkui Xiao <xiaolinkui@kylinos.cn> >>> --- >>> net/netfilter/ipset/ip_set_core.c | 2 ++ >>> 1 file changed, 2 insertions(+) >>> >>> diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c >>> index 35d2f9c9ada0..46f4f47e29e4 100644 >>> --- a/net/netfilter/ipset/ip_set_core.c >>> +++ b/net/netfilter/ipset/ip_set_core.c >>> @@ -747,7 +747,9 @@ ip_set_test(ip_set_id_t index, const struct sk_buff *skb, >>> !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) >>> return 0; >>> >>> + ip_set_lock(set); >>> ret = set->variant->kadt(set, skb, par, IPSET_TEST, opt); >>> + ip_set_unlock(set); >>> >>> if (ret == -EAGAIN) { >>> /* Type requests element to be completed */ >>> -- >>> 2.17.1 >>> >>> > - > E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu > PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt > Address : Wigner Research Centre for Physics > H-1525 Budapest 114, POB. 49, Hungary
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index 35d2f9c9ada0..46f4f47e29e4 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -747,7 +747,9 @@ ip_set_test(ip_set_id_t index, const struct sk_buff *skb, !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) return 0; + ip_set_lock(set); ret = set->variant->kadt(set, skb, par, IPSET_TEST, opt); + ip_set_unlock(set); if (ret == -EAGAIN) { /* Type requests element to be completed */