Message ID | 20231005193548.515-1-andrew.kanner@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Commit | 90aeaa99f53eff726e499e4093b2f3a169ec8a88 |
Delegated to: | BPF |
Headers | show |
Series | [bpf,v3] net/xdp: fix zero-size allocation warning in xskq_create() | expand |
Hello: This patch was applied to bpf/bpf.git (master) by Martin KaFai Lau <martin.lau@kernel.org>: On Thu, 5 Oct 2023 22:35:49 +0300 you wrote: > Syzkaller reported the following issue: > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 2807 at mm/vmalloc.c:3247 __vmalloc_node_range (mm/vmalloc.c:3361) > Modules linked in: > CPU: 0 PID: 2807 Comm: repro Not tainted 6.6.0-rc2+ #12 > Hardware name: Generic DT based system > unwind_backtrace from show_stack (arch/arm/kernel/traps.c:258) > show_stack from dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) > dump_stack_lvl from __warn (kernel/panic.c:633 kernel/panic.c:680) > __warn from warn_slowpath_fmt (./include/linux/context_tracking.h:153 kernel/panic.c:700) > warn_slowpath_fmt from __vmalloc_node_range (mm/vmalloc.c:3361 (discriminator 3)) > __vmalloc_node_range from vmalloc_user (mm/vmalloc.c:3478) > vmalloc_user from xskq_create (net/xdp/xsk_queue.c:40) > xskq_create from xsk_setsockopt (net/xdp/xsk.c:953 net/xdp/xsk.c:1286) > xsk_setsockopt from __sys_setsockopt (net/socket.c:2308) > __sys_setsockopt from ret_fast_syscall (arch/arm/kernel/entry-common.S:68) > > [...] Here is the summary with links: - [bpf,v3] net/xdp: fix zero-size allocation warning in xskq_create() https://git.kernel.org/bpf/bpf/c/90aeaa99f53e You are awesome, thank you!
On 10/5/23 12:35 PM, Andrew Kanner wrote: > Syzkaller reported the following issue: > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 2807 at mm/vmalloc.c:3247 __vmalloc_node_range (mm/vmalloc.c:3361) > Modules linked in: > CPU: 0 PID: 2807 Comm: repro Not tainted 6.6.0-rc2+ #12 > Hardware name: Generic DT based system > unwind_backtrace from show_stack (arch/arm/kernel/traps.c:258) > show_stack from dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) > dump_stack_lvl from __warn (kernel/panic.c:633 kernel/panic.c:680) > __warn from warn_slowpath_fmt (./include/linux/context_tracking.h:153 kernel/panic.c:700) > warn_slowpath_fmt from __vmalloc_node_range (mm/vmalloc.c:3361 (discriminator 3)) > __vmalloc_node_range from vmalloc_user (mm/vmalloc.c:3478) > vmalloc_user from xskq_create (net/xdp/xsk_queue.c:40) > xskq_create from xsk_setsockopt (net/xdp/xsk.c:953 net/xdp/xsk.c:1286) > xsk_setsockopt from __sys_setsockopt (net/socket.c:2308) > __sys_setsockopt from ret_fast_syscall (arch/arm/kernel/entry-common.S:68) > > xskq_get_ring_size() uses struct_size() macro to safely calculate the > size of struct xsk_queue and q->nentries of desc members. But the > syzkaller repro was able to set q->nentries with the value initially > taken from copy_from_sockptr() high enough to return SIZE_MAX by > struct_size(). The next PAGE_ALIGN(size) is such case will overflow > the size_t value and set it to 0. This will trigger WARN_ON_ONCE in Please ignore the pw-bot email. A question just came to my mind after applying. > diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c > index f8905400ee07..c7e8bbb12752 100644 > --- a/net/xdp/xsk_queue.c > +++ b/net/xdp/xsk_queue.c > @@ -34,6 +34,11 @@ struct xsk_queue *xskq_create(u32 nentries, bool umem_queue) > q->ring_mask = nentries - 1; > > size = xskq_get_ring_size(q, umem_queue); > + if (unlikely(size == SIZE_MAX)) { What if "size" is SIZE_MAX-1? Would it still overflow the PAGE_ALIGN below? > + kfree(q); > + return NULL; > + } > + > size = PAGE_ALIGN(size); > > q->ring = vmalloc_user(size);
On Thu, Oct 05, 2023 at 06:00:46PM -0700, Martin KaFai Lau wrote: [...] > > diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c > > index f8905400ee07..c7e8bbb12752 100644 > > --- a/net/xdp/xsk_queue.c > > +++ b/net/xdp/xsk_queue.c > > @@ -34,6 +34,11 @@ struct xsk_queue *xskq_create(u32 nentries, bool umem_queue) > > q->ring_mask = nentries - 1; > > size = xskq_get_ring_size(q, umem_queue); > > + if (unlikely(size == SIZE_MAX)) { > > What if "size" is SIZE_MAX-1? Would it still overflow the PAGE_ALIGN below? > > > + kfree(q); > > + return NULL; > > + } > > + > > size = PAGE_ALIGN(size); > > q->ring = vmalloc_user(size); > I asked myself the same question before v1. E.g. thinking about the check: (size > SIZE_MAX - PAGE_SIZE + 1) But xskq_create() is called after the check for !is_power_of_2(entries) in xsk_init_queue(). So I tried the same reproducer and divided the (nentries) value by 2 in a loop - it hits either SIZE_MAX case or the normal cases without overflow (sometimes throwing vmalloc error complaining about size which exceed total pages in my arm setup). So I can't see a way size will be SIZE_MAX-1, etc. Correct me if I'm wrong, please. PS: In the output below the first 2 values of (nentries) hit SIZE_MAX case, the rest hit the normal case, vmalloc_user() is complaining about 1 allocation: 0x20000000 0x10000000 0x8000000 [ 41.759195][ T2807] pre PAGE_ALIGN size = 2147483968 (0x80000140), PAGE_SIZE = 4096 (0x1000) [ 41.759621][ T2807] repro-iter: vmalloc error: size 2147487744, exceeds total pages, mode:0xdc0(GFP_KERNEL|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0 [...] 0x4000000 0x2000000 0x1000000 0x800000 0x400000 0x200000 0x100000 0x80000 0x40000 0x20000 0x10000 0x8000 0x4000 0x2000 0x1000 0x800 0x400 0x200 0x100 0x80 0x40 0x20 0x10 0x8 0x4 0x2
On 10/6/23 12:09 AM, Andrew Kanner wrote: > On Thu, Oct 05, 2023 at 06:00:46PM -0700, Martin KaFai Lau wrote: > [...] >>> diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c >>> index f8905400ee07..c7e8bbb12752 100644 >>> --- a/net/xdp/xsk_queue.c >>> +++ b/net/xdp/xsk_queue.c >>> @@ -34,6 +34,11 @@ struct xsk_queue *xskq_create(u32 nentries, bool umem_queue) >>> q->ring_mask = nentries - 1; >>> size = xskq_get_ring_size(q, umem_queue); >>> + if (unlikely(size == SIZE_MAX)) { >> >> What if "size" is SIZE_MAX-1? Would it still overflow the PAGE_ALIGN below? >> >>> + kfree(q); >>> + return NULL; >>> + } >>> + >>> size = PAGE_ALIGN(size); >>> q->ring = vmalloc_user(size); >> > > I asked myself the same question before v1. E.g. thinking about the > check: (size > SIZE_MAX - PAGE_SIZE + 1) > > But xskq_create() is called after the check for > !is_power_of_2(entries) in xsk_init_queue(). So I tried the same > reproducer and divided the (nentries) value by 2 in a loop - it hits > either SIZE_MAX case or the normal cases without overflow (sometimes > throwing vmalloc error complaining about size which exceed total pages > in my arm setup). > > So I can't see a way size will be SIZE_MAX-1, etc. Correct me if I'm > wrong, please. > > PS: In the output below the first 2 values of (nentries) hit SIZE_MAX Thanks for the explanation, so iiuc it means it will overflow the struct_size() first because of the is_power_of_2(nentries) requirement? Could you help adding some comment to explain? Thanks. > case, the rest hit the normal case, vmalloc_user() is complaining > about 1 allocation: > > 0x20000000 > 0x10000000 > 0x8000000 > [ 41.759195][ T2807] pre PAGE_ALIGN size = 2147483968 (0x80000140), PAGE_SIZE = 4096 (0x1000) > [ 41.759621][ T2807] repro-iter: vmalloc error: size 2147487744, exceeds total pages, mode:0xdc0(GFP_KERNEL|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0 > [...] > 0x4000000 > 0x2000000 > 0x1000000 > 0x800000 > 0x400000 > 0x200000 > 0x100000 > 0x80000 > 0x40000 > 0x20000 > 0x10000 > 0x8000 > 0x4000 > 0x2000 > 0x1000 > 0x800 > 0x400 > 0x200 > 0x100 > 0x80 > 0x40 > 0x20 > 0x10 > 0x8 > 0x4 > 0x2 >
On Fri, Oct 06, 2023 at 10:37:44AM -0700, Martin KaFai Lau wrote: [...] > > > What if "size" is SIZE_MAX-1? Would it still overflow the PAGE_ALIGN below? > > > > > > > + kfree(q); > > > > + return NULL; > > > > + } > > > > + > > > > size = PAGE_ALIGN(size); > > > > q->ring = vmalloc_user(size); > > > > > > > I asked myself the same question before v1. E.g. thinking about the > > check: (size > SIZE_MAX - PAGE_SIZE + 1) > > > > But xskq_create() is called after the check for > > !is_power_of_2(entries) in xsk_init_queue(). So I tried the same > > reproducer and divided the (nentries) value by 2 in a loop - it hits > > either SIZE_MAX case or the normal cases without overflow (sometimes > > throwing vmalloc error complaining about size which exceed total pages > > in my arm setup). > > > > So I can't see a way size will be SIZE_MAX-1, etc. Correct me if I'm > > wrong, please. > > > > PS: In the output below the first 2 values of (nentries) hit SIZE_MAX > > Thanks for the explanation, so iiuc it means it will overflow the > struct_size() first because of the is_power_of_2(nentries) requirement? > Could you help adding some comment to explain? Thanks. > The overflow happens because there's no upper limit for nentries (userspace input). Let me add more context, e.g. from net/xdp/xsk.c: static int xsk_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval, unsigned int optlen) { [...] if (copy_from_sockptr(&entries, optval, sizeof(entries))) return -EFAULT; [...] err = xsk_init_queue(entries, q, false); [...] } 'entries' is passed to xsk_init_queue() and there're 2 checks: for 0 and is_power_of_2() only, no upper bound check: static int xsk_init_queue(u32 entries, struct xsk_queue **queue, bool umem_queue) { struct xsk_queue *q; if (entries == 0 || *queue || !is_power_of_2(entries)) return -EINVAL; q = xskq_create(entries, umem_queue); if (!q) return -ENOMEM; [...] } The 'entries' value is next passed to struct_size() in net/xdp/xsk_queue.c. If it's large enough - SIZE_MAX will be returned. I'm not sure if some appropriate limit for the size of XDP_RX_RING / XDP_TX_RING and XDP_UMEM_FILL_RING / XDP_UMEM_COMPLETION_RING rings should be used. But anyway, vmalloc() will tell if it's not ok with the requested allocation size.
On 10/6/23 4:24 PM, Andrew Kanner wrote: >> Thanks for the explanation, so iiuc it means it will overflow the >> struct_size() first because of the is_power_of_2(nentries) requirement? >> Could you help adding some comment to explain? Thanks. >> > The overflow happens because there's no upper limit for nentries > (userspace input). Let me add more context, e.g. from net/xdp/xsk.c: > > static int xsk_setsockopt(struct socket *sock, int level, int optname, > sockptr_t optval, unsigned int optlen) > { > [...] > if (copy_from_sockptr(&entries, optval, sizeof(entries))) > return -EFAULT; > [...] > err = xsk_init_queue(entries, q, false); > [...] > } > > 'entries' is passed to xsk_init_queue() and there're 2 checks: for 0 > and is_power_of_2() only, no upper bound check: > > static int xsk_init_queue(u32 entries, struct xsk_queue **queue, > bool umem_queue) > { > struct xsk_queue *q; > > if (entries == 0 || *queue || !is_power_of_2(entries)) > return -EINVAL; > > q = xskq_create(entries, umem_queue); > if (!q) > return -ENOMEM; > [...] > } > > The 'entries' value is next passed to struct_size() in > net/xdp/xsk_queue.c. If it's large enough - SIZE_MAX will be returned. All make sense. I was mostly asking to add a comment at the "if (unlikely(size == SIZE_MAX)" check to explain this details on why checking SIZE_MAX is enough.
On Fri, Oct 06, 2023 at 04:58:18PM -0700, Martin KaFai Lau wrote: > On 10/6/23 4:24 PM, Andrew Kanner wrote: > > > Thanks for the explanation, so iiuc it means it will overflow the > > > struct_size() first because of the is_power_of_2(nentries) requirement? > > > Could you help adding some comment to explain? Thanks. > > > > > The overflow happens because there's no upper limit for nentries > > (userspace input). Let me add more context, e.g. from net/xdp/xsk.c: > > > > static int xsk_setsockopt(struct socket *sock, int level, int optname, > > sockptr_t optval, unsigned int optlen) > > { > > [...] > > if (copy_from_sockptr(&entries, optval, sizeof(entries))) > > return -EFAULT; > > [...] > > err = xsk_init_queue(entries, q, false); > > [...] > > } > > > > 'entries' is passed to xsk_init_queue() and there're 2 checks: for 0 > > and is_power_of_2() only, no upper bound check: > > > > static int xsk_init_queue(u32 entries, struct xsk_queue **queue, > > bool umem_queue) > > { > > struct xsk_queue *q; > > > > if (entries == 0 || *queue || !is_power_of_2(entries)) > > return -EINVAL; > > > > q = xskq_create(entries, umem_queue); > > if (!q) > > return -ENOMEM; > > [...] > > } > > > > The 'entries' value is next passed to struct_size() in > > net/xdp/xsk_queue.c. If it's large enough - SIZE_MAX will be returned. > > All make sense. I was mostly asking to add a comment at the "if > (unlikely(size == SIZE_MAX)" check to explain this details on why checking > SIZE_MAX is enough. Ok, I got it. Will add in v4. Thanks.
diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c index f8905400ee07..c7e8bbb12752 100644 --- a/net/xdp/xsk_queue.c +++ b/net/xdp/xsk_queue.c @@ -34,6 +34,11 @@ struct xsk_queue *xskq_create(u32 nentries, bool umem_queue) q->ring_mask = nentries - 1; size = xskq_get_ring_size(q, umem_queue); + if (unlikely(size == SIZE_MAX)) { + kfree(q); + return NULL; + } + size = PAGE_ALIGN(size); q->ring = vmalloc_user(size);
Syzkaller reported the following issue: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 2807 at mm/vmalloc.c:3247 __vmalloc_node_range (mm/vmalloc.c:3361) Modules linked in: CPU: 0 PID: 2807 Comm: repro Not tainted 6.6.0-rc2+ #12 Hardware name: Generic DT based system unwind_backtrace from show_stack (arch/arm/kernel/traps.c:258) show_stack from dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) dump_stack_lvl from __warn (kernel/panic.c:633 kernel/panic.c:680) __warn from warn_slowpath_fmt (./include/linux/context_tracking.h:153 kernel/panic.c:700) warn_slowpath_fmt from __vmalloc_node_range (mm/vmalloc.c:3361 (discriminator 3)) __vmalloc_node_range from vmalloc_user (mm/vmalloc.c:3478) vmalloc_user from xskq_create (net/xdp/xsk_queue.c:40) xskq_create from xsk_setsockopt (net/xdp/xsk.c:953 net/xdp/xsk.c:1286) xsk_setsockopt from __sys_setsockopt (net/socket.c:2308) __sys_setsockopt from ret_fast_syscall (arch/arm/kernel/entry-common.S:68) xskq_get_ring_size() uses struct_size() macro to safely calculate the size of struct xsk_queue and q->nentries of desc members. But the syzkaller repro was able to set q->nentries with the value initially taken from copy_from_sockptr() high enough to return SIZE_MAX by struct_size(). The next PAGE_ALIGN(size) is such case will overflow the size_t value and set it to 0. This will trigger WARN_ON_ONCE in vmalloc_user() -> __vmalloc_node_range(). The issue is reproducible on 32-bit arm kernel. Reported-and-tested-by: syzbot+fae676d3cf469331fc89@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000c84b4705fb31741e@google.com/T/ Link: https://syzkaller.appspot.com/bug?extid=fae676d3cf469331fc89 Reported-by: syzbot+b132693e925cbbd89e26@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000e20df20606ebab4f@google.com/T/ Fixes: 9f78bf330a66 ("xsk: support use vaddr as ring") Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com> --- Notes (akanner): v3: - free kzalloc-ed memory before return, the leak was noticed by Daniel Borkmann <daniel@iogearbox.net> v2: https://lore.kernel.org/all/20231002222939.1519-1-andrew.kanner@gmail.com/raw - use unlikely() optimization for the case with SIZE_MAX return from struct_size(), suggested by Alexander Lobakin <aleksander.lobakin@intel.com> - cc-ed 4 more maintainers, mentioned by cc_maintainers patchwork test v1: https://lore.kernel.org/all/20230928204440.543-1-andrew.kanner@gmail.com/T/ - RFC notes: It was found that net/xdp/xsk.c:xsk_setsockopt() uses copy_from_sockptr() to get the number of entries (int) for cases with XDP_RX_RING / XDP_TX_RING and XDP_UMEM_FILL_RING / XDP_UMEM_COMPLETION_RING. Next in xsk_init_queue() there're 2 sanity checks (entries == 0) and (!is_power_of_2(entries)) for which -EINVAL will be returned. After that net/xdp/xsk_queue.c:xskq_create() will calculate the size multipling the number of entries (int) with the size of u64, at least. I wonder if there should be the upper bound (e.g. the 3rd sanity check inside xsk_init_queue()). It seems that without the upper limit it's quiet easy to overflow the allocated size (SIZE_MAX), especially for 32-bit architectures, for example arm nodes which were used by the syzkaller. In this patch I added a naive check for SIZE_MAX which helped to skip zero-size allocation after overflow, but maybe it's not quite right. Please, suggest if you have any thoughts about the appropriate limit for the size of these xdp rings. PS: the initial number of entries is 0x20000000 in syzkaller repro: syscall(__NR_setsockopt, (intptr_t)r[0], 0x11b, 3, 0x20000040, 0x20); Link: https://syzkaller.appspot.com/text?tag=ReproC&x=10910f18280000 net/xdp/xsk_queue.c | 5 +++++ 1 file changed, 5 insertions(+)