Message ID | 20221111092642.2333724-3-houtao@huaweicloud.com (mailing list archive)
---|---
State | Superseded
Delegated to: | BPF
Series | libbpf: Fixes for ring buffer
Context | Check | Description
---|---|---
netdev/tree_selection | success | Clearly marked for bpf |
netdev/fixes_present | success | Fixes tag present in non-next series |
netdev/subject_prefix | success | Link |
netdev/cover_letter | success | Series has a cover letter |
netdev/patch_count | success | Link |
netdev/header_inline | success | No static functions without inline keyword in header files |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/cc_maintainers | success | CCed 12 of 12 maintainers |
netdev/build_clang | success | Errors and warnings before: 0 this patch: 0 |
netdev/module_param | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/check_selftest | success | No net selftest shell script |
netdev/verify_fixes | success | Fixes tag looks correct |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 23 lines checked |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/source_inline | success | Was 0 now: 0 |
bpf/vmtest-bpf-VM_Test-3 | fail | Logs for build for aarch64 with gcc |
bpf/vmtest-bpf-VM_Test-4 | fail | Logs for build for aarch64 with llvm-16 |
bpf/vmtest-bpf-VM_Test-5 | success | Logs for build for s390x with gcc |
bpf/vmtest-bpf-VM_Test-6 | success | Logs for build for x86_64 with gcc |
bpf/vmtest-bpf-VM_Test-7 | success | Logs for build for x86_64 with llvm-16 |
bpf/vmtest-bpf-VM_Test-8 | success | Logs for llvm-toolchain |
bpf/vmtest-bpf-VM_Test-9 | success | Logs for set-matrix |
bpf/vmtest-bpf-PR | success | PR summary |
bpf/vmtest-bpf-VM_Test-2 | success | Logs for llvm-toolchain |
bpf/vmtest-bpf-VM_Test-1 | success | Logs for ShellCheck |
On 11/11, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> The maximum size of ringbuf is 2GB on x86-64 host, so 2 * max_entries
> will overflow u32 when mapping producer page and data pages. Only
> casting max_entries to size_t is not enough, because for 32-bits
> application on 64-bits kernel the size of read-only mmap region
> also could overflow size_t.
>
> So fixing it by casting the size of read-only mmap region into a __u64
> and checking whether or not there will be overflow during mmap.
>
> Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  tools/lib/bpf/ringbuf.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
> index d285171d4b69..c4bdc88af672 100644
> --- a/tools/lib/bpf/ringbuf.c
> +++ b/tools/lib/bpf/ringbuf.c
> @@ -77,6 +77,7 @@ int ring_buffer__add(struct ring_buffer *rb, int map_fd,
>  	__u32 len = sizeof(info);
>  	struct epoll_event *e;
>  	struct ring *r;
> +	__u64 ro_size;
>  	void *tmp;
>  	int err;
>
> @@ -129,8 +130,14 @@ int ring_buffer__add(struct ring_buffer *rb, int map_fd,
>  	 * data size to allow simple reading of samples that wrap around the
>  	 * end of a ring buffer. See kernel implementation for details.
>  	 */
> -	tmp = mmap(NULL, rb->page_size + 2 * info.max_entries, PROT_READ,
> -		   MAP_SHARED, map_fd, rb->page_size);
> +	ro_size = rb->page_size + 2 * (__u64)info.max_entries;

[..]

> +	if (ro_size != (__u64)(size_t)ro_size) {
> +		pr_warn("ringbuf: ring buffer size (%u) is too big\n",
> +			info.max_entries);
> +		return libbpf_err(-E2BIG);
> +	}

Why do we need this check at all? IIUC, the problem is that the
expression "rb->page_size + 2 * info.max_entries" is evaluated as u32
and can overflow. So why doing this part only isn't enough?

  size_t mmap_size = rb->page_size + 2 * (size_t)info.max_entries;
  mmap(NULL, mmap_size, PROT_READ, MAP_SHARED, map_fd, ...);

sizeof(size_t) should be 8, so no overflow is possible?

> +	tmp = mmap(NULL, (size_t)ro_size, PROT_READ, MAP_SHARED, map_fd,
> +		   rb->page_size);
>  	if (tmp == MAP_FAILED) {
>  		err = -errno;
>  		ringbuf_unmap_ring(rb, r);
> --
> 2.29.2
On Fri, Nov 11, 2022 at 9:54 AM <sdf@google.com> wrote:
>
> On 11/11, Hou Tao wrote:
> > From: Hou Tao <houtao1@huawei.com>
> >
> > The maximum size of ringbuf is 2GB on x86-64 host, so 2 * max_entries
> > will overflow u32 when mapping producer page and data pages. Only
> > casting max_entries to size_t is not enough, because for 32-bits
> > application on 64-bits kernel the size of read-only mmap region
> > also could overflow size_t.

[...]

> > @@ -77,6 +77,7 @@ int ring_buffer__add(struct ring_buffer *rb, int map_fd,
> >  	__u32 len = sizeof(info);
> >  	struct epoll_event *e;
> >  	struct ring *r;
> > +	__u64 ro_size;

I found ro_size quite a confusing name, let's call it mmap_sz?

> >  	void *tmp;
> >  	int err;

[...]

> > +	if (ro_size != (__u64)(size_t)ro_size) {
> > +		pr_warn("ringbuf: ring buffer size (%u) is too big\n",
> > +			info.max_entries);
> > +		return libbpf_err(-E2BIG);
> > +	}
>
> Why do we need this check at all? IIUC, the problem is that the
> expression "rb->page_size + 2 * info.max_entries" is evaluated as u32
> and can overflow. So why doing this part only isn't enough?
>
>   size_t mmap_size = rb->page_size + 2 * (size_t)info.max_entries;
>   mmap(NULL, mmap_size, PROT_READ, MAP_SHARED, map_fd, ...);
>
> sizeof(size_t) should be 8, so no overflow is possible?

not on 32-bit arches, presumably?

> > +	tmp = mmap(NULL, (size_t)ro_size, PROT_READ, MAP_SHARED, map_fd,
> > +		   rb->page_size);

should we split this mmap into two mmaps -- one for producer_pos page,
another for data area. That will presumably allow to mmap ringbuf with
max_entries = 1GB?

> >  	if (tmp == MAP_FAILED) {
> >  		err = -errno;
> >  		ringbuf_unmap_ring(rb, r);
> > --
> > 2.29.2
On Fri, Nov 11, 2022 at 12:56 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Nov 11, 2022 at 9:54 AM <sdf@google.com> wrote:
> >
[...]
> > sizeof(size_t) should be 8, so no overflow is possible?
>
> not on 32-bit arches, presumably?

Good point, he even mentions it in the description, I can't read
apparently :-/

"Only casting max_entries to size_t is not enough"

> should we split this mmap into two mmaps -- one for producer_pos page,
> another for data area. That will presumably allow to mmap ringbuf with
> max_entries = 1GB?
Hi,

On 11/12/2022 4:56 AM, Andrii Nakryiko wrote:
> On Fri, Nov 11, 2022 at 9:54 AM <sdf@google.com> wrote:
>> On 11/11, Hou Tao wrote:
[...]
>>> +	__u64 ro_size;
> I found ro_size quite a confusing name, let's call it mmap_sz?

OK.

[...]
>> Why do we need this check at all? IIUC, the problem is that the
>> expression "rb->page_size + 2 * info.max_entries" is evaluated as u32
>> and can overflow. So why doing this part only isn't enough?
>>
>>   size_t mmap_size = rb->page_size + 2 * (size_t)info.max_entries;
>>   mmap(NULL, mmap_size, PROT_READ, MAP_SHARED, map_fd, ...);
>>
>> sizeof(size_t) should be 8, so no overflow is possible?
> not on 32-bit arches, presumably?

Yes. On a 32-bit kernel, the total virtual address space for user space
plus kernel space is 4GB, so when max_entries is 2GB the needed virtual
address space would be 2GB + 4GB, and the mapping of the ring buffer
would fail either in the kernel or in userspace. An extreme case is a
32-bit userspace on a 64-bit kernel: mapping a 2GB ring buffer in the
kernel is OK, but the 4GB read-only region overflows size_t in the
32-bit userspace.

>>> +	tmp = mmap(NULL, (size_t)ro_size, PROT_READ, MAP_SHARED, map_fd,
>>> +		   rb->page_size);
> should we split this mmap into two mmaps -- one for producer_pos page,
> another for data area. That will presumably allow to mmap ringbuf with
> max_entries = 1GB?

I don't understand the reason for the splitting. Even without it, in
theory a ring buffer with max_entries = 1GB would be OK for a 32-bit
kernel, although in practice mapping a 1GB ring buffer on a 32-bit
kernel will fail because the most common size of the kernel virtual
address space is 1GB (though ARM can use VMSPLIT_1G to increase the
kernel virtual address space to 3GB).

>>>  	if (tmp == MAP_FAILED) {
>>>  		err = -errno;
>>>  		ringbuf_unmap_ring(rb, r);
>>> --
>>> 2.29.2
On Fri, Nov 11, 2022 at 7:34 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 11/12/2022 4:56 AM, Andrii Nakryiko wrote:
> > On Fri, Nov 11, 2022 at 9:54 AM <sdf@google.com> wrote:
[...]
> > should we split this mmap into two mmaps -- one for producer_pos page,
> > another for data area. That will presumably allow to mmap ringbuf with
> > max_entries = 1GB?
>
> I don't understand the reason for the splitting. Even without the
> splitting, in theory ring buffer with max_entries = 1GB will be OK for
> 32-bits kernel, despite in practice the mapping of 1GB ring buffer on
> 32-bits kernel will fail because the most common size of kernel virtual
> address space is 1GB (although ARM could use VMSPLIT_1G to increase the
> size of kernel virtual address to 3GB).

Yep, never mind. size_t is unsigned, so it can express up to 4GB, and
2GB + 4KB is fine as is already (even though the mapping most probably
will fail).

> >>>  	if (tmp == MAP_FAILED) {
> >>>  		err = -errno;
> >>>  		ringbuf_unmap_ring(rb, r);
> >>>  --
> >>>  2.29.2
diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
index d285171d4b69..c4bdc88af672 100644
--- a/tools/lib/bpf/ringbuf.c
+++ b/tools/lib/bpf/ringbuf.c
@@ -77,6 +77,7 @@ int ring_buffer__add(struct ring_buffer *rb, int map_fd,
 	__u32 len = sizeof(info);
 	struct epoll_event *e;
 	struct ring *r;
+	__u64 ro_size;
 	void *tmp;
 	int err;
 
@@ -129,8 +130,14 @@ int ring_buffer__add(struct ring_buffer *rb, int map_fd,
 	 * data size to allow simple reading of samples that wrap around the
 	 * end of a ring buffer. See kernel implementation for details.
 	 */
-	tmp = mmap(NULL, rb->page_size + 2 * info.max_entries, PROT_READ,
-		   MAP_SHARED, map_fd, rb->page_size);
+	ro_size = rb->page_size + 2 * (__u64)info.max_entries;
+	if (ro_size != (__u64)(size_t)ro_size) {
+		pr_warn("ringbuf: ring buffer size (%u) is too big\n",
+			info.max_entries);
+		return libbpf_err(-E2BIG);
+	}
+	tmp = mmap(NULL, (size_t)ro_size, PROT_READ, MAP_SHARED, map_fd,
+		   rb->page_size);
 	if (tmp == MAP_FAILED) {
 		err = -errno;
 		ringbuf_unmap_ring(rb, r);