Message ID | 20221111092642.2333724-3-houtao@huaweicloud.com (mailing list archive)
---|---
State | Superseded
Delegated to: | BPF
Series | libbpf: Fixes for ring buffer
Context | Check | Description
---|---|---
netdev/tree_selection | success | Clearly marked for bpf |
netdev/fixes_present | success | Fixes tag present in non-next series |
netdev/subject_prefix | success | Link |
netdev/cover_letter | success | Series has a cover letter |
netdev/patch_count | success | Link |
netdev/header_inline | success | No static functions without inline keyword in header files |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/cc_maintainers | success | CCed 12 of 12 maintainers |
netdev/build_clang | success | Errors and warnings before: 0 this patch: 0 |
netdev/module_param | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/check_selftest | success | No net selftest shell script |
netdev/verify_fixes | success | Fixes tag looks correct |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 23 lines checked |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/source_inline | success | Was 0 now: 0 |
bpf/vmtest-bpf-VM_Test-3 | fail | Logs for build for aarch64 with gcc |
bpf/vmtest-bpf-VM_Test-4 | fail | Logs for build for aarch64 with llvm-16 |
bpf/vmtest-bpf-VM_Test-5 | success | Logs for build for s390x with gcc |
bpf/vmtest-bpf-VM_Test-6 | success | Logs for build for x86_64 with gcc |
bpf/vmtest-bpf-VM_Test-7 | success | Logs for build for x86_64 with llvm-16 |
bpf/vmtest-bpf-VM_Test-8 | success | Logs for llvm-toolchain |
bpf/vmtest-bpf-VM_Test-9 | success | Logs for set-matrix |
bpf/vmtest-bpf-PR | success | PR summary |
bpf/vmtest-bpf-VM_Test-2 | success | Logs for llvm-toolchain |
bpf/vmtest-bpf-VM_Test-1 | success | Logs for ShellCheck |
On 11/11, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> The maximum size of ringbuf is 2GB on x86-64 host, so 2 * max_entries
> will overflow u32 when mapping producer page and data pages. Only
> casting max_entries to size_t is not enough, because for 32-bits
> application on 64-bits kernel the size of read-only mmap region
> also could overflow size_t.
>
> So fixing it by casting the size of read-only mmap region into a __u64
> and checking whether or not there will be overflow during mmap.
>
> Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  tools/lib/bpf/ringbuf.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
> index d285171d4b69..c4bdc88af672 100644
> --- a/tools/lib/bpf/ringbuf.c
> +++ b/tools/lib/bpf/ringbuf.c
> @@ -77,6 +77,7 @@ int ring_buffer__add(struct ring_buffer *rb, int map_fd,
>  	__u32 len = sizeof(info);
>  	struct epoll_event *e;
>  	struct ring *r;
> +	__u64 ro_size;
>  	void *tmp;
>  	int err;
>
> @@ -129,8 +130,14 @@ int ring_buffer__add(struct ring_buffer *rb, int map_fd,
>  	 * data size to allow simple reading of samples that wrap around the
>  	 * end of a ring buffer. See kernel implementation for details.
>  	 */
> -	tmp = mmap(NULL, rb->page_size + 2 * info.max_entries, PROT_READ,
> -		   MAP_SHARED, map_fd, rb->page_size);
> +	ro_size = rb->page_size + 2 * (__u64)info.max_entries;

[..]

> +	if (ro_size != (__u64)(size_t)ro_size) {
> +		pr_warn("ringbuf: ring buffer size (%u) is too big\n",
> +			info.max_entries);
> +		return libbpf_err(-E2BIG);
> +	}

Why do we need this check at all? IIUC, the problem is that the
expression "rb->page_size + 2 * info.max_entries" is evaluated as u32
and can overflow. So why doing this part only isn't enough?

  size_t mmap_size = rb->page_size + 2 * (size_t)info.max_entries;
  mmap(NULL, mmap_size, PROT_READ, MAP_SHARED, map_fd, ...);

sizeof(size_t) should be 8, so no overflow is possible?

> +	tmp = mmap(NULL, (size_t)ro_size, PROT_READ, MAP_SHARED, map_fd,
> +		   rb->page_size);
>  	if (tmp == MAP_FAILED) {
>  		err = -errno;
>  		ringbuf_unmap_ring(rb, r);
> --
> 2.29.2
On Fri, Nov 11, 2022 at 9:54 AM <sdf@google.com> wrote:
>
> On 11/11, Hou Tao wrote:
> > From: Hou Tao <houtao1@huawei.com>
> >
> > The maximum size of ringbuf is 2GB on x86-64 host, so 2 * max_entries
> > will overflow u32 when mapping producer page and data pages. Only
> > casting max_entries to size_t is not enough, because for 32-bits
> > application on 64-bits kernel the size of read-only mmap region
> > also could overflow size_t.

[...]

> > @@ -77,6 +77,7 @@ int ring_buffer__add(struct ring_buffer *rb, int map_fd,
> >  	__u32 len = sizeof(info);
> >  	struct epoll_event *e;
> >  	struct ring *r;
> > +	__u64 ro_size;

I found ro_size quite a confusing name, let's call it mmap_sz?

> >  	void *tmp;
> >  	int err;

[...]

> > +	if (ro_size != (__u64)(size_t)ro_size) {
> > +		pr_warn("ringbuf: ring buffer size (%u) is too big\n",
> > +			info.max_entries);
> > +		return libbpf_err(-E2BIG);
> > +	}
>
> Why do we need this check at all? IIUC, the problem is that the
> expression "rb->page_size + 2 * info.max_entries" is evaluated as u32
> and can overflow. So why doing this part only isn't enough?
>
>   size_t mmap_size = rb->page_size + 2 * (size_t)info.max_entries;
>   mmap(NULL, mmap_size, PROT_READ, MAP_SHARED, map_fd, ...);
>
> sizeof(size_t) should be 8, so no overflow is possible?

not on 32-bit arches, presumably?

> > +	tmp = mmap(NULL, (size_t)ro_size, PROT_READ, MAP_SHARED, map_fd,
> > +		   rb->page_size);

should we split this mmap into two mmaps -- one for producer_pos page,
another for data area. That will presumably allow to mmap ringbuf with
max_entries = 1GB?

> >  	if (tmp == MAP_FAILED) {
> >  		err = -errno;
> >  		ringbuf_unmap_ring(rb, r);
> > --
> > 2.29.2
On Fri, Nov 11, 2022 at 12:56 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Nov 11, 2022 at 9:54 AM <sdf@google.com> wrote:
> >
[...]
> > sizeof(size_t) should be 8, so no overflow is possible?
>
> not on 32-bit arches, presumably?

Good point, he even mentions it in the description, I can't read
apparently :-/

"Only casting max_entries to size_t is not enough"

> should we split this mmap into two mmaps -- one for producer_pos page,
> another for data area. That will presumably allow to mmap ringbuf with
> max_entries = 1GB?
Hi,

On 11/12/2022 4:56 AM, Andrii Nakryiko wrote:
> On Fri, Nov 11, 2022 at 9:54 AM <sdf@google.com> wrote:
>> On 11/11, Hou Tao wrote:
[...]
>>> +	__u64 ro_size;
> I found ro_size quite a confusing name, let's call it mmap_sz?

OK.

[...]
>> Why do we need this check at all? IIUC, the problem is that the
>> expression "rb->page_size + 2 * info.max_entries" is evaluated as u32
>> and can overflow. So why doing this part only isn't enough?
>>
>>   size_t mmap_size = rb->page_size + 2 * (size_t)info.max_entries;
>>   mmap(NULL, mmap_size, PROT_READ, MAP_SHARED, map_fd, ...);
>>
>> sizeof(size_t) should be 8, so no overflow is possible?
> not on 32-bit arches, presumably?

Yes. On a 32-bit kernel, the total virtual address space for user space
plus kernel space is 4GB, so when max_entries is 2GB the needed virtual
address space would be 2GB + 4GB, and the mapping of the ring buffer
would fail either in the kernel or in userspace. An extreme case is a
32-bit userspace on a 64-bit kernel: mapping a 2GB ring buffer in the
kernel is OK, but the 4GB read-only region overflows size_t in the
32-bit userspace.

>>> +	tmp = mmap(NULL, (size_t)ro_size, PROT_READ, MAP_SHARED, map_fd,
>>> +		   rb->page_size);
> should we split this mmap into two mmaps -- one for producer_pos page,
> another for data area. That will presumably allow to mmap ringbuf with
> max_entries = 1GB?

I don't understand the reason for the splitting. Even without it, in
theory a ring buffer with max_entries = 1GB would be OK for a 32-bit
kernel, although in practice mapping a 1GB ring buffer on a 32-bit
kernel will fail because the most common size of the kernel virtual
address space is 1GB (though ARM can use VMSPLIT_1G to increase the
kernel virtual address space to 3GB).

>>>  	if (tmp == MAP_FAILED) {
>>>  		err = -errno;
>>>  		ringbuf_unmap_ring(rb, r);
>>> --
>>> 2.29.2
On Fri, Nov 11, 2022 at 7:34 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 11/12/2022 4:56 AM, Andrii Nakryiko wrote:
> > On Fri, Nov 11, 2022 at 9:54 AM <sdf@google.com> wrote:
[...]
> > should we split this mmap into two mmaps -- one for producer_pos page,
> > another for data area. That will presumably allow to mmap ringbuf with
> > max_entries = 1GB?
>
> I don't understand the reason for the splitting. Even without the
> splitting, in theory ring buffer with max_entries = 1GB will be OK for
> 32-bits kernel, despite in practice the mapping of 1GB ring buffer on
> 32-bits kernel will fail because the most common size of kernel virtual
> address space is 1GB (although ARM could use VMSPLIT_1G to increase the
> size of kernel virtual address to 3GB).

Yep, never mind. size_t is unsigned, so it can express up to 4GB, and
2GB + 4KB is fine as is already (even though the mapping most probably
will fail).

> >>>  	if (tmp == MAP_FAILED) {
> >>>  		err = -errno;
> >>>  		ringbuf_unmap_ring(rb, r);
> >>>  --
> >>>  2.29.2
diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c
index d285171d4b69..c4bdc88af672 100644
--- a/tools/lib/bpf/ringbuf.c
+++ b/tools/lib/bpf/ringbuf.c
@@ -77,6 +77,7 @@ int ring_buffer__add(struct ring_buffer *rb, int map_fd,
 	__u32 len = sizeof(info);
 	struct epoll_event *e;
 	struct ring *r;
+	__u64 ro_size;
 	void *tmp;
 	int err;
 
@@ -129,8 +130,14 @@ int ring_buffer__add(struct ring_buffer *rb, int map_fd,
 	 * data size to allow simple reading of samples that wrap around the
 	 * end of a ring buffer. See kernel implementation for details.
 	 */
-	tmp = mmap(NULL, rb->page_size + 2 * info.max_entries, PROT_READ,
-		   MAP_SHARED, map_fd, rb->page_size);
+	ro_size = rb->page_size + 2 * (__u64)info.max_entries;
+	if (ro_size != (__u64)(size_t)ro_size) {
+		pr_warn("ringbuf: ring buffer size (%u) is too big\n",
+			info.max_entries);
+		return libbpf_err(-E2BIG);
+	}
+	tmp = mmap(NULL, (size_t)ro_size, PROT_READ, MAP_SHARED, map_fd,
+		   rb->page_size);
 	if (tmp == MAP_FAILED) {
 		err = -errno;
 		ringbuf_unmap_ring(rb, r);