diff mbox series

[bpf-next] bpf: use kvmalloc in map_lookup_elem

Message ID 20210816164832.1743675-1-sdf@google.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series [bpf-next] bpf: use kvmalloc in map_lookup_elem | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 5 maintainers not CCed: songliubraving@fb.com kafai@fb.com john.fastabend@gmail.com kpsingh@kernel.org yhs@fb.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 11 this patch: 11
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 16 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 11 this patch: 11
netdev/header_inline success Link

Commit Message

Stanislav Fomichev Aug. 16, 2021, 4:48 p.m. UTC
Use kvmalloc/kvfree for temporary value when looking up a map.
kmalloc might not be sufficient for percpu maps where the value is big.

Can be reproduced with netcnt test on qemu with "-smp 255".

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 kernel/bpf/syscall.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Daniel Borkmann Aug. 16, 2021, 9:43 p.m. UTC | #1
On 8/16/21 6:48 PM, Stanislav Fomichev wrote:
> Use kvmalloc/kvfree for temporary value when looking up a map.
> kmalloc might not be sufficient for percpu maps where the value is big.
> 
> Can be reproduced with netcnt test on qemu with "-smp 255".
> 
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>   kernel/bpf/syscall.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 9a2068e39d23..ae0b1c1c8ece 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1076,7 +1076,7 @@ static int map_lookup_elem(union bpf_attr *attr)
>   	value_size = bpf_map_value_size(map);
>   
>   	err = -ENOMEM;
> -	value = kmalloc(value_size, GFP_USER | __GFP_NOWARN);
> +	value = kvmalloc(value_size, GFP_USER | __GFP_NOWARN);
>   	if (!value)
>   		goto free_key;

What about other cases like map_update_elem(), shouldn't they be adapted
similarly?

Thanks,
Daniel
Andrii Nakryiko Aug. 16, 2021, 10:32 p.m. UTC | #2
On Mon, Aug 16, 2021 at 2:43 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/16/21 6:48 PM, Stanislav Fomichev wrote:
> > Use kvmalloc/kvfree for temporary value when looking up a map.
> > kmalloc might not be sufficient for percpu maps where the value is big.
> >
> > Can be reproduced with netcnt test on qemu with "-smp 255".
> >
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >   kernel/bpf/syscall.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 9a2068e39d23..ae0b1c1c8ece 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -1076,7 +1076,7 @@ static int map_lookup_elem(union bpf_attr *attr)
> >       value_size = bpf_map_value_size(map);
> >
> >       err = -ENOMEM;
> > -     value = kmalloc(value_size, GFP_USER | __GFP_NOWARN);
> > +     value = kvmalloc(value_size, GFP_USER | __GFP_NOWARN);
> >       if (!value)
> >               goto free_key;
>
> What about other cases like map_update_elem(), shouldn't they be adapted
> similarly?

And in the same vein (with keys potentially being big as well), should
we switch __bpf_copy_key() to use vmemdup_user() instead of
memdup_user()?

>
> Thanks,
> Daniel
Stanislav Fomichev Aug. 16, 2021, 10:46 p.m. UTC | #3
On 08/16, Andrii Nakryiko wrote:
> On Mon, Aug 16, 2021 at 2:43 PM Daniel Borkmann <daniel@iogearbox.net>  
> wrote:
> >
> > On 8/16/21 6:48 PM, Stanislav Fomichev wrote:
> > > Use kvmalloc/kvfree for temporary value when looking up a map.
> > > kmalloc might not be sufficient for percpu maps where the value is  
> big.
> > >
> > > Can be reproduced with netcnt test on qemu with "-smp 255".
> > >
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >   kernel/bpf/syscall.c | 4 ++--
> > >   1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > index 9a2068e39d23..ae0b1c1c8ece 100644
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -1076,7 +1076,7 @@ static int map_lookup_elem(union bpf_attr *attr)
> > >       value_size = bpf_map_value_size(map);
> > >
> > >       err = -ENOMEM;
> > > -     value = kmalloc(value_size, GFP_USER | __GFP_NOWARN);
> > > +     value = kvmalloc(value_size, GFP_USER | __GFP_NOWARN);
> > >       if (!value)
> > >               goto free_key;
> >
> > What about other cases like map_update_elem(), shouldn't they be adapted
> > similarly?

> And in the same vein (with keys potentially being big as well), should
> we switch __bpf_copy_key() to use vmemdup_user() instead of
> memdup_user()?

Those are good questions :-)

I'm assuming that whatever is doing kmalloc on top of
bpf_map_value_size() should definitely use kvmalloc since
bpf_map_value_size() can blow up the value size. (will resend)

For __bpf_copy_key I'm less sure, but it might be a good idea as well.
Let me try to look at bit more into it, but feels like there shouldn't
be any downsides?
diff mbox series

Patch

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9a2068e39d23..ae0b1c1c8ece 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1076,7 +1076,7 @@  static int map_lookup_elem(union bpf_attr *attr)
 	value_size = bpf_map_value_size(map);
 
 	err = -ENOMEM;
-	value = kmalloc(value_size, GFP_USER | __GFP_NOWARN);
+	value = kvmalloc(value_size, GFP_USER | __GFP_NOWARN);
 	if (!value)
 		goto free_key;
 
@@ -1091,7 +1091,7 @@  static int map_lookup_elem(union bpf_attr *attr)
 	err = 0;
 
 free_value:
-	kfree(value);
+	kvfree(value);
 free_key:
 	kfree(key);
 err_put: