diff mbox series

[v2,1/3] bpf: Allow NULL buffers in bpf_dynptr_slice(_rw)

Message ID 20230502005218.3627530-1-drosen@google.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series [v2,1/3] bpf: Allow NULL buffers in bpf_dynptr_slice(_rw) | expand

Checks

Context Check Description
bpf/vmtest-bpf-VM_Test-11 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-12 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-13 fail Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-14 fail Logs for test_progs on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-16 fail Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-17 fail Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-18 fail Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-19 fail Logs for test_progs_no_alu32 on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-21 fail Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-22 fail Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-23 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-25 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-26 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-27 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-28 success Logs for test_progs_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-29 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-30 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-31 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-32 success Logs for test_verifier on aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-33 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-34 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-35 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-36 success Logs for veristat
bpf/vmtest-bpf-VM_Test-15 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-20 fail Logs for test_progs_no_alu32 on s390x with gcc
netdev/series_format warning Series does not have a cover letter
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5176 this patch: 5176
netdev/cc_maintainers warning 7 maintainers not CCed: kuba@kernel.org hawk@kernel.org netdev@vger.kernel.org davem@davemloft.net imagedong@tencent.com edumazet@google.com linux-doc@vger.kernel.org
netdev/build_clang success Errors and warnings before: 1988 this patch: 1988
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 5391 this patch: 5391
netdev/checkpatch warning WARNING: line length of 105 exceeds 80 columns WARNING: line length of 130 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns WARNING: line length of 91 exceeds 80 columns WARNING: line length of 93 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-VM_Test-10 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-PR success PR summary
bpf/vmtest-bpf-VM_Test-1 success Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain_full }}
bpf/vmtest-bpf-VM_Test-2 success Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-3 fail Logs for build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-4 fail Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-VM_Test-5 fail Logs for build for s390x with gcc
bpf/vmtest-bpf-VM_Test-6 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-7 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-8 success Logs for set-matrix
bpf/vmtest-bpf-VM_Test-9 success Logs for veristat

Commit Message

Daniel Rosenberg May 2, 2023, 12:52 a.m. UTC
bpf_dynptr_slice(_rw) uses a user provided buffer if it can not provide
a pointer to a block of contiguous memory. This buffer is unused in the
case of local dynptrs, and may be unused in other cases as well. There
is no need to require the buffer, as the kfunc can just return NULL if
it was needed and not provided.

This adds another kfunc annotation, __opt, which combines with __sz and
__szk to allow the buffer associated with the size to be NULL. If the
buffer is NULL, the verifier does not check that the buffer is of
sufficient size.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
---
 Documentation/bpf/kfuncs.rst | 23 ++++++++++++++++++++++-
 include/linux/skbuff.h       |  2 +-
 kernel/bpf/helpers.c         | 30 ++++++++++++++++++------------
 kernel/bpf/verifier.c        | 17 +++++++++++++----
 4 files changed, 54 insertions(+), 18 deletions(-)


base-commit: 6e98b09da931a00bf4e0477d0fa52748bf28fcce

Comments

Andrii Nakryiko May 3, 2023, 6:49 p.m. UTC | #1
On Mon, May 1, 2023 at 5:52 PM Daniel Rosenberg <drosen@google.com> wrote:
>
> bpf_dynptr_slice(_rw) uses a user provided buffer if it can not provide
> a pointer to a block of contiguous memory. This buffer is unused in the
> case of local dynptrs, and may be unused in other cases as well. There
> is no need to require the buffer, as the kfunc can just return NULL if
> it was needed and not provided.
>
> This adds another kfunc annotation, __opt, which combines with __sz and
> __szk to allow the buffer associated with the size to be NULL. If the
> buffer is NULL, the verifier does not check that the buffer is of
> sufficient size.
>
> Signed-off-by: Daniel Rosenberg <drosen@google.com>
> ---

LGTM

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  Documentation/bpf/kfuncs.rst | 23 ++++++++++++++++++++++-
>  include/linux/skbuff.h       |  2 +-
>  kernel/bpf/helpers.c         | 30 ++++++++++++++++++------------
>  kernel/bpf/verifier.c        | 17 +++++++++++++----
>  4 files changed, 54 insertions(+), 18 deletions(-)
>

[...]
Jakub Kicinski July 18, 2023, 3:26 p.m. UTC | #2
On Mon,  1 May 2023 17:52:16 -0700 Daniel Rosenberg wrote:
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -4033,7 +4033,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset, int len,
>  	if (likely(hlen - offset >= len))
>  		return (void *)data + offset;
>  
> -	if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
> +	if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
>  		return NULL;

First off - please make sure you CC netdev on changes to networking!

Please do not add stupid error checks to core code for BPF safety.
Wrap the call if you can't guarantee that value is sane, this is
a very bad precedent.
Alexei Starovoitov July 18, 2023, 3:52 p.m. UTC | #3
On Tue, Jul 18, 2023 at 8:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon,  1 May 2023 17:52:16 -0700 Daniel Rosenberg wrote:
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -4033,7 +4033,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset, int len,
> >       if (likely(hlen - offset >= len))
> >               return (void *)data + offset;
> >
> > -     if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
> > +     if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
> >               return NULL;
>
> First off - please make sure you CC netdev on changes to networking!
>
> Please do not add stupid error checks to core code for BPF safety.
> Wrap the call if you can't guarantee that value is sane, this is
> a very bad precedent.

This is NOT for safety. You misread the code.
Jakub Kicinski July 18, 2023, 4:06 p.m. UTC | #4
On Tue, 18 Jul 2023 08:52:55 -0700 Alexei Starovoitov wrote:
> On Tue, Jul 18, 2023 at 8:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Mon,  1 May 2023 17:52:16 -0700 Daniel Rosenberg wrote:  
> > > --- a/include/linux/skbuff.h
> > > +++ b/include/linux/skbuff.h
> > > @@ -4033,7 +4033,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset, int len,
> > >       if (likely(hlen - offset >= len))
> > >               return (void *)data + offset;
> > >
> > > -     if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
> > > +     if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
> > >               return NULL;  
> >
> > First off - please make sure you CC netdev on changes to networking!
> >
> > Please do not add stupid error checks to core code for BPF safety.
> > Wrap the call if you can't guarantee that value is sane, this is
> > a very bad precedent.  
> 
> This is NOT for safety. You misread the code.

Doesn't matter, safety or optionality. skb_header_pointer() is used 
on the fast paths of the networking stack, adding heavy handed input
validation to it is not okay. No sane code should be passing NULL
buffer to skb_header_pointer(). Please move the NULL check to the BPF
code so the rest of the networking stack does not have to pay the cost.

This should be common sense. If one caller is doing something..
"special" the extra code should live in the caller, not the callee.
That's basic code hygiene.
Alexei Starovoitov July 18, 2023, 4:52 p.m. UTC | #5
On Tue, Jul 18, 2023 at 9:06 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 18 Jul 2023 08:52:55 -0700 Alexei Starovoitov wrote:
> > On Tue, Jul 18, 2023 at 8:26 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Mon,  1 May 2023 17:52:16 -0700 Daniel Rosenberg wrote:
> > > > --- a/include/linux/skbuff.h
> > > > +++ b/include/linux/skbuff.h
> > > > @@ -4033,7 +4033,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset, int len,
> > > >       if (likely(hlen - offset >= len))
> > > >               return (void *)data + offset;
> > > >
> > > > -     if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
> > > > +     if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
> > > >               return NULL;
> > >
> > > First off - please make sure you CC netdev on changes to networking!
> > >
> > > Please do not add stupid error checks to core code for BPF safety.
> > > Wrap the call if you can't guarantee that value is sane, this is
> > > a very bad precedent.
> >
> > This is NOT for safety. You misread the code.
>
> Doesn't matter, safety or optionality. skb_header_pointer() is used
> on the fast paths of the networking stack, adding heavy handed input
> validation to it is not okay. No sane code should be passing NULL
> buffer to skb_header_pointer(). Please move the NULL check to the BPF
> code so the rest of the networking stack does not have to pay the cost.
>
> This should be common sense. If one caller is doing something..
> "special" the extra code should live in the caller, not the callee.
> That's basic code hygiene.

you're still missing the point. Pls read the whole patch series.
It is _not_ input validation.
skb_copy_bits is a slow path. One extra check doesn't affect
performance at all. So 'fast paths' isn't a valid argument here.
The code is reusing
        if (likely(hlen - offset >= len))
                return (void *)data + offset;
which _is_ the fast path.

What you're requesting is to copy paste
the whole __skb_header_pointer into __skb_header_pointer2.
Makes no sense.
Jakub Kicinski July 18, 2023, 5:18 p.m. UTC | #6
On Tue, 18 Jul 2023 09:52:24 -0700 Alexei Starovoitov wrote:
> On Tue, Jul 18, 2023 at 9:06 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > This is NOT for safety. You misread the code.  
> >
> > Doesn't matter, safety or optionality. skb_header_pointer() is used
> > on the fast paths of the networking stack, adding heavy handed input
> > validation to it is not okay. No sane code should be passing NULL
> > buffer to skb_header_pointer(). Please move the NULL check to the BPF
> > code so the rest of the networking stack does not have to pay the cost.
> >
> > This should be common sense. If one caller is doing something..
> > "special" the extra code should live in the caller, not the callee.
> > That's basic code hygiene.  
> 
> you're still missing the point. Pls read the whole patch series.

Could you just tell me what the point is then? The "series" is one
patch plus some tiny selftests. I don't see any documentation for
how dynptrs are supposed to work either.

As far as I can grasp this makes the "copy buffer" optional from
the kfunc-API perspective (of bpf_dynptr_slice()).

> It is _not_ input validation.
> skb_copy_bits is a slow path. One extra check doesn't affect
> performance at all. So 'fast paths' isn't a valid argument here.
> The code is reusing
>         if (likely(hlen - offset >= len))
>                 return (void *)data + offset;
> which _is_ the fast path.
> 
> What you're requesting is to copy paste
> the whole __skb_header_pointer into __skb_header_pointer2.
> Makes no sense.

No, Alexei, the whole point of skb_header_pointer() is to pass 
the secondary buffer, to make header parsing dependable.

Passing NULL buffer to skb_header_pointer() is absolutely nonsensical.
It should *not* be supported. We had enough prod problems with people
thinking that the entire header will be in the linear portion.
Then either the NIC can't parse the header, someone enables jumbo,
disables GRO, adds new HW, adds encap, etc etc and things implode.

If you want to support it in BPF that's up to you, but I think it's
entirely reasonable for me to request that you don't do such things
in general networking code. The function is 5 LoC, so a local BPF
copy seems fine. Although I'd suggest skb_header_pointer_misguided()
rather than __skb_header_pointer2() as the name :)
Alexei Starovoitov July 18, 2023, 5:50 p.m. UTC | #7
On Tue, Jul 18, 2023 at 10:18 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 18 Jul 2023 09:52:24 -0700 Alexei Starovoitov wrote:
> > On Tue, Jul 18, 2023 at 9:06 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > This is NOT for safety. You misread the code.
> > >
> > > Doesn't matter, safety or optionality. skb_header_pointer() is used
> > > on the fast paths of the networking stack, adding heavy handed input
> > > validation to it is not okay. No sane code should be passing NULL
> > > buffer to skb_header_pointer(). Please move the NULL check to the BPF
> > > code so the rest of the networking stack does not have to pay the cost.
> > >
> > > This should be common sense. If one caller is doing something..
> > > "special" the extra code should live in the caller, not the callee.
> > > That's basic code hygiene.
> >
> > you're still missing the point. Pls read the whole patch series.
>
> Could you just tell me what the point is then? The "series" is one
> patch plus some tiny selftests. I don't see any documentation for
> how dynptrs are supposed to work either.
>
> As far as I can grasp this makes the "copy buffer" optional from
> the kfunc-API perspective (of bpf_dynptr_slice()).
>
> > It is _not_ input validation.
> > skb_copy_bits is a slow path. One extra check doesn't affect
> > performance at all. So 'fast paths' isn't a valid argument here.
> > The code is reusing
> >         if (likely(hlen - offset >= len))
> >                 return (void *)data + offset;
> > which _is_ the fast path.
> >
> > What you're requesting is to copy paste
> > the whole __skb_header_pointer into __skb_header_pointer2.
> > Makes no sense.
>
> No, Alexei, the whole point of skb_header_pointer() is to pass
> the secondary buffer, to make header parsing dependable.

of course. No one argues about that.

> Passing NULL buffer to skb_header_pointer() is absolutely nonsensical.

Quick grep through the code proves you wrong:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
__skb_header_pointer(NULL, start, sizeof(*hp), skb->data,
                     skb_headlen(skb), NULL);

was done before this patch. It's using __ variant on purpose
and explicitly passing skb==NULL to exactly trigger that line
to deliberately avoid the slow path.

Another example:
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
skb_header_pointer(skb, 0, 0, NULL);

This one I'm not sure about. Looks buggy.

> It should *not* be supported. We had enough prod problems with people
> thinking that the entire header will be in the linear portion.
> Then either the NIC can't parse the header, someone enables jumbo,
> disables GRO, adds new HW, adds encap, etc etc and things implode.

I don't see how this is related.
NULL buffer allows to get a linear pointer and explicitly avoids
slow path when it's not linear.

> If you want to support it in BPF that's up to you, but I think it's
> entirely reasonable for me to request that you don't do such things
> in general networking code. The function is 5 LoC, so a local BPF
> copy seems fine. Although I'd suggest skb_header_pointer_misguided()
> rather than __skb_header_pointer2() as the name :)

If you insist we can, but bnxt is an example that buffer==NULL is
a useful concept for networking and not bpf specific.
It also doesn't make "people think the header is linear" any worse.
Jakub Kicinski July 18, 2023, 6:11 p.m. UTC | #8
On Tue, 18 Jul 2023 10:50:14 -0700 Alexei Starovoitov wrote:
> On Tue, Jul 18, 2023 at 10:18 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > you're still missing the point. Pls read the whole patch series.  
> >
> > Could you just tell me what the point is then? The "series" is one
> > patch plus some tiny selftests. I don't see any documentation for
> > how dynptrs are supposed to work either.
> >
> > As far as I can grasp this makes the "copy buffer" optional from
> > the kfunc-API perspective (of bpf_dynptr_slice()).
> >  
> > > It is _not_ input validation.
> > > skb_copy_bits is a slow path. One extra check doesn't affect
> > > performance at all. So 'fast paths' isn't a valid argument here.
> > > The code is reusing
> > >         if (likely(hlen - offset >= len))
> > >                 return (void *)data + offset;
> > > which _is_ the fast path.
> > >
> > > What you're requesting is to copy paste
> > > the whole __skb_header_pointer into __skb_header_pointer2.
> > > Makes no sense.  
> >
> > No, Alexei, the whole point of skb_header_pointer() is to pass
> > the secondary buffer, to make header parsing dependable.  
> 
> of course. No one argues about that.
> 
> > Passing NULL buffer to skb_header_pointer() is absolutely nonsensical.  
> 
> Quick grep through the code proves you wrong:
> drivers/net/ethernet/broadcom/bnxt/bnxt.c
> __skb_header_pointer(NULL, start, sizeof(*hp), skb->data,
>                      skb_headlen(skb), NULL);
> 
> was done before this patch. It's using __ variant on purpose
> and explicitly passing skb==NULL to exactly trigger that line
> to deliberately avoid the slow path.
> 
> Another example:
> drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> skb_header_pointer(skb, 0, 0, NULL);
> 
> This one I'm not sure about. Looks buggy.

These are both Tx path for setting up offloads, Linux doesn't request
offloads for headers outside of the linear part. The ixgbevf code is
completely pointless, as you say.

In general drivers are rarely a source of high quality code examples.
Having been directly involved in the bugs that lead to the bnxt code
being written - I was so happy that the driver started parsing Tx
packets *at all*, so I wasn't too fussed by the minor problems :(

> > It should *not* be supported. We had enough prod problems with people
> > thinking that the entire header will be in the linear portion.
> > Then either the NIC can't parse the header, someone enables jumbo,
> > disables GRO, adds new HW, adds encap, etc etc and things implode.  
> 
> I don't see how this is related.
> NULL buffer allows to get a linear pointer and explicitly avoids
> slow path when it's not linear.

Direct packet access via skb->data is there for those who want high
speed 
Alexei Starovoitov July 18, 2023, 8:34 p.m. UTC | #9
On Tue, Jul 18, 2023 at 11:11 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 18 Jul 2023 10:50:14 -0700 Alexei Starovoitov wrote:
> > On Tue, Jul 18, 2023 at 10:18 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > you're still missing the point. Pls read the whole patch series.
> > >
> > > Could you just tell me what the point is then? The "series" is one
> > > patch plus some tiny selftests. I don't see any documentation for
> > > how dynptrs are supposed to work either.
> > >
> > > As far as I can grasp this makes the "copy buffer" optional from
> > > the kfunc-API perspective (of bpf_dynptr_slice()).
> > >
> > > > It is _not_ input validation.
> > > > skb_copy_bits is a slow path. One extra check doesn't affect
> > > > performance at all. So 'fast paths' isn't a valid argument here.
> > > > The code is reusing
> > > >         if (likely(hlen - offset >= len))
> > > >                 return (void *)data + offset;
> > > > which _is_ the fast path.
> > > >
> > > > What you're requesting is to copy paste
> > > > the whole __skb_header_pointer into __skb_header_pointer2.
> > > > Makes no sense.
> > >
> > > No, Alexei, the whole point of skb_header_pointer() is to pass
> > > the secondary buffer, to make header parsing dependable.
> >
> > of course. No one argues about that.
> >
> > > Passing NULL buffer to skb_header_pointer() is absolutely nonsensical.
> >
> > Quick grep through the code proves you wrong:
> > drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > __skb_header_pointer(NULL, start, sizeof(*hp), skb->data,
> >                      skb_headlen(skb), NULL);
> >
> > was done before this patch. It's using __ variant on purpose
> > and explicitly passing skb==NULL to exactly trigger that line
> > to deliberately avoid the slow path.
> >
> > Another example:
> > drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> > skb_header_pointer(skb, 0, 0, NULL);
> >
> > This one I'm not sure about. Looks buggy.
>
> These are both Tx path for setting up offloads, Linux doesn't request
> offloads for headers outside of the linear part. The ixgbevf code is
> completely pointless, as you say.
>
> In general drivers are rarely a source of high quality code examples.
> Having been directly involved in the bugs that lead to the bnxt code
> being written - I was so happy that the driver started parsing Tx
> packets *at all*, so I wasn't too fussed by the minor problems :(
>
> > > It should *not* be supported. We had enough prod problems with people
> > > thinking that the entire header will be in the linear portion.
> > > Then either the NIC can't parse the header, someone enables jumbo,
> > > disables GRO, adds new HW, adds encap, etc etc and things implode.
> >
> > I don't see how this is related.
> > NULL buffer allows to get a linear pointer and explicitly avoids
> > slow path when it's not linear.
>
> Direct packet access via skb->data is there for those who want high
> speed 
Jakub Kicinski July 18, 2023, 11:06 p.m. UTC | #10
On Tue, 18 Jul 2023 13:34:06 -0700 Alexei Starovoitov wrote:
> > Direct packet access via skb->data is there for those who want high
> > speed 
Alexei Starovoitov July 18, 2023, 11:17 p.m. UTC | #11
On Tue, Jul 18, 2023 at 4:06 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 18 Jul 2023 13:34:06 -0700 Alexei Starovoitov wrote:
> > > Direct packet access via skb->data is there for those who want high
> > > speed 
Jakub Kicinski July 18, 2023, 11:21 p.m. UTC | #12
On Tue, 18 Jul 2023 16:17:24 -0700 Alexei Starovoitov wrote:
> Which would encourage bnxt-like hacks.
> I don't like it tbh.
> At least skb_pointer_if_linear() has a clear meaning.
> It's more run-time overhead, since buffer__opt is checked early,
> but that's ok.

Alright, your version fine by me, too. Thanks!
Alexei Starovoitov July 18, 2023, 11:22 p.m. UTC | #13
On Tue, Jul 18, 2023 at 4:21 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 18 Jul 2023 16:17:24 -0700 Alexei Starovoitov wrote:
> > Which would encourage bnxt-like hacks.
> > I don't like it tbh.
> > At least skb_pointer_if_linear() has a clear meaning.
> > It's more run-time overhead, since buffer__opt is checked early,
> > but that's ok.
>
> Alright, your version fine by me, too. Thanks!

ok. will send it officially soon.
Daniel Borkmann July 19, 2023, 2:51 p.m. UTC | #14
On 7/19/23 1:21 AM, Jakub Kicinski wrote:
> On Tue, 18 Jul 2023 16:17:24 -0700 Alexei Starovoitov wrote:
>> Which would encourage bnxt-like hacks.
>> I don't like it tbh.
>> At least skb_pointer_if_linear() has a clear meaning.
>> It's more run-time overhead, since buffer__opt is checked early,
>> but that's ok.
> 
> Alright, your version fine by me, too. Thanks!

Looks good to me too. Agree that the !buffer check should not live in
__skb_header_pointer() and is better handled in the bpf_dynptr_slice()
internals.

Thanks,
Daniel
diff mbox series

Patch

diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index ea2516374d92..7a3d9de5f315 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -100,7 +100,7 @@  Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
 size parameter, and the value of the constant matters for program safety, __k
 suffix should be used.
 
-2.2.2 __uninit Annotation
+2.2.3 __uninit Annotation
 -------------------------
 
 This annotation is used to indicate that the argument will be treated as
@@ -117,6 +117,27 @@  Here, the dynptr will be treated as an uninitialized dynptr. Without this
 annotation, the verifier will reject the program if the dynptr passed in is
 not initialized.
 
+2.2.4 __opt Annotation
+-------------------------
+
+This annotation is used to indicate that the buffer associated with an __sz or __szk
+argument may be null. If the function is passed a nullptr in place of the buffer,
+the verifier will not check that length is appropriate for the buffer. The kfunc is
+responsible for checking if this buffer is null before using it.
+
+An example is given below::
+
+        __bpf_kfunc void *bpf_dynptr_slice(..., void *buffer__opt, u32 buffer__szk)
+        {
+        ...
+        }
+
+Here, the buffer may be null. If buffer is not null, it at least of size buffer_szk.
+Either way, the returned buffer is either NULL, or of size buffer_szk. Without this
+annotation, the verifier will reject the program if a null pointer is passed in with
+a nonzero size.
+
+
 .. _BPF_kfunc_nodef:
 
 2.3 Using an existing kernel function
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 738776ab8838..8ddb4af1a501 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4033,7 +4033,7 @@  __skb_header_pointer(const struct sk_buff *skb, int offset, int len,
 	if (likely(hlen - offset >= len))
 		return (void *)data + offset;
 
-	if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
+	if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
 		return NULL;
 
 	return buffer;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 8d368fa353f9..26efb6fbeab2 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2167,13 +2167,15 @@  __bpf_kfunc struct task_struct *bpf_task_from_pid(s32 pid)
  * bpf_dynptr_slice() - Obtain a read-only pointer to the dynptr data.
  * @ptr: The dynptr whose data slice to retrieve
  * @offset: Offset into the dynptr
- * @buffer: User-provided buffer to copy contents into
- * @buffer__szk: Size (in bytes) of the buffer. This is the length of the
- *		 requested slice. This must be a constant.
+ * @buffer__opt: User-provided buffer to copy contents into.  May be NULL
+ * @buffer__szk: Size (in bytes) of the buffer if present. This is the
+ *               length of the requested slice. This must be a constant.
  *
  * For non-skb and non-xdp type dynptrs, there is no difference between
  * bpf_dynptr_slice and bpf_dynptr_data.
  *
+ *  If buffer__opt is NULL, the call will fail if buffer_opt was needed.
+ *
  * If the intention is to write to the data slice, please use
  * bpf_dynptr_slice_rdwr.
  *
@@ -2190,7 +2192,7 @@  __bpf_kfunc struct task_struct *bpf_task_from_pid(s32 pid)
  * direct pointer)
  */
 __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset,
-				   void *buffer, u32 buffer__szk)
+				   void *buffer__opt, u32 buffer__szk)
 {
 	enum bpf_dynptr_type type;
 	u32 len = buffer__szk;
@@ -2210,15 +2212,17 @@  __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset
 	case BPF_DYNPTR_TYPE_RINGBUF:
 		return ptr->data + ptr->offset + offset;
 	case BPF_DYNPTR_TYPE_SKB:
-		return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer);
+		return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer__opt);
 	case BPF_DYNPTR_TYPE_XDP:
 	{
 		void *xdp_ptr = bpf_xdp_pointer(ptr->data, ptr->offset + offset, len);
 		if (xdp_ptr)
 			return xdp_ptr;
 
-		bpf_xdp_copy_buf(ptr->data, ptr->offset + offset, buffer, len, false);
-		return buffer;
+		if (!buffer__opt)
+			return NULL;
+		bpf_xdp_copy_buf(ptr->data, ptr->offset + offset, buffer__opt, len, false);
+		return buffer__opt;
 	}
 	default:
 		WARN_ONCE(true, "unknown dynptr type %d\n", type);
@@ -2230,13 +2234,15 @@  __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset
  * bpf_dynptr_slice_rdwr() - Obtain a writable pointer to the dynptr data.
  * @ptr: The dynptr whose data slice to retrieve
  * @offset: Offset into the dynptr
- * @buffer: User-provided buffer to copy contents into
- * @buffer__szk: Size (in bytes) of the buffer. This is the length of the
- *		 requested slice. This must be a constant.
+ * @buffer__opt: User-provided buffer to copy contents into. May be NULL
+ * @buffer__szk: Size (in bytes) of the buffer if present. This is the
+ *               length of the requested slice. This must be a constant.
  *
  * For non-skb and non-xdp type dynptrs, there is no difference between
  * bpf_dynptr_slice and bpf_dynptr_data.
  *
+ * If buffer__opt is NULL, the call will fail if buffer_opt was needed.
+ *
  * The returned pointer is writable and may point to either directly the dynptr
  * data at the requested offset or to the buffer if unable to obtain a direct
  * data pointer to (example: the requested slice is to the paged area of an skb
@@ -2267,7 +2273,7 @@  __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset
  * direct pointer)
  */
 __bpf_kfunc void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr_kern *ptr, u32 offset,
-					void *buffer, u32 buffer__szk)
+					void *buffer__opt, u32 buffer__szk)
 {
 	if (!ptr->data || bpf_dynptr_is_rdonly(ptr))
 		return NULL;
@@ -2294,7 +2300,7 @@  __bpf_kfunc void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr_kern *ptr, u32 o
 	 * will be copied out into the buffer and the user will need to call
 	 * bpf_dynptr_write() to commit changes.
 	 */
-	return bpf_dynptr_slice(ptr, offset, buffer, buffer__szk);
+	return bpf_dynptr_slice(ptr, offset, buffer__opt, buffer__szk);
 }
 
 __bpf_kfunc void *bpf_cast_to_kern_ctx(void *obj)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index fbcf5a4e2fcd..708ae7bca1fe 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9398,6 +9398,11 @@  static bool is_kfunc_arg_const_mem_size(const struct btf *btf,
 	return __kfunc_param_match_suffix(btf, arg, "__szk");
 }
 
+static bool is_kfunc_arg_optional(const struct btf *btf, const struct btf_param *arg)
+{
+	return __kfunc_param_match_suffix(btf, arg, "__opt");
+}
+
 static bool is_kfunc_arg_constant(const struct btf *btf, const struct btf_param *arg)
 {
 	return __kfunc_param_match_suffix(btf, arg, "__k");
@@ -10464,13 +10469,17 @@  static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			break;
 		case KF_ARG_PTR_TO_MEM_SIZE:
 		{
+			struct bpf_reg_state *buff_reg = &regs[regno];
+			const struct btf_param *buff_arg = &args[i];
 			struct bpf_reg_state *size_reg = &regs[regno + 1];
 			const struct btf_param *size_arg = &args[i + 1];
 
-			ret = check_kfunc_mem_size_reg(env, size_reg, regno + 1);
-			if (ret < 0) {
-				verbose(env, "arg#%d arg#%d memory, len pair leads to invalid memory access\n", i, i + 1);
-				return ret;
+			if (!register_is_null(buff_reg) || !is_kfunc_arg_optional(meta->btf, buff_arg)) {
+				ret = check_kfunc_mem_size_reg(env, size_reg, regno + 1);
+				if (ret < 0) {
+					verbose(env, "arg#%d arg#%d memory, len pair leads to invalid memory access\n", i, i + 1);
+					return ret;
+				}
 			}
 
 			if (is_kfunc_arg_const_mem_size(meta->btf, size_arg, size_reg)) {